Hannah Owens, Florida Museum courtesy scientist and former museum postdoctoral researcher, led a collaborative effort to develop occCite, a program that summarizes biological information from multiple online databases and generates publication-ready citations. The code, a significant timesaver for scientists, also allows museums and other institutes to track how their specimen data are being used, an increasing challenge in the era of massive open-access databases.
“Citing primary data providers is important, not just so that the research we do is reproducible, but also so primary providers like museums can keep track of how the data they provide are being used,” Owens said. “Museums can then use this information to demonstrate how relevant their collections are for ongoing research.”
Owens’s team shares second place in the competition – sponsored by GBIF, a database of more than 1 billion biological records – with two others and will receive about $5,600. The annual challenge is named for one of GBIF’s founders, a Danish entomologist and pioneering data specialist, and seeks to reward innovative approaches to improving open-source data management.
Other occCite team members are Robert Guralnick, Florida Museum curator of informatics; Vijay Barve, museum postdoctoral researcher; Cory Merow of the University of Connecticut; and Brian Maitner of the University of Arizona.
The idea for occCite came to Owens after an arduous week-long process of creating tables and collecting citations for a paper on mapping butterfly diversity.
“I was using data from 37 papers, four community science websites, three natural history museums, four aggregator databases like GBIF, a colleague’s personal collection and Flickr,” she said.
To streamline this process, she designed occCite to pull information on when and where a particular organism was spotted, also known as species occurrence data, from a variety of online databases.
Biologists use this information to understand where a species has lived in the past, where it is found now and where it might head in the future. These data are also crucial for wildlife management officials and conservation agencies interested in tracking and managing invasive species or anticipating how endangered populations might be affected by factors such as climate change.
Using occCite, a scientist investigating changes over time in the geographical distribution of the tegu, an invasive South American lizard, can download all known tegu records from hundreds of museums and community scientists.
“With those data come not just where they were found and when, but also tables of how many records came from each source and preformatted citations for those data with only one or two more lines of code,” Owens said.
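The bookkeeping Owens describes, counting how many records came from each source and producing a citation for each, can be illustrated with a short sketch. This is not occCite's own interface: it is a minimal Python illustration, and the records, source names, field names and citation wording below are all made-up placeholders.

```python
from collections import Counter
from datetime import date

# Hypothetical occurrence records. Every value here is an illustrative
# placeholder, not real specimen data and not output from occCite.
records = [
    {"taxon": "tegu", "year": 2015, "source": "Example Museum A"},
    {"taxon": "tegu", "year": 2018, "source": "Example Museum A"},
    {"taxon": "tegu", "year": 2019, "source": "Example Community Science Site"},
    {"taxon": "tegu", "year": 2020, "source": "Example Museum B"},
]


def tally_by_source(records):
    """Count how many occurrence records each data provider contributed."""
    return Counter(record["source"] for record in records)


def format_citations(counts, accessed):
    """Build one plain-text citation per provider, sorted by provider name.

    The citation wording is an assumed format chosen for illustration.
    """
    citations = []
    for source, n in sorted(counts.items()):
        noun = "record" if n == 1 else "records"
        citations.append(
            f"{source}. Occurrence dataset ({n} {noun}). "
            f"Accessed {accessed.isoformat()}."
        )
    return citations


counts = tally_by_source(records)
for line in format_citations(counts, date(2020, 1, 1)):
    print(line)
```

Keeping the tally separate from the formatting means the same per-source counts can feed both a summary table and the citation list.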
Owens and her team hope occCite will prove valuable as researchers sift through ever-growing quantities of online data, and that it will help preserve data integrity.
“We’ve worked really hard on this and want these tools to reach the scientific community,” Owens said. “I hope this is another step forward to clearer, more repeatable biodiversity science that provides proper credit to the hardworking museum professionals behind the mountains of data now available online.”
Funding for the project came from the University of Florida Biodiversity Institute; the UF Informatics Institute; the University of Copenhagen GLOBE Institute’s Center for Macroecology, Evolution and Climate; and the National Science Foundation.