A new University of Florida and Duke University collaboration aims to do for the tree of life what Google Earth did for navigation.

A National Science Foundation grant of nearly $1 million will fund a three-year project to develop software that will make the context of every named and unnamed organism accessible online to scientists and nonscientists.

Nico Cellinese and collaborators at Duke University will develop software over the next three years to improve access to information on the tree of life. Photo courtesy of Reed Beaman

The new software will allow computers to translate the tree of life and put scientific names in context by more clearly linking those names to evolutionary concepts and associated data, including DNA sequences and morphological characteristics. The project will have immediate and broad practical applications for communicating, integrating and querying biological data across the tree of life, said Nico Cellinese, associate curator of the Herbarium and informatics at the Florida Museum on the UF campus.

“A new navigation system called ‘phyloreferencing’ will allow us to put some real coordinates on the tree of life, based on the actual evolutionary context of the specific branches and leaves that people query,” Cellinese said. “When you use Google Earth, you put in an address or location and it takes you exactly where you need to go. Right now we have the tree of life, but we cannot perform a name query and retrieve with confidence the groups of interest. Even more importantly, we cannot query the branches that have not yet been named, such as those that have only recently been discovered.”

For centuries scientists have added content or data to the tree of life using a traditional organism naming system developed by Carl Linnaeus more than 250 years ago. But now there are hundreds of thousands of branches and no way to easily locate specific information in the tree.

Cellinese has spent her career rummaging through jungles and museum collections, sequencing DNA and adding to our understanding of the history of life. But she became frustrated when she could not easily search the tree she is helping build.

“If you Google a scientific name now, you’ll get a lot of irrelevant and ambiguous information. You cannot easily find everything that is correctly linked to a name, such as the species associated with an organism’s group,” Cellinese said. “Our project will change this by researching, implementing and testing online specifications for computing with phyloreferences.”

Understanding and managing scientific data about organisms currently depends on their names. But although scientists keep adding observations, including molecular data, for many groups, some of those groups may never be named, and exactly which organisms a given name refers to is often ambiguous.

The ability to easily find all information associated with named and unnamed organisms along the tree’s branches will be incredibly helpful when retrieving information about biodiversity, Cellinese said. She and collaborator Hilmar Lapp, director of informatics at Duke University’s Center for Genomic and Computational Biology, said the new system will make it possible for computers to do much of the legwork for scientists when it comes to navigating the branches of the tree of life.

“The Linnaean system can fail in several ways, all of which trace back to the fact that it is based on context-lacking names,” Lapp said. “The meaning of names is opaque to machines because they see a name only as a sequence of letters. The meaning of names changes over time as knowledge of the groups they designate changes, but a computer cannot understand such changes.”

To create the phyloreference technology, project leaders will use standards and tools developed for the Web. These reference points will be designed with the goal that any element on the tree of life, whether node, branch or clade, can be referenced in a way that is unambiguous and has a fully computable meaning defined by the way organisms relate to one another.
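To make the idea concrete, here is a minimal Python sketch of how a reference defined by relationships, rather than by a name, might be resolved by a computer. It is an illustration only and not the project's software: the four-species toy tree, the internal labels `n1` and `n2`, and the functions `ancestors`, `resolve` and `leaves_under` are all assumptions made for this example. The sketch treats a reference as a list of organisms (often called specifiers) and resolves it to their most recent common ancestor and everything descended from it.

```python
# Toy tree stored as child -> parent links (hypothetical example data).
# "n1", "n2" and "root" are arbitrary labels for branching points with no formal name.
PARENT = {
    "human": "n1", "chimpanzee": "n1",
    "n1": "n2", "gorilla": "n2",
    "n2": "root", "orangutan": "root",
}

def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def resolve(specifiers):
    """Return the most recent common ancestor of all the given organisms."""
    first, *rest = specifiers
    shared = [set(ancestors(s)) for s in rest]
    # Walk upward from the first organism; the first node that every other
    # organism also descends from is the most recent common ancestor.
    for node in ancestors(first):
        if all(node in s for s in shared):
            return node
    raise ValueError("specifiers are not on the same tree")

def leaves_under(node):
    """Return every tip of the tree that descends from node."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    return set().union(*(leaves_under(c) for c in children))

# Query an unnamed branching point by the organisms that define it.
clade = resolve(["human", "gorilla"])
print(clade, sorted(leaves_under(clade)))
# prints: n2 ['chimpanzee', 'gorilla', 'human']
```

In this sketch the query never uses a name for the group itself. The branching point is located purely from how the listed organisms relate to one another, which is why the same approach can, in principle, reach branches that have not been named.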

Eventually the new approach will be integrated into large-scale biodiversity resources such as the Open Tree of Life project, projects newly funded by the NSF Genealogy of Life program and other online databases built for the tree of life.

The project includes developing a new online course module to teach students how to create phyloreferences and utilize the new software, Cellinese said. She also plans to develop an exhibit for the Florida Museum in the third year of the project, where museum visitors will be able to search the tree of life on an interactive computer.

“Information about an organism that cannot be understood by a computer also cannot lead to discoveries as much as it could otherwise,” Lapp said. “The technology we develop by itself won’t yield new discoveries, but will enable them.”

By Stephenie Livingston

