Folk, R. A., H. R. Kates, R. LaFrance, D. E. Soltis, P. S. Soltis, and R. P. Guralnick. 2021. High-throughput methods for efficiently building massive phylogenies from natural history collections. Applications in Plant Sciences 9:e11410.



Large phylogenetic data sets have often been restricted to small numbers of loci from GenBank, and a vetted sampling-to-sequencing phylogenomic protocol scaling to thousands of species is not yet available. Here, we report a high-throughput collections-based approach that empowers researchers to explore more branches of the tree of life with numerous loci.


We developed an integrated Specimen-to-Laboratory Information Management System (SLIMS), connecting sampling and wet lab efforts with progress tracking at each stage. Using unique identifiers encoded in QR codes and a taxonomic database, a research team can sample herbarium specimens, efficiently record the sampling event, and capture specimen images. After sampling in herbaria, images are uploaded to a citizen science platform for metadata generation, and tissue samples are moved through a simple, high-throughput, plate-based herbarium DNA extraction and sequencing protocol.
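The tracking idea described above can be illustrated with a minimal sketch. This is not the SLIMS implementation. The stage names, the `SampleRecord` class, the `SLIMS:` payload prefix, and the example taxon and herbarium code are all hypothetical, chosen only to show how a unique identifier (the string a QR code would encode) might tie a sampling event to its progress through the wet lab.

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class Stage(Enum):
    """Hypothetical workflow stages; the real system may define others."""
    SAMPLED = 1
    IMAGED = 2
    EXTRACTED = 3
    SEQUENCED = 4


@dataclass
class SampleRecord:
    """One sampling event, keyed by a unique identifier."""
    taxon: str
    herbarium: str
    sample_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    stage: Stage = Stage.SAMPLED
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Record the starting stage so the history is complete.
        self.history.append(self.stage)

    def qr_payload(self) -> str:
        # The text a QR code would carry (assumed format, not the paper's).
        return f"SLIMS:{self.sample_id}"

    def advance(self, new_stage: Stage) -> None:
        # Only allow moving to the immediately following stage.
        if new_stage.value != self.stage.value + 1:
            raise ValueError(
                f"cannot move from {self.stage.name} to {new_stage.name}"
            )
        self.stage = new_stage
        self.history.append(new_stage)


def progress(records):
    """Count how many samples currently sit at each stage."""
    counts = {s.name: 0 for s in Stage}
    for r in records:
        counts[r.stage.name] += 1
    return counts


if __name__ == "__main__":
    # Illustrative values only.
    rec = SampleRecord(taxon="Alnus rubra", herbarium="FLAS")
    rec.advance(Stage.IMAGED)
    print(rec.qr_payload(), progress([rec]))
```

Enforcing a strict stage order is one possible design choice: it makes skipped steps visible as errors rather than silent gaps in the record.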


We applied this sampling-to-sequencing workflow to ~15,000 species, producing for the first time a data set with ~50% taxonomic representation of the “nitrogen-fixing clade” of angiosperms.


The approach we present is appropriate at any taxonomic scale and is extensible to other collection types. The widespread use of large-scale sampling strategies repositions herbaria as accessible but largely untapped resources for broad taxonomic sampling with thousands of species.