Background

Microbes are microscopic organisms found in a wide variety of environments. Bacteria, viruses, and certain fungi all qualify, and they live in soil, water, air, and other organisms, including the human body. Some can be harmful, but most play important roles in their surroundings. They provide humans with ecosystem services such as nutrient cycling and ecosystem stability, and they form the foundation of biodiversity. Human life depends on them; without microbes, we would be unable to survive.

Communities of microbes are called microbiomes. Like rainforests and coral reefs, microbiomes are threatened by the biodiversity crisis. Modeling microbiomes helps researchers quantify diversity, assess stability, and predict shifts in community composition. With a better understanding of microbiome interactions, models could point toward ways to support biodiversity.

Current Research 

At the University of Florida, Assistant Professor Juannan Zhou works with six graduate students to study microbiome interactions at a broad scale. By integrating machine learning with multiscale experiments, Dr. Zhou’s team studies protein fitness landscapes and the adaptive evolution of complex traits, work that supports each graduate student’s individual research direction.

Palash Sethi, a founding graduate student of the lab, develops models that map genotype-phenotype relationships across molecules, species, and interspecific interactions at the microbiome scale. As a University of Florida Biodiversity Institute (UFBI) fellow, he investigates microbial diversity and microbiome community dynamics by building large language models that capture species interactions.

According to Palash, Dr. Zhou is a hands-on and visionary Principal Investigator (PI). Palash appreciates that Dr. Zhou lets him take ownership of his project without micromanaging him. He feels that responsibility for his work rests with him, but that he has a safety net to catch him if needed.

Palash emphasizes that models must capture the relevant features to help explain how biodiversity affects microbiome traits. The structure, function, and abundance of species all influence how microbial communities form and function, so data on each of them matter.

The use of Bio-AI models, specifically biological large language models (BioLLMs), has considerably advanced microbiome modeling. These models create embeddings, or dense numerical representations, that capture essential biological properties in meaningful ways. Current BioLLMs do well at converting raw sequences into informative genetic features, and technologies that translate genetic sequences into phenotypes are already in development. However, fully understanding how microbial genotypes give rise to phenotypes remains an ongoing challenge, because phenotype-level traits in microbes are still not well mapped from sequence data alone.
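
To make “dense numerical representation” a little more concrete, here is a minimal sketch that assumes per-residue embeddings have already been produced by some protein language model. The arrays are invented toy values, not output from any real BioLLM, and the helper names are hypothetical. Mean pooling followed by cosine similarity is shown only as one common, generic way to compare such embeddings; it is not presented as the method used by Palash or by any particular model.

```python
import numpy as np


def mean_pool(residue_embeddings: np.ndarray) -> np.ndarray:
    """Collapse an (L, d) per-residue matrix into one d-dimensional sequence vector."""
    return residue_embeddings.mean(axis=0)


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical stand-ins for per-residue embeddings of three proteins (d = 4).
# Real embeddings would come from a protein language model and be much wider.
# The proteins have different lengths, which is why we pool before comparing.
protein_a = np.array([[1.0, 0.0, 0.5, 0.0],
                      [0.8, 0.1, 0.4, 0.0],
                      [1.2, 0.0, 0.6, 0.1]])
protein_b = np.array([[0.9, 0.1, 0.5, 0.0],
                      [1.1, 0.0, 0.5, 0.1]])
protein_c = np.array([[0.0, 1.0, 0.0, 0.9],
                      [0.1, 0.9, 0.0, 1.1],
                      [0.0, 1.1, 0.1, 1.0],
                      [0.0, 1.0, 0.0, 1.0]])

emb_a, emb_b, emb_c = (mean_pool(p) for p in (protein_a, protein_b, protein_c))

# Proteins A and B point in nearly the same direction; C points elsewhere.
print("A vs B:", round(cosine_similarity(emb_a, emb_b), 3))
print("A vs C:", round(cosine_similarity(emb_a, emb_c), 3))
```

Pooling gives every protein a vector of the same length regardless of sequence length, so downstream models can compare or combine proteins directly.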

Palash notes that work under the research timeline agreed upon with the UFBI is still in progress. One component remains: integrating BacPT, a computational model built on a large language model architecture that works from bacterial proteins to capture their complex traits and predict whether two species are likely to interact, compete, or cooperate. The goal is to scale these models of microbial dynamics from small microbiomes to larger systems.

Because Palash’s work centers on large language models, he looks for ways to incorporate new artificial intelligence techniques into his model architectures. Mechanistic interpretability has become especially important to him; he believes people will trust his model only if they understand how it works. He now rarely writes code from scratch, instead refining and adapting existing tools. An AI “deep research” feature helps him generate clear summaries during this process.

Why It Matters

Evolutionary Scale Modeling (ESM), a family of protein language models, can already predict protein functions with surprising accuracy, from enzymatic activity to structural stability. Other BioLLMs help identify new genes, classify protein families, and detect antimicrobial resistance markers. Some BioLLMs have even been used to design new proteins for specific purposes, including in synthetic biology and drug design.

As these artificial intelligence models continue to develop, they give researchers sharper tools for biological sequence analysis. From characterizing microbes to predicting ecological interactions, we are getting closer to understanding microbiome biodiversity.

Over the past year, Palash published two papers: one on a simpler approach to bio ploidy and another on his current research. Both address genotype-to-phenotype mapping, exploring how protein sequences determine function. Whereas simple additive models assume each amino acid contributes independently, Palash also studies epistasis, in which the effect of one amino acid depends on other amino acids and on the protein’s structure. His key conclusion is that modeling epistasis is essential for accurately predicting protein function, because models that treat amino acids independently are not accurate enough.
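
The difference between an additive model and an epistatic one can be illustrated with a toy example. The sketch below assumes a made-up three-site protein with invented effect sizes and hypothetical function names; it is not Palash’s model, only a generic illustration of adding pairwise (epistatic) terms on top of independent, per-site effects.

```python
from itertools import combinations

# Toy 3-site protein with wild type "AAA". All numbers are invented
# illustration values, not fitted to any real data.
WILD_TYPE = "AAA"
BASELINE = 1.0

# Single-site effects: change in phenotype when site i carries a given amino acid.
SINGLE = {
    (0, "G"): -0.5,
    (1, "G"): -0.5,
    (2, "G"): 0.2,
}

# Pairwise (epistatic) terms: extra effect only when BOTH mutations are present.
PAIR = {
    ((0, "G"), (1, "G")): 1.2,  # these two mutations compensate for each other
}


def mutations(seq):
    """List (site, amino acid) pairs where seq differs from the wild type, in site order."""
    return [(i, aa) for i, aa in enumerate(seq) if aa != WILD_TYPE[i]]


def additive(seq):
    """Additive model: each mutation contributes independently."""
    return BASELINE + sum(SINGLE.get(m, 0.0) for m in mutations(seq))


def pairwise(seq):
    """Additive model plus a pairwise term for every pair of mutations present."""
    muts = mutations(seq)
    return additive(seq) + sum(PAIR.get((a, b), 0.0) for a, b in combinations(muts, 2))


for variant in ["GAA", "AGA", "GGA"]:
    print(variant, additive(variant), pairwise(variant))
# GAA 0.5 0.5
# AGA 0.5 0.5
# GGA 0.0 1.2
```

The additive model predicts the double mutant GGA to be worse than either single mutant, while the pairwise term captures that the two mutations together restore, and even exceed, wild-type function. Real epistatic models fit such interaction terms from data, often with far more parameters and regularization.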

Palash also shares his work openly online, on platforms like LinkedIn, X, and Google Scholar. Much of his work is open source, and he intends to publish everything so that others can use and build upon his model. Check out his most recent manuscript and preprint here. Be on the lookout for more of his work, including a one-year model on community dynamics next year.

Information from the Kempner Institute, LinkedIn, Google Scholar, X, bioRxiv, the UF Thompson Earth Systems Institute, and the UF Biodiversity Institute. Photos courtesy of Canva Pro.