Quantifying the oxygen preferences of bacterial communities using a metagenome-based approach
Journal: bioRxiv
Published Date: Jan 23, 2026
Abstract
Oxygen is a primary driver of the distribution and activity of microbial life. Because oxygen levels are often difficult to measure in situ, one potential solution is to use bacteria as bioindicators of oxygen levels. As bacteria range from obligate aerobes to obligate anaerobes, quantifying the oxygen preferences of bacterial communities could be used to infer variation in environmental oxygen levels and bacterial metabolic strategies. After using ensemble machine learning to select the 20 genes most important for predicting oxygen tolerance in individual bacteria, we established a relationship between the abundance ratio of aerobic:anaerobic indicator genes and the proportional abundance of aerobic bacteria, using simulated metagenomes with varying ratios of known aerobic and anaerobic bacteria. We developed a tool, OxyMetaG, that takes metagenomic reads as input, extracts bacterial reads, maps them to the 20 genes, and predicts the proportion of aerobic versus anaerobic bacteria in any given sample. We tested OxyMetaG on a suite of metagenomes with measured or inferred oxygen levels across a variety of environmental and host-associated samples. To demonstrate the utility of our approach, we applied OxyMetaG to 540 surface soils, showing that surface soils are typically dominated by aerobes, but that wetter sites with finer textures have relatively more anaerobes. Lastly, we applied OxyMetaG to 73 human gut samples, showing that over the first three years of life, the human gut progresses from having up to 61% aerobes to being completely dominated by anaerobes. We expect OxyMetaG to have broad utility for characterizing both modern and ancient environments.
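The ratio-to-proportion step described in the abstract can be illustrated with a minimal Python sketch. This is not the OxyMetaG implementation. The abstract does not give the form of the fitted relationship, so piecewise-linear interpolation is used here as a stand-in. The function names, the pseudocount, and the calibration points are all hypothetical placeholders, not values from the study.

```python
from bisect import bisect_left


def indicator_ratio(aerobic_counts, anaerobic_counts, pseudocount=1.0):
    """Ratio of summed aerobic to summed anaerobic indicator-gene read counts.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when no reads map to the anaerobic indicator genes.
    """
    aerobic = sum(aerobic_counts) + pseudocount
    anaerobic = sum(anaerobic_counts) + pseudocount
    return aerobic / anaerobic


def predict_percent_aerobes(ratio, calibration):
    """Map a gene ratio to percent aerobes by piecewise-linear interpolation.

    `calibration` is a list of (ratio, percent_aerobes) pairs sorted by ratio,
    such as might be derived from simulated metagenomes of known composition.
    Ratios outside the calibrated range are clamped to the end points.
    """
    ratios = [r for r, _ in calibration]
    percents = [p for _, p in calibration]
    if ratio <= ratios[0]:
        return percents[0]
    if ratio >= ratios[-1]:
        return percents[-1]
    i = bisect_left(ratios, ratio)
    r0, r1 = ratios[i - 1], ratios[i]
    p0, p1 = percents[i - 1], percents[i]
    return p0 + (p1 - p0) * (ratio - r0) / (r1 - r0)


# Hypothetical calibration points, for illustration only (not study data).
EXAMPLE_CALIBRATION = [(0.1, 0.0), (1.0, 50.0), (10.0, 100.0)]

# Hypothetical read counts mapped to aerobic and anaerobic indicator genes.
ratio = indicator_ratio([9, 10], [19, 20])
percent_aerobes = predict_percent_aerobes(ratio, EXAMPLE_CALIBRATION)
```

In practice, the calibration would come from the simulated metagenomes described above, and read counts would likely need normalization (for example by gene length) before the ratio is taken; that step is omitted here.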