BIG-TB: A benchmark for evaluating prediction and interpretability of sequence-based machine learning using *Mycobacterium tuberculosis* genomes
Journal: bioRxiv
Published Date: Feb 2, 2026
Abstract
Foundation models aim to learn useful representations of biological sequences. However, whether these representations are applicable to a wide range of tasks, including phenotype prediction and variant discovery, remains in question, in large part because the set of available benchmark tasks is relatively small. To address this gap, we present the Benchmarks for Interpretable prediction from Genomes of Tuberculosis, BIG-TB. We curate over 17,000 genomes with high-quality short-read sequencing data and experimentally measured antibiotic resistance phenotypes, combine them with a curated list of canonical resistance-conferring variants, and provide these data in an ML-ready format. BIG-TB defines two tasks for interrogating the utility of foundation models: (1) prediction of antibiotic resistance phenotypes, and (2) attribution of predictions to known resistance-conferring variants from an expert-curated dataset. Using our benchmark, we show that DNA-based foundation models do not yet outperform simple machine learning baselines (mean test AUC across drugs: 0.888 for the best CNN variant vs. 0.846 for the best DNABERT variant). Our benchmark also supports protein-based models, which perform worse than DNA-based models because they cannot represent non-coding variants, but for which foundation models are more competitive with simple ML. We show that the models with the highest predictive performance do not necessarily perform best at canonical resistance variant discovery, indicating that in many cases the improved performance may be due to non-causal associations between variants and phenotype. Finally, we show that the choice of embedding representation has a major impact on foundation model performance, and that representations that average over sequence position perform poorly at both prediction and canonical resistance variant discovery.
Overall, BIG-TB provides a new type of benchmark for foundation models of biological sequences, facilitating comparison of representations across multiple tasks. Code is available at https://github.com/SAGE-Lab-UMass/Big-TB-benchmark
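
The point about representations that average over sequence position can be illustrated with a small sketch. This is not the authors' pipeline: it uses one-hot vectors as a stand-in for per-position model embeddings, and the sequences, function names, and variants are hypothetical. It shows that two sequences carrying the same substitution at different positions become indistinguishable after mean pooling, while a position-preserving (flattened) representation keeps them apart.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Stand-in for per-position embeddings: returns shape (len(seq), 4)."""
    idx = np.array([ALPHABET.index(base) for base in seq])
    return np.eye(len(ALPHABET))[idx]

def mean_pool(emb):
    """Average over sequence position, discarding where each base occurs."""
    return emb.mean(axis=0)

def flatten(emb):
    """Concatenate per-position vectors, keeping positional information."""
    return emb.reshape(-1)

# Hypothetical toy sequences: the same A->G substitution at two different sites
# of the reference "ACGTACGTAC".
variant_a = "GCGTACGTAC"  # substitution at position 0
variant_b = "ACGTGCGTAC"  # substitution at position 4

emb_a, emb_b = one_hot(variant_a), one_hot(variant_b)

# Mean pooling only retains base composition, so the two variants collide.
pooled_equal = np.allclose(mean_pool(emb_a), mean_pool(emb_b))
# The flattened representation still distinguishes the two sites.
flat_equal = np.allclose(flatten(emb_a), flatten(emb_b))

print("mean-pooled vectors identical:", pooled_equal)
print("flattened vectors identical:", flat_equal)
```

With contextual embeddings from a real foundation model the pooled vectors would generally not be exactly identical, but the contribution of any single site is still averaged together with every other position in the sequence.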