What Do Biological Foundation Models Compute? Sparse Autoencoders from Feature Recovery to Mechanistic Interpretability

Journal: bioRxiv
Published Date:

Abstract

Foundation models trained on protein and DNA sequences are increasingly deployed for variant interpretation, drug design, and gene regulation prediction, yet their internal representations remain opaque, limiting both biological insight and trust in model-guided decisions. Existing interpretation approaches establish what these models encode but cannot reveal how biological knowledge is internally organized and computed. Sparse autoencoders (SAEs) offer a complementary approach by decomposing model activations into interpretable features, each capturing a distinct biological concept. Over the past year, SAEs have been applied to protein language models, genomic language models, pathology vision transformers, single-cell foundation models, and protein structure generators. Here we provide a systematic review of sparse dictionary learning across biological foundation models. We find that independent studies using different architectures and evaluation strategies consistently recover features spanning biological scales (from secondary structure elements and functional domains in proteins to transcription factor binding sites and regulatory elements in genomes), providing convergent evidence that these models learn interpretable representations accessible through sparse decomposition. However, we identify a critical gap: validation relies almost exclusively on matching features against existing annotations, risking circularity when those annotations derive from the same sequence databases used for model training. We propose a three-level interpretability framework (representational, computational, and causal mechanistic) and argue that the field's most distinctive opportunity lies in experimental validation through deep mutational scanning, massively parallel reporter assays, and structural characterization, which can establish whether these models have learned genuine biological mechanisms rather than training set statistics.

Authors

  • Orlov, A. V.
  • Makus, Y. V.
  • Ashniev, G. A.
  • Orlova, N. N.
  • Nikitin, P. I.

Categories