Interpretable machine learning meets systems biology to decode genotype-phenotype maps

Journal: bioRxiv
Published Date:

Abstract

Resolving causal genes from quantitative trait loci (QTL) remains fundamentally limited by linkage disequilibrium. We developed an interpretable machine learning framework that captures higher-order nonlinear genotype-phenotype relationships and allows conditional evaluation of genetic variants, enabling statistical decorrelation of linked loci. Applied to Saccharomyces cerevisiae segregants across chemical stress conditions, our method achieved >75% prediction accuracy and identified known causal genes, including MKT1 (genotoxic stress) and IRA2 (osmotic stress). SHAP-based analysis recovered 56% of the validated pleiotropic genes, compared with 36% by conventional contingency testing. Integration with genome-scale metabolic models revealed pathway enrichments distinguishing high-growing strains, including carbon transport, glycolysis, and oxidative phosphorylation. Notably, gene regulatory network analysis identified a novel function for PDR8 in protein mannosylation and cell wall integrity, functions that extend beyond its role in drug resistance. This framework demonstrates that interpretable machine learning, coupled with systems biology, transforms QTL associations into mechanistic biological insight.
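
The idea of conditional evaluation can be illustrated with a toy simulation. This is a minimal sketch, not the authors' framework: it uses plain stratified mean differences rather than a trained model with SHAP values, and the function names, the 5% recombination rate, the effect size, and the sample size are all hypothetical choices made for illustration. It simulates haploid segregants with a causal locus A and a tightly linked non-causal locus B. The marginal effect of B looks large because of linkage, but largely vanishes once evaluated within strata of A.

```python
import random
from statistics import mean

A, B = 0, 1  # column indices of the two loci in each simulated row


def simulate_segregants(n=4000, recomb=0.05, effect=1.0, seed=0):
    """Simulate haploid segregants: A is causal, B is linked but non-causal.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.randint(0, 1)                      # allele at causal locus A
        b = a if rng.random() > recomb else 1 - a  # B is usually co-inherited with A
        growth = effect * a + rng.gauss(0.0, 1.0)  # only A shifts the phenotype
        rows.append((a, b, growth))
    return rows


def marginal_effect(rows, locus):
    """Difference in mean phenotype between the two alleles at one locus."""
    hi = [r[2] for r in rows if r[locus] == 1]
    lo = [r[2] for r in rows if r[locus] == 0]
    return mean(hi) - mean(lo)


def conditional_effect(rows, locus, given):
    """Effect of `locus` within strata of `given`, weighted by stratum size."""
    total, weight = 0.0, 0
    for g in (0, 1):
        stratum = [r for r in rows if r[given] == g]
        hi = [r[2] for r in stratum if r[locus] == 1]
        lo = [r[2] for r in stratum if r[locus] == 0]
        if hi and lo:  # skip strata where one allele is absent
            total += (mean(hi) - mean(lo)) * len(stratum)
            weight += len(stratum)
    return total / weight if weight else 0.0


if __name__ == "__main__":
    rows = simulate_segregants()
    print("marginal effect of A:      ", round(marginal_effect(rows, A), 3))
    print("marginal effect of B:      ", round(marginal_effect(rows, B), 3))
    print("effect of B conditional on A:", round(conditional_effect(rows, B, given=A), 3))
    print("effect of A conditional on B:", round(conditional_effect(rows, A, given=B), 3))
```

In this toy setting only the recombinant segregants carry information that separates the two loci, which is why the conditional estimates are noisier than the marginal ones.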

Authors

  • Reguna Madhan, R. L.
  • Balaji, R.
  • Sinha, H.
  • Bhatt, N.