Interventionally-guided representation learning for robust and interpretable AI models in cancer medicine

Journal: bioRxiv
Published Date:

Abstract

Machine learning models hold promise in cancer medicine but often lack robustness and interpretability. We introduce a new class of models for high-dimensional molecular data that incorporates interventional auxiliary information to learn latent representations that are informative and interpretable by design. By using causal signals from genetic loss-of-function screens, our approach generates representations that generalize well across data distributions and biological contexts. In cancer cell line datasets, we show that causal guidance enables “zero-shot” transfer to cancer types unseen during training. Moreover, models trained solely on cell line data translate effectively to clinical cohorts, demonstrating strong “bench-to-bedside” generalization without fine-tuning. This strategy highlights a scalable way to leverage tractable laboratory assays for clinical modeling. More broadly, our results establish how integrating causal biological information within generative frameworks enhances data efficiency, interpretability, and robustness, opening avenues for a new generation of scientifically informed AI models in molecular medicine.

Authors

  • Dom Kirkham; Riccardo Masina; Stephen-John Sammut; Sach Mukherjee; Oscar M. Rueda