MIDAA: deep archetypal analysis for interpretable multi-omic data integration based on biological principles.

Journal: Genome biology
PMID:

Abstract

High-throughput multi-omic molecular profiling allows the probing of biological systems at unprecedented resolution. However, integrating and interpreting high-dimensional, sparse, and noisy multimodal datasets remains challenging. Deriving new biological insights with current methods is difficult because they are not rooted in biological principles and instead prioritise tasks such as dimensionality reduction. Here, we introduce MIDAA, a framework that combines archetypal analysis, an approach grounded in biological principles, with deep learning. Using archetypes based on evolutionary trade-offs and Pareto optimality, MIDAA finds extreme data points that define the geometry of the latent space, preserving the complexity of biological interactions while retaining an interpretable output. We demonstrate that these extreme points represent cellular programmes reflecting the underlying biology. Moreover, we show that, compared to alternative methods, MIDAA can identify parsimonious, interpretable, and biologically relevant patterns from real and simulated multi-omic data.
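The abstract describes archetypes as extreme points whose convex combinations reconstruct the data. The sketch below illustrates that underlying idea with classical linear archetypal analysis, not MIDAA itself, which couples archetypal analysis with deep learning on multi-omic data. It models each sample as a convex combination (rows of `S`) of k archetypes. Each archetype is in turn a convex combination (rows of `C`) of the samples, so the archetypes lie on the boundary of the data's convex hull. The function names `project_rows_to_simplex` and `archetypal_analysis` are hypothetical. The optimisation scheme, alternating projected gradient descent with step size 1/L on 0.5·||X − S C X||², is an assumption chosen for brevity.

```python
import numpy as np


def project_rows_to_simplex(V: np.ndarray) -> np.ndarray:
    """Euclidean projection of each row of V onto the probability simplex."""
    n, m = V.shape
    U = np.sort(V, axis=1)[:, ::-1]                # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, m + 1)
    cond = U - css / idx > 0                        # holds on a prefix of each row
    rho = m - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)


def archetypal_analysis(X: np.ndarray, k: int, n_iter: int = 300, seed: int = 0):
    """Hypothetical sketch of classical archetypal analysis: X ~ S @ (C @ X).

    Rows of S (n x k) and C (k x n) are constrained to the simplex.
    Uses alternating projected gradient steps with step 1/L on
    f(S, C) = 0.5 * ||X - S C X||_F^2, so the loss never increases.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    C = np.zeros((k, n))
    C[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0  # start at k random samples
    S = np.full((n, k), 1.0 / k)                                  # uniform initial mixing weights

    def loss(S, C):
        return 0.5 * np.linalg.norm(X - S @ C @ X) ** 2          # Frobenius norm

    losses = [loss(S, C)]
    for _ in range(n_iter):
        Z = C @ X                                                 # current archetypes (k x d)
        # S-step: gradient (S Z - X) Z^T, Lipschitz constant ||Z||_2^2
        L_s = max(np.linalg.norm(Z, 2) ** 2, 1e-12)
        S = project_rows_to_simplex(S - ((S @ Z - X) @ Z.T) / L_s)
        # C-step: gradient S^T (S C X - X) X^T, Lipschitz constant ||S||_2^2 * ||X||_2^2
        L_c = max(np.linalg.norm(S, 2) ** 2 * np.linalg.norm(X, 2) ** 2, 1e-12)
        C = project_rows_to_simplex(C - (S.T @ (S @ C @ X - X) @ X.T) / L_c)
        losses.append(loss(S, C))
    return S, C @ X, losses


if __name__ == "__main__":
    # Toy data: random convex mixtures of the three corners of a triangle.
    rng = np.random.default_rng(1)
    corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    X = rng.dirichlet(np.ones(3), size=200) @ corners
    S, archetypes, losses = archetypal_analysis(X, k=3)
    print("archetypes:\n", archetypes)
    print("loss: %.4f -> %.4f" % (losses[0], losses[-1]))
```

On such toy data, the fitted archetypes should move toward the extreme points of the point cloud. This is the geometric property the abstract appeals to when it interprets archetypes as trade-off extremes, although a real analysis would operate on a learned latent space rather than on raw coordinates.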

Authors

  • Salvatore Milite
    Computational Biology Research Centre, Human Technopole, Milan, Italy. salvatore.milite@fht.org.
  • Giulio Caravagna
    Department of Mathematics, Informatics and Geosciences, University of Trieste, Trieste, Italy. gcaravagna@units.it.
  • Andrea Sottoriva
    Computational Biology Research Centre, Human Technopole, Milan, Italy. andrea.sottoriva@fht.org.