Machine learning for integrative multi-omics clustering and feature gene identification
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Multi-omics integrative analysis is pivotal for elucidating complex molecular mechanisms and biological processes, yet data integration and feature selection across omics layers remain challenging. Here we present MIA, a machine learning framework for multi-omics integrative analysis that combines unsupervised sample clustering, via tensor decomposition and Fuzzy C-Means, with supervised feature selection through an enhanced random forest. Benchmarking on simulated datasets demonstrates that MIA achieves higher accuracy in clustering and feature identification than existing algorithms. Furthermore, on empirical TCGA datasets, MIA clusters samples more effectively and identifies feature genes significantly associated with clinical outcomes. In particular, when applied to glioblastoma, MIA identifies three previously uncharacterized subtypes that exhibit more pronounced differences in survival patterns, and uncovers 167 feature genes that contribute to glioblastoma subtyping and are closely associated with sensitivity to temozolomide treatment. Collectively, these results establish MIA as a generalizable framework for multi-omics integrative analysis, enabling systematic molecular subtyping and the discovery of significant features across complex biological systems.
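
As a rough illustration of the clustering step named above, the following is a minimal, generic Fuzzy C-Means sketch in NumPy. It is not MIA's implementation, which the abstract does not describe in detail. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the synthetic two-dimensional data, and the assumption that the input is a sample-by-factor matrix (such as one obtained from a tensor decomposition) are all illustrative assumptions.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Generic Fuzzy C-Means (hypothetical helper, not MIA's code).

    X is an (n_samples, n_features) array. Returns (centers, U), where
    U[i, k] is the degree of membership of sample i in cluster k.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: means of the samples weighted by U ** m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Membership update: proportional to d ** (-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in for a sample-by-factor matrix: three groups of 30 samples.
rng = np.random.default_rng(42)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in true_centers])

centers, U = fuzzy_c_means(X, n_clusters=3)
labels = U.argmax(axis=1)  # hard assignment: cluster with highest membership
print(np.bincount(labels, minlength=3))
```

Unlike hard clustering, each sample receives a graded membership in every cluster, so borderline samples can be flagged by a low maximum membership before a hard label is assigned with `argmax`.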