Benchmarking ensemble machine learning algorithms for multi-class, multi-omics data integration in clinical outcome prediction.

Journal: Briefings in bioinformatics

PMID: 40116658

Abstract

The complementary information found in different modalities of patient data can support more accurate modelling of a patient's disease state and a better understanding of the biological processes underlying a disease. However, the analysis of multi-modal, multi-omics data presents many challenges. In this work, we compare the performance of a variety of ensemble machine learning (ML) algorithms that are capable of late integration of multi-class data from different modalities. The ensemble methods tested, and their variants, were (i) a voting ensemble with hard and soft vote; (ii) a meta learner; (iii) a multi-modal AdaBoost model that uses hard vote, soft vote, or a meta learner to integrate the modalities on each boosting round; (iv) the PB-MVBoost model; and (v) a novel application of a mixture-of-experts model. These were compared with simple concatenation of the modalities. We evaluate these methods on data from an in-house hepatocellular carcinoma study, together with validation datasets from breast cancer and irritable bowel disease studies. We develop models that achieve an area under the receiver operating characteristic curve (AUC) of up to 0.85, and find that two boosted methods, PB-MVBoost and AdaBoost with soft vote, were the best-performing models. We also examine the stability of the selected features and the size of the resulting clinical signature. Our work shows that integrating complementary omics and data modalities with effective ensemble ML models improves accuracy in multi-class clinical outcome prediction and produces more stable predictive features than individual modalities or simple concatenation. We provide recommendations for the integration of multi-modal, multi-class data.
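In late integration with a voting ensemble, each modality gets its own classifier, and only their per-class predictions are combined. Hard vote and soft vote can therefore reach different answers. The sketch below illustrates this and is not the authors' implementation. It assumes each modality already has a trained classifier that outputs one class-probability vector per patient. The helper names `argmax`, `hard_vote` and `soft_vote`, the optional `weights` parameter, and the example probabilities are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical helpers illustrating late integration (not the paper's code).
# Assumption: each modality's classifier is already trained and yields one
# class-probability vector for the patient being predicted.


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value; the first such index wins ties."""
    return max(range(len(values)), key=lambda k: values[k])


def hard_vote(per_modality_probs: Sequence[Sequence[float]]) -> int:
    """Each modality votes for its most probable class; the majority wins.
    Ties between classes go to the lowest class index."""
    votes = Counter(argmax(p) for p in per_modality_probs)
    top = max(votes.values())
    return min(c for c, n in votes.items() if n == top)


def soft_vote(per_modality_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> int:
    """Average the class probabilities across modalities (optionally
    weighted), then return the class with the highest mean probability."""
    if weights is None:
        weights = [1.0] * len(per_modality_probs)
    total = sum(weights)
    n_classes = len(per_modality_probs[0])
    mean = [sum(w * p[k] for w, p in zip(weights, per_modality_probs)) / total
            for k in range(n_classes)]
    return argmax(mean)


# Three modalities, three outcome classes (illustrative numbers only).
probs = [
    [0.40, 0.35, 0.25],  # modality A
    [0.40, 0.30, 0.30],  # modality B
    [0.05, 0.90, 0.05],  # modality C
]
print(hard_vote(probs))  # 0: two of three modalities prefer class 0
print(soft_vote(probs))  # 1: modality C's confidence dominates the average
```

In this example the two rules disagree. Hard vote discards each modality's confidence and counts only its preferred class. Soft vote lets one highly confident modality outweigh two weakly confident ones. A meta learner would instead feed the per-modality probability vectors, concatenated, as input features to a second-level classifier.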

Authors

Annette Spooner

School of Computer Science and Engineering, UNSW Sydney, Sydney, Australia. a.spooner@unsw.edu.au.
Mohammad Karimi Moridani

School of Biotechnology and Biomolecular Sciences, University of New South Wales, NSW 2052, Australia.
Barbra Toplis

St George and Sutherland Clinical Campuses, University of New South Wales, Short St, Kogarah, NSW 2217, Australia.
Jason Behary

St George and Sutherland Clinical Campuses, University of New South Wales, Sydney, New South Wales, Australia.
Azadeh Safarchi

Health and Biosecurity, Microbiome for One System Health, Commonwealth Scientific and Industrial Research Organisation, 160 Hawkesbury Rd, Westmead, NSW 2145, Australia.
Salim Maher

St George and Sutherland Clinical Campuses, University of New South Wales, Short St, Kogarah, NSW 2217, Australia.
Fatemeh Vafaee

School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia.
Amany Zekry

St George and Sutherland Clinical Campuses, University of New South Wales, Sydney, New South Wales, Australia.
Arcot Sowmya

School of Computer Science and Engineering, UNSW Sydney, Sydney, Australia.

Keywords

Algorithms; Benchmarking; Breast Neoplasms; Carcinoma, Hepatocellular; Female; Genomics; Humans; Liver Neoplasms; Machine Learning; Multiomics
