BAMBI integrates biostatistical and artificial intelligence methods to improve RNA biomarker discovery.

Journal: Briefings in bioinformatics
PMID:

Abstract

RNA biomarkers enable early and precise disease diagnosis, monitoring, and prognosis, facilitating personalized medicine and targeted therapeutic strategies. However, identification of RNA biomarkers is hindered by the challenge of analyzing relatively small yet high-dimensional transcriptomics datasets, typically comprising fewer than 1000 biospecimens but encompassing hundreds of thousands of RNAs, especially noncoding RNAs. This complexity leads to several limitations in existing methods, such as poor reproducibility on independent datasets, inability to directly process omics data, and difficulty in identifying noncoding RNAs as biomarkers. Additionally, these methods often yield results that lack biological interpretation and clinical utility. To overcome these challenges, we present BAMBI (Biostatistical and Artificial-intelligence Methods for Biomarker Identification), a computational tool integrating biostatistical approaches and machine-learning algorithms. By initially reducing high dimensionality through biologically informed statistical methods followed by machine learning-based feature selection, BAMBI significantly enhances the accuracy and clinical utility of identified RNA biomarkers and also includes noncoding RNA biomarkers that existing methods may overlook. BAMBI outperformed existing methods on both real and simulated datasets by identifying individual and panel biomarkers with fewer RNAs while still ensuring superior prediction accuracy. BAMBI was benchmarked on multiple transcriptomics datasets across diseases, including breast cancer, psoriasis, and leukemia. The prognostic biomarkers for acute myeloid leukemia discovered by BAMBI showed significant correlations with patient survival rates in an independent cohort, highlighting its potential for enhancing clinical outcomes. The software is available on GitHub (https://github.com/CZhouLab/BAMBI).
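The two-stage idea described in the abstract (a statistical filter to cut dimensionality, then machine learning-based feature selection to find a small panel) can be illustrated with a minimal sketch. This is not BAMBI's actual implementation: the Welch t-statistic filter, the nearest-centroid classifier, the greedy forward search, the function names, and the synthetic data are all assumptions chosen only to keep the example self-contained. The real tool is at the GitHub link above.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se


def statistical_filter(X, y, keep):
    """Stage 1 (assumed): rank features by |t| between classes, keep the top `keep`."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(g0, g1)), j))
    return [j for _, j in sorted(scores, reverse=True)[:keep]]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [statistics.mean(r[j] for r in rows) for j in features]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def forward_select(X, y, candidates, max_size):
    """Stage 2 (assumed): greedily add the feature that most improves accuracy."""
    panel, best = [], 0.0
    while len(panel) < max_size:
        trials = [(loo_accuracy(X, y, panel + [j]), j)
                  for j in candidates if j not in panel]
        if not trials:
            break
        acc, j = max(trials, key=lambda t: t[0])
        if acc <= best:  # stop when no candidate improves the panel
            break
        panel, best = panel + [j], acc
    return panel, best


# Synthetic stand-in for a small, high-dimensional expression matrix:
# 40 samples x 200 features, with one hypothetical informative feature (index 7).
random.seed(0)
n_samples, n_features = 40, 200
y = [i % 2 for i in range(n_samples)]
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
for row, label in zip(X, y):
    row[7] += 4.0 * label

candidates = statistical_filter(X, y, keep=10)
panel, acc = forward_select(X, y, candidates, max_size=3)
print("candidates after filter:", candidates)
print("selected panel:", panel, "LOO accuracy:", acc)
```

In this sketch the filter is applied to all samples before cross-validation, which makes the reported accuracy optimistic. A real analysis would nest the filter inside the validation loop or, as the abstract describes, check the selected biomarkers on an independent cohort.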

Authors

  • Peng Zhou
    School of International Studies, Zhejiang University, Hangzhou, China.
  • Zixiu Li
    State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Shanghai, China.
  • Feifan Liu
    Department of Quantitative Health Sciences and Radiology, University of Massachusetts Medical School, Worcester, MA, USA.
  • Euijin Kwon
    Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA 01655, United States.
  • Tien-Chan Hsieh
    Division of Hematology-Oncology, Department of Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, United States.
  • Shangyuan Ye
    Biostatistics Shared Resource, Knight Cancer Institute, Oregon Health and Science University, 2720 S Moody Ave, Portland, OR 97201, United States.
  • Shobha Vasudevan
    Brown RNA Center, Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02903, United States.
  • Jung Ae Lee
    Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA 01655, United States.
  • Khanh-Van Tran
    Division of Cardiology, Department of Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, United States.
  • Chan Zhou
    Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA 01655, United States.