A deep learning feature importance test framework for integrating informative high-dimensional biomarkers to improve disease outcome prediction.

Journal: Briefings in bioinformatics
Published Date:

Abstract

Many human diseases result from a complex interplay of behavioral, clinical, and molecular factors. Integrating low-dimensional behavioral and clinical features with high-dimensional molecular profiles can significantly improve disease outcome prediction and diagnosis. However, while some biomarkers are crucial, many are uninformative. Improving prediction accuracy and understanding disease mechanisms therefore requires integrating the relevant features, identifying key biomarkers, separating signal from noise, and modeling complex associations. To address these challenges, we introduce the High-dimensional Feature Importance Test (HdFIT) framework for machine learning models. HdFIT includes a feature screening step for dimension reduction and leverages machine learning to model complex associations between biomarkers and disease outcomes, and it robustly evaluates each feature's impact on the outcome. Extensive Monte Carlo experiments and a real microbiome study demonstrate HdFIT's efficacy, especially when it is integrated with advanced models such as deep neural networks. The framework shows significant improvements in identifying crucial features and enhancing prediction accuracy, even in high-dimensional settings.
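
The screen-then-model-then-evaluate workflow described in the abstract can be illustrated with a minimal sketch. This is not the HdFIT procedure itself: the abstract does not specify the screening statistic, the model, or the test, so the sketch assumes marginal-correlation screening, an ordinary least-squares model in place of a deep neural network, and a plain permutation importance score in place of a formal importance test. All function names, data, and parameter values are hypothetical.

```python
import numpy as np

def screen_features(X, y, keep):
    """Assumed screening rule: keep the `keep` features with the largest
    absolute marginal correlation with the outcome."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.sort(np.argsort(corr)[::-1][:keep])

def permutation_importance(predict, X, y, n_repeats, rng):
    """Mean increase in held-out squared error when one column is shuffled."""
    base = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.mean((y - predict(Xp)) ** 2) - base
    return scores / n_repeats

# Simulated data (hypothetical): 500 features, only features 0 and 1 matter.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
train, test = np.arange(0, 100), np.arange(100, 200)

# Step 1: dimension reduction by screening on the training split.
selected = screen_features(X[train], y[train], keep=10)

# Step 2: fit a model on the screened features (least squares stands in
# for the deep neural network mentioned in the abstract).
A = np.column_stack([np.ones(len(train)), X[train][:, selected]])
beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)

def predict(Z):
    return beta[0] + Z @ beta[1:]

# Step 3: score each screened feature on the held-out split.
importance = permutation_importance(predict, X[test][:, selected], y[test], 20, rng)
top2 = {int(j) for j in selected[np.argsort(importance)[::-1][:2]]}
print("screened features:", selected.tolist())
print("two most important:", sorted(top2))
```

Scoring importance on a held-out split, rather than on the training data, keeps the scores from rewarding features the model has merely overfit. Turning such scores into a formal test would additionally require a variance estimate, which this sketch omits.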

Authors

  • Baiming Zou
    Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
  • James G Xenakis
    Department of Statistics, Harvard University, Cambridge, MA 02138, United States.
  • Meisheng Xiao
    Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  • Apoena Ribeiro
    School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  • Kimon Divaris
    Department of Pediatric Dentistry and Dental Public Health, Adams School of Dentistry, University of North Carolina-Chapel Hill, Chapel Hill, North Carolina, USA.
  • Di Wu
    University of Melbourne, Melbourne, VIC 3010 Australia.
  • Fei Zou
    Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.