Stacking ensemble learning models diagnose pulmonary infections using host transcriptome data from metatranscriptomics.

Journal: Scientific Reports
Published Date:

Abstract

Prompt diagnosis of pulmonary infections of unknown etiology in severely ill patients remains a challenge because rapid and effective diagnostic methods are lacking. Metatranscriptomic sequencing is a powerful approach, but its clinical utility is often limited by turnaround time. In this study, we performed metatranscriptomic sequencing on bronchoalveolar lavage fluid (BALF) collected from critically ill, severely ill, and ICU patients. Based on the microbial detection results, patients were classified into four types: negative, bacterial infection, viral infection, and fungal infection. To identify host gene expression signatures associated with infection, we screened characteristic genes from the human portion of the metatranscriptomic data by comparing confirmed-infection and non-infection cases among 70% of the patients. Using these characteristic genes, we built classification sub-models with 13 types of machine learning algorithms and then integrated the sub-models into stacking-based ensemble models with Lasso regression, yielding diagnostic models that require only a small set of gene expression inputs. Averaged over five-fold cross-validation, the models showed high diagnostic accuracy in distinguishing infection from non-infection (AUC = 0.984), bacterial from non-bacterial infection (AUC = 0.98), and viral from non-viral infection (AUC = 0.98). In the test cohorts, the models' results were highly consistent with metatranscriptomic sequencing in determining both infection status (AUC = 0.865) and infection type (viral: AUC = 0.934; bacterial: AUC = 0.871). Our study presents a rapid and inexpensive adjunctive diagnostic strategy that achieves diagnostic accuracy comparable to metatranscriptomic sequencing, enabling timely identification of both infection status and infection type in pulmonary infections.
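
The stacking idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the data are synthetic, the sub-models are toy single-gene scorers standing in for the 13 algorithms, and all names (`fit_gene_scorer`, `out_of_fold_scores`, `fit_lasso`, `fit_stack`) and the penalty `alpha=0.05` are hypothetical choices made for illustration. What it does show is the general pattern: sub-model scores are produced out-of-fold, a Lasso regression combines them with sparse weights, and the stacked score is evaluated by AUC on held-out samples.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_gene_scorer(xs, ys):
    """Toy sub-model on one gene: signed distance from the midpoint of the class means."""
    mu1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mu0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mid, sign = (mu1 + mu0) / 2, (1.0 if mu1 >= mu0 else -1.0)
    return lambda x: sign * (x - mid)

def out_of_fold_scores(X, y, k=5):
    """Score every sample with sub-models trained on the other k-1 folds."""
    n, g = len(X), len(X[0])
    Z = [[0.0] * g for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        for j in range(g):
            model = fit_gene_scorer([X[i][j] for i in train], [y[i] for i in train])
            for i in held:
                Z[i][j] = model(X[i][j])
    return Z

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly zero."""
    return v - t if v > t else v + t if v < -t else 0.0

def fit_lasso(Z, y, alpha=0.05, n_iter=100):
    """Coordinate descent for (1/2n) * ||y - b - Zw||^2 + alpha * ||w||_1."""
    n, g = len(Z), len(Z[0])
    w, b = [0.0] * g, sum(y) / n
    for _ in range(n_iter):
        for j in range(g):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(Z[i][j] * (y[i] - b - sum(w[m] * Z[i][m] for m in range(g) if m != j))
                      for i in range(n)) / n
            norm = sum(Z[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / norm if norm > 0 else 0.0
        b = sum(y[i] - sum(w[m] * Z[i][m] for m in range(g)) for i in range(n)) / n
    return w, b

def fit_stack(X, y):
    """Fit the Lasso meta-learner on out-of-fold scores, then refit sub-models on all of X."""
    w, b = fit_lasso(out_of_fold_scores(X, y), y)
    g = len(X[0])
    models = [fit_gene_scorer([row[j] for row in X], y) for j in range(g)]
    return (lambda row: b + sum(w[j] * models[j](row[j]) for j in range(g))), w

# Synthetic "expression" data (assumption): gene 0 is up-regulated and gene 1
# down-regulated in infected samples (label 1); genes 2 and 3 are pure noise.
rng = random.Random(0)
y_all = [i % 2 for i in range(200)]
X_all = [[rng.gauss(2.0 * lab, 1), rng.gauss(-1.5 * lab, 1), rng.gauss(0, 1), rng.gauss(0, 1)]
         for lab in y_all]

# Hold out the last 40 samples as a stand-in for an independent test cohort.
predict, w = fit_stack(X_all[:160], y_all[:160])
test_auc = auc(y_all[160:], [predict(row) for row in X_all[160:]])
print("Lasso weights per sub-model:", [round(v, 3) for v in w])
print("Held-out AUC:", round(test_auc, 3))
```

Because the L1 penalty can shrink a sub-model's weight to exactly zero, the meta-learner may keep only the informative sub-models, which is one plausible reason a stacked model of this kind can run on a small set of inputs.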

Authors

  • Tian Zhang
    School of Medicine, Vanderbilt University, Nashville, TN, United States.
  • Ying Deng
    College of Geography and Environmental Sciences, Zhejiang Normal University, Jinhua 321004, China.
  • Wentao Wang
    Department of Radiation Oncology, Duke University Medical Center, Durham, NC, United States.
  • Zhe Zhao
    Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, United States.
  • Yiling Wu
    Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, Guangzhou 510631, China.
  • Haoqian Wang
  • Shutao Xia
    Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; Peng Cheng Laboratory, Shenzhen 518055, China. Electronic address: xiast@sz.tsinghua.edu.cn.
  • Weifang Liao
    College of Life Science and Technology, Wuhan Polytechnic University, Wuhan, People's Republic of China. leesalwf89@126.com.
  • Weijie Liao
    Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, People's Republic of China. liaoweijie@szu.edu.cn.