A screening strategy based on machine learning for diagnostic biomarkers in small cell lung cancer.
Journal:
PLOS ONE
Published Date:
Jan 22, 2026
Abstract
Small cell lung cancer (SCLC) is the most aggressive subtype of lung cancer, with high mortality rates because the lack of specific diagnostic biomarkers causes patients to miss the optimal window for treatment. Traditional biomarkers, such as neuron-specific enolase (NSE) and pro-gastrin-releasing peptide (ProGRP), have insufficient specificity and sensitivity to meet the demands of clinical diagnosis. Exosomes and their contents have emerged as promising cancer biomarkers because of their diverse molecular cargo, which mediates intercellular communication. Herein, a novel machine learning strategy is reported for rapid, efficient biomarker screening, and an optimal exosome RNA combination is identified as a diagnostic biomarker for SCLC. First, RNA sequencing data from 111 SCLC patients and 362 healthy controls were obtained from the exoRBase 2.0 and 3.0 databases. Machine learning methods were then used to select SCLC-specific RNAs, using 20 iterations of 10-fold nested cross-validation. An optimal combination of three exosome RNAs (LINC00989, CXCL5, and MAP3K7CL) was identified and achieved excellent diagnostic performance (area under the curve (AUC) of 0.950, sensitivity of 0.936, and specificity of 0.892). Finally, an independent validation cohort containing tissue-based RNA expression data for two of the biomarkers (CXCL5 and MAP3K7CL) from 79 SCLC patients and 7 standard controls was used to evaluate the diagnostic performance of the selected RNAs. These two biomarkers showed modest diagnostic performance in tissue samples (AUC = 0.718), indicating potential cross-tissue applicability despite the limitation of incomplete biomarker coverage. In addition, a specificity analysis against exosome RNA data from gastric cancer, hepatocellular carcinoma, and breast cancer demonstrated significant specificity for SCLC.
Therefore, a novel biomarker screening strategy integrating nested cross-validation with multiple machine learning algorithms was successfully established, offering a potentially valuable protocol for the early diagnosis of SCLC and other cancers.
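The repeated nested cross-validation described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the abstract does not name the algorithms or software used, so the one-marker "model" below (which only learns whether a marker is higher or lower in cases) is a deliberately simple stand-in, all function names are hypothetical, and the data are synthetic. The point is the structure: the inner loop selects a marker using training rows only, and the outer loop scores that choice on a held-out fold.

```python
import random
from statistics import mean

def auc(labels, scores):
    """AUC = P(random case outscores random control); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def stratified_folds(labels, k, rng):
    """Split sample positions into k folds, keeping the class ratio in each."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds

def fit_direction(X, y, rows, feature):
    """Stand-in 'model': +1 if the marker is higher in cases, else -1."""
    cases = [X[i][feature] for i in rows if y[i] == 1]
    controls = [X[i][feature] for i in rows if y[i] == 0]
    return 1 if mean(cases) >= mean(controls) else -1

def evaluate(X, y, train, test, feature):
    """Fit on the train rows, then return the AUC on the test rows."""
    sign = fit_direction(X, y, train, feature)
    return auc([y[i] for i in test], [sign * X[i][feature] for i in test])

def nested_cv(X, y, n_repeats=20, k_outer=10, k_inner=10, seed=0):
    """Repeated nested CV: returns the mean outer AUC and the marker picked per fold."""
    rng = random.Random(seed)
    outer_aucs, chosen = [], []
    for _ in range(n_repeats):
        for test in stratified_folds(y, k_outer, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            # Inner loop: choose the marker using the training rows only.
            inner = stratified_folds([y[i] for i in train], k_inner, rng)

            def inner_auc(feature):
                scores = []
                for fold in inner:
                    val = [train[j] for j in fold]
                    val_set = set(val)
                    fit_rows = [i for i in train if i not in val_set]
                    scores.append(evaluate(X, y, fit_rows, val, feature))
                return mean(scores)

            best = max(range(len(X[0])), key=inner_auc)
            chosen.append(best)
            # Outer loop: refit on all training rows, score on the held-out fold.
            outer_aucs.append(evaluate(X, y, train, test, best))
    return mean(outer_aucs), chosen

def make_toy_data(n_cases=40, n_controls=60, seed=1):
    """Synthetic data: marker 0 is shifted in cases, markers 1 and 2 are noise."""
    rng = random.Random(seed)
    y = [1] * n_cases + [0] * n_controls
    X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]
    return X, y

X, y = make_toy_data()
# Small settings for a quick demo; the abstract reports 20 iterations of 10-fold.
mean_auc, chosen = nested_cv(X, y, n_repeats=2, k_outer=5, k_inner=5)
print(f"mean outer AUC: {mean_auc:.3f}; markers chosen per fold: {chosen}")
```

Because the marker is selected inside each outer training set, the outer AUC is not inflated by the selection step, which is the reason for nesting. A real analysis would typically replace the stand-in model with fitted classifiers from an established library and would derive sensitivity and specificity from thresholded predictions, as `sensitivity_specificity` does for binary calls.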