A robust machine learning model based on ribosomal-subunit-derived piRNAs for diagnostic potential of nonsmall cell lung cancer across multicentre, large-scale of sequencing data.
Journal: Clinical and Translational Medicine
Published Date: Aug 1, 2025
Abstract
Nonsmall cell lung cancer (NSCLC) is a lethal cancer that lacks robust biomarkers for noninvasive clinical diagnosis. Detecting NSCLC at an early stage can decrease the mortality rate and minimise the harm caused by various treatments. We curated 2050 samples from public tissue and plasma datasets, covering both invasive and noninvasive sample types, and supplemented them with in-house pooled plasma and exosome samples. Eleven independent transcriptome datasets were used to develop a new machine learning model that integrates PIWI-interacting RNAs (piRNAs) to predict NSCLC. Five piRNA signatures derived from ribosomal subunits were identified as tumour-specific, exhibited robust diagnostic ability, and were combined into a piRNA-Based Tumour Probability Index (pi-TPI) risk evaluation model. pi-TPI effectively distinguished NSCLC patients from healthy individuals and identified early-stage cancers with area under the ROC curve (AUC) values above .80. In plasma cohorts, pi-TPI showed diagnostic efficacy with an AUC of .85. Experimental exosomal data enhanced the accuracy of distinguishing noncancerous, benign, and cancer cases. In the noncancer/cancer subgroup, pi-TPI exhibited superior predictive performance, with an AUC of .96. These findings underscore the significant clinical potential of the five piRNA signatures as a powerful diagnostic tool for NSCLC, particularly for noninvasive cancer diagnostics.
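To make the evaluation metric concrete, the following is a minimal Python sketch of how a five-feature risk score could be turned into a probability and scored by AUC. It is not the authors' pi-TPI model: the abstract does not state the model form, so the logistic combination, the function names `tumour_probability` and `roc_auc`, the weights, the bias, and the toy samples are all illustrative assumptions and not data from the study.

```python
import math


def tumour_probability(expr, weights, bias):
    """Map piRNA expression values to a probability in (0, 1).

    ASSUMPTION: a logistic combination is used purely for illustration;
    the actual pi-TPI formula is not given in the abstract.
    """
    z = bias + sum(w * x for w, x in zip(weights, expr))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(labels, scores):
    """AUC as the fraction of (cancer, control) pairs ranked correctly.

    Equivalent to the area under the ROC curve; tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical placeholder values, NOT taken from the paper.
WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.0]  # one weight per piRNA signature
BIAS = -2.5

# Made-up toy samples: (five normalised expression values, label),
# where label 1 = NSCLC and 0 = healthy control.
toy_samples = [
    ([0.2, 0.1, 0.3, 0.2, 0.1], 0),
    ([0.4, 0.5, 0.3, 0.6, 0.2], 0),
    ([0.9, 0.7, 0.8, 0.6, 0.9], 1),
    ([0.5, 0.4, 0.6, 0.3, 0.4], 1),
]

labels = [label for _, label in toy_samples]
scores = [tumour_probability(x, WEIGHTS, BIAS) for x, _ in toy_samples]
print("toy AUC:", roc_auc(labels, scores))
```

An AUC of .5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of .80, .85, and .96 should be read.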