Identification and verification of immune and oxidative stress-related diagnostic indicators for malignant lung nodules through WGCNA and machine learning.
Journal:
Scientific Reports
Published Date:
Jul 1, 2025
Abstract
Early detection of lung nodules (LNs) is critical for the prevention and treatment of lung cancer. However, current noninvasive diagnostic methods face significant challenges in reliably distinguishing benign from malignant nodules. Thus, there is an urgent need for novel molecular biomarkers or pathways to facilitate accurate identification of truly malignant LNs. Using the Gene Expression Omnibus (GEO) database and the "limma" package, we identified differentially expressed genes (DEGs) by comparing benign and malignant LN samples. Oxidative stress-related genes were downloaded from the GeneCards database. Subsequently, genes associated with immunity and oxidative stress were analyzed using weighted gene co-expression network analysis (WGCNA). A protein-protein interaction (PPI) network was constructed, and hub genes were extracted using 12 centrality-based algorithms in the CytoHubba plugin. Shared DEGs from these analyses were subjected to functional enrichment analysis. To develop a diagnostic model for LNs, we investigated 113 combinations of 12 machine-learning algorithms, employing 10-fold cross-validation on the training set, followed by external validation on the test set. A total of 31 shared DEGs associated with immunity and oxidative stress were identified, including two hub genes, CDK2 and MCL1. Immune infiltration analysis revealed distinct patterns of immune cell infiltration in malignant LNs compared with benign controls. A promising 11-gene diagnostic signature was developed, which outperformed existing LN diagnostic models in both the training and testing cohorts. This study developed a diagnostic model for malignant LNs, focusing on the shared genes associated with immunity and oxidative stress pathways. Furthermore, the identified hub genes facilitate a deeper understanding of the pathobiological mechanisms underlying the different types of LNs.
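Two steps of the workflow described above, taking the genes shared across several gene lists and partitioning a training set for 10-fold cross-validation, can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not state how the shared genes were computed, so a plain set intersection is assumed here. The function names `shared_genes` and `k_fold_splits`, the placeholder symbols `GENE_A` to `GENE_C`, the toy gene lists, and the sample count of 50 are all hypothetical and chosen only for illustration.

```python
import random


def shared_genes(*gene_lists):
    """Return, in sorted order, the genes present in every input list.

    Assumption: "shared" is taken to mean a plain set intersection.
    """
    if not gene_lists:
        return []
    common = set(gene_lists[0])
    for genes in gene_lists[1:]:
        common &= set(genes)
    return sorted(common)


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for k-fold cross-validation.

    Sample indices are shuffled once and dealt into k folds; each fold
    serves as the validation set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(validation)


# Toy inputs (hypothetical): only CDK2 and MCL1 are named in the abstract;
# GENE_A, GENE_B, and GENE_C are placeholders, not real findings.
degs = ["CDK2", "MCL1", "GENE_A", "GENE_B"]
oxidative_stress = ["CDK2", "MCL1", "GENE_B", "GENE_C"]
immune_module = ["CDK2", "MCL1", "GENE_A", "GENE_C"]

candidates = shared_genes(degs, oxidative_stress, immune_module)
splits = list(k_fold_splits(n_samples=50, k=10))

print(candidates)
print(len(splits), "folds; first validation fold size:", len(splits[0][1]))
```

In a full analysis, each candidate model would be fitted on the training indices of every fold and scored on the matching validation indices, with the held-out test cohort reserved for the final external check.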