Saliva-derived transcriptomic signature for gastric cancer detection using machine learning and leveraging publicly available datasets.
Journal:
Scientific Reports
Published Date:
May 27, 2025
Abstract
Saliva, a non-invasive, self-collected liquid biopsy, holds promise for early gastric cancer (GC) screening. This study assesses the potential of saliva as a proxy for malignant gastric transformation and its diagnostic value through transcriptomic profiling. Leveraging transcriptomic data from the Gene Expression Omnibus (GEO), we constructed and validated predictive models using machine learning algorithms within the tidymodels framework. Tissue-based models were validated on independent tissue datasets and subsequently applied to saliva. Additionally, an independent saliva-derived model was created and evaluated using sensitivity, specificity, accuracy, area under the curve (AUC), and likelihood ratio (LR) metrics. Tissue-derived models demonstrated excellent performance, with AUC values exceeding 0.9, but did not translate effectively to saliva, suggesting distinct molecular landscapes between tissue and saliva in GC. The saliva-specific model using a support vector machine (SVM) achieved the highest performance, with an AUC of 0.87 (95% CI 0.72-0.97), a sensitivity of 0.79 (95% CI 0.58-0.95), and a specificity of 0.70 (95% CI 0.40-0.90). While saliva may not mirror the tissue gene expression profile, it represents a promising non-invasive predictive tool for the early detection of GC. Further research is warranted to optimize saliva-derived molecular signatures, increase their sensitivity and specificity for early cancer detection, and advance the use of liquid biopsies in personalized medicine for improved screening, diagnostic, and prognostic capabilities.
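The abstract lists likelihood ratios among the evaluation metrics but does not state their values. As an illustration only, the sketch below applies the standard definitions, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, to the reported point estimates for the saliva-specific SVM model (sensitivity 0.79, specificity 0.70). The function name `likelihood_ratios` is hypothetical, and the resulting numbers are derived here rather than taken from the study; they may differ from the LR values the authors report in the full text.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a binary diagnostic test.

    Hypothetical helper for illustration; not part of the study's code.
    """
    if not (0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must lie in [0, 1]")
    if not (0.0 < specificity < 1.0):
        # Specificity of exactly 0 or 1 would make one of the ratios undefined.
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Point estimates reported in the abstract for the saliva-specific SVM model.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.79, specificity=0.70)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # prints "LR+ = 2.63, LR- = 0.30"
```

Under these point estimates, a positive saliva result would be roughly 2.6 times more likely in a GC case than in a control. The wide confidence intervals on sensitivity and specificity mean that the uncertainty around these derived ratios is correspondingly large.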