Establishment of two pathomic-based machine learning models to predict CLCA1 expression in colon adenocarcinoma.
Journal:
PloS one
Published Date:
Jul 21, 2025
Abstract
Chloride channel accessory 1 (CLCA1) is considered a potential prognostic biomarker for colon adenocarcinoma (COAD). The objective of this research was to develop two pathomics models to predict CLCA1 expression from hematoxylin-eosin (H&E) stained pathological images and to investigate the biological mechanisms linked to pathomics features by associating the pathomics model with transcriptomic data. The prognostic value of CLCA1 in COAD was assessed using gene transcriptome expression data. Two pathomics models were constructed to predict CLCA1 expression in COAD from pathological image features, one using the random forest (RF) algorithm and one using the XGBoost algorithm. The RF pathomics model demonstrated superior predictive performance, achieving area under the curve (AUC) values of 0.846 and 0.776 in the training and validation cohorts, respectively, and was selected for further analysis. The ability of the pathomics model to predict overall survival (OS) in COAD was determined using univariate and multivariate Cox regression analyses. The possible biological mechanisms behind the pathomics model were explored through gene set variation analysis (GSVA), immune infiltration assessment, and somatic mutation analysis. CLCA1 expression was downregulated in COAD patients, and this downregulation was associated with a poor prognosis (P = 0.008). Participants were categorized into high- and low-risk score groups based on the critical value of the risk score. High-risk scores were protective for OS in COAD in both univariate and multivariate Cox regression analyses. Meanwhile, GSVA revealed notable enrichment of pathways such as the epithelial-mesenchymal transition and vascular endothelial growth factor (VEGF) signaling in the low-risk score group. Two pathomics-based machine learning models were developed to predict CLCA1 expression from H&E stained images of COAD.
A theoretical basis for interpreting the disease model was developed by comprehensively analyzing the pathomics-based models and transcriptomic data, facilitating further hypothesis-driven experimental research.
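
The abstract reports model performance as AUC values and divides participants into high- and low-risk score groups at a critical value. The following is a minimal sketch of both ideas on made-up toy data, not the authors' pipeline. The function names `auc` and `split_by_cutoff` are hypothetical, and the median is used as the cutoff purely for illustration, since the abstract does not state how the critical value was chosen.

```python
from statistics import median


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def split_by_cutoff(scores, cutoff):
    """Assign each score to the 'high' or 'low' group relative to a cutoff."""
    return ["high" if s >= cutoff else "low" for s in scores]


# Toy data (invented): 1 = high CLCA1 expression, 0 = low CLCA1 expression.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical model outputs (predicted probability of high expression).
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

toy_auc = auc(labels, scores)
# Assumption: the median stands in for the unspecified critical value.
groups = split_by_cutoff(scores, median(scores))
print(round(toy_auc, 3), groups)
```

In practice the AUC would be computed separately for the training and validation cohorts, which is how the abstract reports it (0.846 and 0.776 for the RF model).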