Relationship Between Gene Expression and Drug Response in Triple-Negative Breast Cancer: Leveraging Single-Cell RNA Sequencing and Machine Learning to Identify Biomarker Profiles
Journal: bioRxiv
Published Date: Mar 8, 2026
Abstract
Triple-negative breast cancer (TNBC) is an aggressive subtype characterized by limited therapeutic options and poor prognosis. To address these challenges, we combined single-cell RNA sequencing (scRNA-seq) data with machine learning techniques to identify biomarkers that predict treatment response. Using tumor and blood samples from TNBC patients treated with paclitaxel, either alone or in combination with atezolizumab, we constructed cellular maps, identified differentially expressed genes, and analyzed co-expression networks. Copy number variation (CNV) profiling and weighted gene co-expression network analysis (WGCNA) identified immune-related genes, including IL7R, CD6, and TNFAIP3, as candidate biomarkers linked to therapeutic response. Random Forest feature selection followed by bootstrap-enhanced K-nearest neighbors (K-NN) classification achieved high predictive accuracy across all subgroups (AUC > 0.93). Interpretability analysis based on Local Interpretable Model-Agnostic Explanations (LIME) further identified important factors influencing treatment sensitivity and resistance, including EGR1, MKI67, C1QA/B/C, GZMB, and PRF1. Notably, blood-derived biomarkers showed strong predictive potential, highlighting the viability of liquid biopsy approaches for non-invasive monitoring. Our results indicate that integrating scRNA-seq with interpretable machine learning facilitates the identification of reliable biomarkers and yields clinically actionable insights for personalized therapeutic strategies in TNBC.
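The classification step described in the abstract (Random Forest feature selection followed by bootstrap-enhanced K-NN, evaluated by AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, scikit-learn is an assumed toolkit, "bootstrap-enhanced" is interpreted here as bagging K-NN over bootstrap resamples, and every parameter value (number of trees, number of retained features, neighbors, resamples) is a hypothetical choice made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a samples-by-genes expression matrix with binary
# response labels. These are NOT the study data.
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=8, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: rank features by Random Forest importance, using the training
# split only so the held-out data do not influence the selection.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
n_keep = 10  # hypothetical number of retained features
top_idx = np.argsort(forest.feature_importances_)[::-1][:n_keep]

# Step 2: standardize the selected features, since K-NN is distance based.
scaler = StandardScaler().fit(X_train[:, top_idx])
X_train_sel = scaler.transform(X_train[:, top_idx])
X_test_sel = scaler.transform(X_test[:, top_idx])

# Step 3: fit K-NN on bootstrap resamples of the training set and average
# the predictions (bagging; an assumed reading of "bootstrap-enhanced").
knn_bag = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=5), n_estimators=25, random_state=0
)
knn_bag.fit(X_train_sel, y_train)

# Step 4: score the held-out split by area under the ROC curve.
auc = roc_auc_score(y_test, knn_bag.predict_proba(X_test_sel)[:, 1])
print(f"Held-out AUC on synthetic data: {auc:.3f}")
```

The AUC printed here reflects only the synthetic data and says nothing about the reported results. With real scRNA-seq data, the train/test split would typically be made at the patient level so that cells from one patient do not appear on both sides.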