Genetic Features for Drug Responses in Cancer -- Investigating an Ensemble-Feature-Selection Approach
Journal: arXiv
Published Date: Jul 3, 2025
Abstract
Predicting drug responses using genetic and transcriptomic features is
crucial for enhancing personalized medicine. In this study, we implemented an
ensemble of machine learning algorithms to analyze the correlation between
genetic and transcriptomic features of cancer cell lines and IC50 values, a
reliable metric for drug efficacy. Our analysis reduced the feature set from an
original pool of 38,977 features and demonstrated a strong linear relationship
between genetic features and drug responses across various algorithms,
including support vector regression (SVR), linear regression, and ridge
regression. Notably,
copy number variations (CNVs) emerged as more predictive than mutations,
suggesting that the biomarkers used for drug response prediction warrant
reevaluation. Through rigorous statistical methods, we identified a highly
reduced set of 421 critical features. This set offers a novel perspective that
contrasts with traditional cancer driver genes, underscoring the potential of
these biomarkers in designing targeted therapies. Furthermore, our findings
support IC50 values as a predictable measure of drug response and underscore
the need for larger datasets that adequately cover the high dimensionality of
genomic data in drug response prediction. Future work will aim to expand the
dataset and refine feature selection to enhance the generalizability of the
predictive model in clinical settings.
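
The general idea of ensemble feature selection can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how the individual models are combined, so averaging per-model feature ranks is an assumption made here. The data are synthetic, the function names (`ensemble_select`, `ridge_coefs`, and so on) are hypothetical, and SVR is replaced by a simple Pearson-correlation scorer to keep the example dependent on NumPy only.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance (constant columns become zero)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - mu) / sd


def ridge_coefs(X, y, alpha):
    """Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ y)


def correlation_scores(Xs, y):
    """Absolute Pearson correlation of each standardized column with y."""
    yc = y - y.mean()
    denom = np.linalg.norm(yc) * np.sqrt(Xs.shape[0])
    return np.abs(Xs.T @ yc) / denom


def ranks(scores):
    """Rank features so that the highest score receives rank 0."""
    order = np.argsort(-scores)
    r = np.empty_like(order)
    r[order] = np.arange(len(scores))
    return r


def ensemble_select(X, y, k, alpha=1.0):
    """Return indices of the k features with the best mean rank across scorers.

    Assumption: rank averaging is used as the aggregation rule; the source
    does not specify one.
    """
    Xs = standardize(X)
    yc = y - y.mean()
    scorers = [
        np.abs(np.linalg.lstsq(Xs, yc, rcond=None)[0]),  # ordinary least squares
        np.abs(ridge_coefs(Xs, yc, alpha)),              # ridge regression
        correlation_scores(Xs, y),                       # stand-in for a third model
    ]
    mean_rank = np.mean([ranks(s) for s in scorers], axis=0)
    return np.argsort(mean_rank)[:k]


# Synthetic stand-in for a (cell lines x features) matrix and IC50-like targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
true_coefs = np.zeros(30)
true_coefs[:3] = [3.0, -2.5, 2.0]  # only the first three columns are informative
y = X @ true_coefs + rng.normal(scale=0.1, size=200)

selected = ensemble_select(X, y, k=3)
print(sorted(selected.tolist()))
```

On this toy problem the three informative columns should rank highest under all three scorers. With real data of the size described above (38,977 features and far fewer cell lines), unregularized least squares is underdetermined, so regularized models and cross-validated scoring would be needed in place of this simplified setup.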