Revealing Treatment Non-Adherence Bias in Clinical Machine Learning Using Large Language Models
Journal: arXiv
Published Date: Feb 26, 2025
Abstract
Machine learning systems trained on electronic health records (EHRs)
increasingly guide treatment decisions, but their reliability depends on the
critical assumption that patients follow the prescribed treatments recorded in
EHRs. Using EHR data from 3,623 hypertension patients, we investigate how
treatment non-adherence introduces implicit bias that can fundamentally distort
both causal inference and predictive modeling. By extracting patient adherence
information from clinical notes using a large language model (LLM), we identify
786 patients (21.7%) with medication non-adherence. We further uncover key
demographic and clinical factors associated with non-adherence, as well as
patient-reported reasons including side effects and difficulties obtaining
refills. Our findings demonstrate that this implicit bias can not only reverse
estimated treatment effects, but also degrade model performance by up to 5%
while disproportionately affecting vulnerable populations by exacerbating
disparities in decision outcomes and model error rates. These findings highlight
the importance of accounting for treatment non-adherence when developing
responsible and equitable clinical machine learning systems.
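
As a rough illustration of how non-adherence could reverse an estimated treatment effect, the following minimal Python sketch simulates two hypothetical drugs. It is not the paper's method or data: the drug labels, effect sizes, adherence rates, and noise level are all invented assumptions, and adherence is assumed to be independent of the outcome, which real EHR data would not guarantee.

```python
import random
from statistics import mean

# All numbers below are hypothetical and chosen only for illustration.
TRUE_EFFECT = {"A": 12.0, "B": 8.0}      # BP reduction (mmHg) if the drug is actually taken
ADHERENCE_RATE = {"A": 0.50, "B": 0.95}  # probability that a patient takes the prescribed drug


def simulate(n=20000, seed=0):
    """Return synthetic (prescribed_drug, adherent, bp_reduction) records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        drug = rng.choice(["A", "B"])                 # prescription as recorded in the EHR
        adherent = rng.random() < ADHERENCE_RATE[drug]
        noise = rng.gauss(0.0, 5.0)                   # patient-level variation
        # A non-adherent patient receives no benefit from the prescribed drug.
        reduction = (TRUE_EFFECT[drug] if adherent else 0.0) + noise
        rows.append((drug, adherent, reduction))
    return rows


def mean_reduction(rows, drug, adherent_only=False):
    """Mean BP reduction for a drug, optionally restricted to adherent patients."""
    return mean(r for d, a, r in rows if d == drug and (a or not adherent_only))


rows = simulate()

# Naive estimate: treats the recorded prescription as the treatment received.
naive = mean_reduction(rows, "A") - mean_reduction(rows, "B")

# Adherence-aware estimate: compares only patients who took the drug.
adjusted = mean_reduction(rows, "A", True) - mean_reduction(rows, "B", True)

print(f"naive A-vs-B difference:           {naive:+.2f} mmHg")
print(f"adherence-aware A-vs-B difference: {adjusted:+.2f} mmHg")
```

Under these assumed numbers, the comparison based on recorded prescriptions favors drug B even though drug A is more effective when taken, and the sign flips back once adherence information is used. Restricting to adherent patients is only valid here because the simulation makes adherence independent of the outcome.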