Build fair machine learning models to predict adverse outcomes for Heart failure patients with preserved ejection fraction (HFpEF) and with reduced ejection fraction (HFrEF)
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Heart failure (HF), including heart failure with preserved ejection fraction (HFpEF) and heart failure with reduced ejection fraction (HFrEF), remains a major global health challenge, particularly among aging populations. Timely and accurate prediction of severe adverse outcomes associated with HF is critical for optimizing care, reducing disease burden, and improving outcomes. Although social determinants of health (SDoH) have been recognized as key drivers of HF disparities and associated adverse outcomes, they are rarely integrated into HF prediction models, and fairness in such models remains understudied. We aimed to develop and validate fairness-aware machine learning (ML) models incorporating both clinical and SDoH features to predict 6-month readmission or mortality in patients with HFpEF and HFrEF. We conducted a retrospective cohort study using data from the University of Florida (UF) Health electronic health records (EHR). We included adult patients hospitalized for HF from 2016 to 2022 and followed them for 6 months to identify readmission or mortality (a composite outcome). We developed ML models using logistic regression (LRC) and XGBoost algorithms, incorporating patients’ clinical characteristics, contextual SDoH (e.g., neighborhood deprivation index), and individual-level SDoH (extracted from clinical notes via natural language processing). Models were trained on datasets balanced by random oversampling, and performance was assessed via C statistic, F1-score, and recall. Fairness was evaluated using false negative rate (FNR) parity across sex, race/ethnicity, and age band. Bias mitigation strategies included Disparate Impact Remover, Adversarial Debiasing, and Calibrated Equalized Odds. With LRC, the HFpEF and HFrEF models that included both clinical and SDoH features achieved C statistics of 0.603 and 0.641, respectively, whereas models using clinical characteristics alone had lower prediction performance (0.586 and 0.637, respectively).
SHAP analysis identified sodium, financial constraint level, and emergency department visit count as strong predictors in the HFpEF cohort, and inpatient visit count, financial constraint level, and outpatient visits as strong predictors in the HFrEF cohort. In the HFpEF population, fairness assessment showed bias in the Black vs. White comparison (FNRblack / FNRwhite = 0.7834); Disparate Impact Remover raised this ratio to 0.8728, closer to parity (a ratio of 1). In the HFrEF population, bias in the Hispanic vs. White comparison (FNRhispanic / FNRwhite = 1.2217) was mitigated by Adversarial Debiasing (0.9880). The prediction models for HFpEF and HFrEF offer an explainable and equity-enhancing approach for risk stratification in personalized HF clinical care. By integrating SDoH, our models show improved prediction utility and support targeted interventions for both the medical and non-medical needs that are essential to patients’ health outcomes and critical for clinical decision-making.
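
The FNR parity ratios reported above (for example, FNRblack / FNRwhite) can be illustrated with a minimal sketch. This is not the authors' code: the function names, group labels, and toy data below are hypothetical and chosen only to show how a false negative rate ratio between a comparison group and a reference group might be computed, assuming binary labels and binary predictions.

```python
from collections import defaultdict


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP), computed over the actual positive cases only."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if fn + tp == 0:
        raise ValueError("no positive cases; FNR is undefined")
    return fn / (fn + tp)


def fnr_ratio(y_true, y_pred, groups, comparison, reference):
    """Ratio FNR(comparison) / FNR(reference); a value of 1 indicates parity."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    fnr_comparison = false_negative_rate(*by_group[comparison])
    fnr_reference = false_negative_rate(*by_group[reference])
    if fnr_reference == 0:
        raise ValueError("reference group FNR is zero; ratio is undefined")
    return fnr_comparison / fnr_reference


# Hypothetical toy data: 1 = readmission or death within 6 months, 0 = neither.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5  # placeholder group labels, not study groups

ratio = fnr_ratio(y_true, y_pred, groups, comparison="A", reference="B")
print(f"FNR ratio A vs. B: {ratio:.4f}")
```

Under this reading, a ratio below 1 means the comparison group has fewer missed positive cases than the reference group, and a ratio above 1 means it has more; mitigation aims to move the ratio toward 1.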