Medication-Stratified Analysis of LDL-C Equation Miscalibration in Diabetes: Evidence from the All of Us Research Program and a Medication-Agnostic Machine-Learning Correction

Journal: medRxiv
Published Date:

Abstract

Standard LDL-C equations were derived in cohorts largely untreated with modern combination diabetes therapies. In a study population in which 84% of patients used statins, 53% insulin, and 25% GLP-1 receptor agonists, often in combination, we quantified medication-specific miscalibration in LDL-C equations and evaluated a machine learning (ML) correction that operates without requiring medication data. Using All of Us Research Program data (n=3,477; test set n=696), we compared Friedewald, Martin–Hopkins, and Sampson (NIH) Equation 2 against direct LDL-C measurements. We developed a stacked ensemble model (elastic net, random forest, XGBoost, neural network) trained solely on routine laboratory values. Accuracy was assessed within medication groups that allowed for combination therapy: insulin users, GLP-1 users, and statin users. Primary endpoints were mean absolute error (MAE) with 95% bootstrap confidence intervals and calibration (ordinary least squares regression of true on predicted LDL-C). The secondary endpoint was the Net Reclassification Index at 100 mg/dL. Among 696 test participants, 587 (84%) used statins, 366 (53%) insulin, and 175 (25%) GLP-1 agonists. Patients on triple therapy (insulin+GLP-1+statin) showed the most severe miscalibration: Friedewald slope 0.29, representing 71% compression of the prediction range. In all GLP-1 users (77% also on insulin), standard equations severely underestimated LDL-C, with calibration slopes of 0.42–0.48 versus the ideal of 1.0. Specifically, Friedewald showed slope 0.42 (95% CI 0.27–0.56) with intercept +62 mg/dL; Sampson (NIH) Equation 2 showed slope 0.48 (0.32–0.64) with intercept +55 mg/dL; and Martin–Hopkins showed slope 0.47 (0.31–0.63) with intercept +55 mg/dL. The ML model maintained better calibration (slope 0.83 [0.56–1.09]; intercept −2.2 mg/dL) and reduced MAE by 17% versus Friedewald. Insulin users showed similar improvement: Friedewald slope 0.55 (0.45–0.65) versus ML model slope 0.95 (0.78–1.12), with 16% lower error.
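The two primary endpoints described above can be illustrated with a minimal Python sketch. It assumes the standard Friedewald formula in mg/dL (LDL-C = TC − HDL-C − TG/5), a percentile bootstrap for the MAE interval, and a plain least-squares fit of true on predicted values. The function names, the number of resamples, and the synthetic numbers in the demo are all hypothetical; this is not the authors' pipeline.

```python
import numpy as np


def friedewald_ldl(total_chol, hdl, triglycerides):
    """Friedewald estimate in mg/dL: LDL-C = TC - HDL-C - TG/5."""
    tc = np.asarray(total_chol, dtype=float)
    hdl = np.asarray(hdl, dtype=float)
    tg = np.asarray(triglycerides, dtype=float)
    return tc - hdl - tg / 5.0


def calibration(true_ldl, predicted_ldl):
    """OLS regression of true on predicted LDL-C; returns (slope, intercept).

    A slope of 1 and an intercept of 0 indicate ideal calibration.
    """
    slope, intercept = np.polyfit(predicted_ldl, true_ldl, deg=1)
    return float(slope), float(intercept)


def mae_with_bootstrap_ci(true_ldl, predicted_ldl, n_boot=2000, seed=0):
    """MAE with a 95% percentile-bootstrap interval (assumed CI method)."""
    abs_err = np.abs(np.asarray(true_ldl, float) - np.asarray(predicted_ldl, float))
    rng = np.random.default_rng(seed)
    # Resample participants with replacement, n_boot times.
    idx = rng.integers(0, len(abs_err), size=(n_boot, len(abs_err)))
    boot_mae = abs_err[idx].mean(axis=1)
    lo, hi = np.percentile(boot_mae, [2.5, 97.5])
    return float(abs_err.mean()), float(lo), float(hi)


if __name__ == "__main__":
    # Synthetic lipid panel, for illustration only (not study data).
    rng = np.random.default_rng(42)
    tc = rng.normal(170, 35, 300)
    hdl = rng.normal(45, 10, 300)
    tg = rng.normal(180, 60, 300).clip(40, 399)
    direct_ldl = tc - hdl - tg / 6.5 + rng.normal(0, 8, 300)

    est = friedewald_ldl(tc, hdl, tg)
    print("slope, intercept:", calibration(direct_ldl, est))
    print("MAE, 95% CI:", mae_with_bootstrap_ci(direct_ldl, est))
```

Under this convention, a slope below 1 with a positive intercept means the equation's estimates span a wider range than the measured values they correspond to, which is what the text calls compression.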
The medication-by-triglyceride interaction was significant (p=0.002). In patients with insulin exposure and triglycerides ≥200 mg/dL, the Net Reclassification Index was 0.240 versus 0.022 overall, indicating greater misclassification risk in hypertriglyceridemia. Standard LDL-C equations systematically underestimate true levels in medication-treated diabetes patients, with errors greatest in combination therapy. A machine learning model trained on routine laboratories, without medication data, achieved near-ideal calibration (slopes 0.83–1.03) and reduced errors by 8–20% across medication groups. These observational findings suggest that direct LDL-C measurement or ML-assisted correction should be considered when equation estimates approach treatment thresholds, particularly for patients on combination therapy.
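For readers unfamiliar with the secondary endpoint, the following Python sketch computes a two-category Net Reclassification Index at a single threshold. It assumes the conventional categorical definition (net upward reclassification among events plus net downward reclassification among non-events) and treats directly measured LDL-C ≥100 mg/dL as the "event". The abstract does not state the exact NRI variant used, so the function name and these choices are assumptions.

```python
import numpy as np


def categorical_nri(true_ldl, old_pred, new_pred, threshold=100.0):
    """Two-category NRI of new_pred over old_pred at one LDL-C threshold.

    Assumption: an "event" is a directly measured LDL-C >= threshold.
    """
    true_ldl, old_pred, new_pred = (
        np.asarray(a, dtype=float) for a in (true_ldl, old_pred, new_pred)
    )
    event = true_ldl >= threshold
    old_high = old_pred >= threshold
    new_high = new_pred >= threshold

    up = new_high & ~old_high      # moved from below to at/above threshold
    down = ~new_high & old_high    # moved from at/above to below threshold

    if event.sum() == 0 or (~event).sum() == 0:
        raise ValueError("need at least one event and one non-event")

    # Among events, moving up is correct; among non-events, moving down is.
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[~event].mean() - up[~event].mean()
    return float(nri_events + nri_nonevents)
```

A hypothetical call would pass direct LDL-C as `true_ldl`, an equation estimate (for example Friedewald) as `old_pred`, and the corrected estimate as `new_pred`; positive values mean the new estimate places more patients on the correct side of the threshold.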

Authors

  • Ronald Doku; Nana Yaw Osafo; John Kwagyan; William M. Southerland