Improving ACS prediction in T2DM patients by addressing false records in electronic medical records using propensity score.
Journal:
Scientific Reports
Published Date:
May 28, 2025
Abstract
Our study aims to improve the performance of machine learning (ML) models for predicting acute coronary syndrome (ACS). We do this by addressing false records (i.e., false positives, false negatives, or missingness) in binary categorical variables in electronic medical records (EMRs) using propensity score (PS). This study used the EMRs of patients with type 2 diabetes mellitus (T2DM) treated with basal insulin at a tertiary university hospital in South Korea. We extended the definition of PS to the probability of having a record for a binary variable given covariates. We calculated PS for the binary categorical variables in the EMRs and constructed PS datasets. Using various ML algorithms, we developed ACS prediction models on 80% of the dataset and validated them on the remaining 20%. We evaluated model performance using accuracy, recall, precision, F1 score, and the area under the receiver operating characteristic curve (AUROC). Additionally, we used the SHapley Additive exPlanations (SHAP) method to identify important clinical predictors of ACS. The study included 9,338 patients (mean age, 60.2 years; 56.6% male) over 10,184 treatment periods. The most prevalent comorbidities were hypertension (31.5%) and dyslipidemia (28.9%). Notably, 6.9% experienced ACS during their insulin treatment. The ML models trained on PS datasets generally outperformed those trained on raw datasets. SHAP analysis identified the following important ACS risk factors:
- older age
- higher baseline weight
- higher baseline glucose
- history of antithrombotic therapy
- history of chest pain
- indicators of T2DM progression (e.g., senile cataract)

We developed an ACS prediction model with improved performance and greater reliance on clinical predictors consistent with current medical understanding.
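The abstract does not say how the extended PS is estimated. The sketch below is a minimal illustration that makes two assumptions:
- Each binary variable gets its own logistic regression of "record present" (1) versus "no record" (0) on a chosen set of covariates.
- The PS dataset is formed by replacing each 0/1 indicator with its fitted probability.

The helper names (`fit_logistic`, `build_ps_dataset`), the standardized covariates `age_z` and `glucose_z`, and the toy records are all hypothetical. Plain gradient descent stands in for whatever estimator the authors actually used.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(record = 1 | x) under a logistic model; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit intercept and weights by full-batch gradient descent on mean log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # derivative of log loss w.r.t. the logit
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def build_ps_dataset(rows: List[Dict[str, float]],
                     binary_vars: List[str],
                     covariates: List[str]) -> List[Dict[str, float]]:
    """Return a copy of rows in which each 0/1 record indicator listed in
    binary_vars is replaced by its PS, P(record = 1 | covariates)."""
    X = [[row[c] for c in covariates] for row in rows]
    ps_rows = [dict(row) for row in rows]  # leave the raw dataset untouched
    for var in binary_vars:
        y = [int(row[var]) for row in rows]
        w = fit_logistic(X, y)  # one hypothetical PS model per binary variable
        for ps_row, xi in zip(ps_rows, X):
            ps_row[var] = predict_proba(w, xi)
    return ps_rows


# Made-up toy records: two standardized covariates plus one recorded comorbidity flag.
rows = [
    {"age_z": -1.5, "glucose_z": -0.5, "hypertension": 0},
    {"age_z": -1.0, "glucose_z": 0.2, "hypertension": 0},
    {"age_z": -0.5, "glucose_z": -1.0, "hypertension": 0},
    {"age_z": 0.0, "glucose_z": 0.5, "hypertension": 1},
    {"age_z": 0.5, "glucose_z": -0.2, "hypertension": 0},
    {"age_z": 1.0, "glucose_z": 1.0, "hypertension": 1},
    {"age_z": 1.5, "glucose_z": 0.3, "hypertension": 1},
    {"age_z": -0.8, "glucose_z": 0.8, "hypertension": 1},
    {"age_z": 1.2, "glucose_z": 0.6, "hypertension": 0},
]
ps_rows = build_ps_dataset(rows, ["hypertension"], ["age_z", "glucose_z"])
for raw, ps in zip(rows, ps_rows):
    print(raw["hypertension"], round(ps["hypertension"], 3))
```

With this encoding, a patient whose hypertension record is missing but whose covariates resemble those of recorded cases receives an intermediate value rather than a hard 0. In practice, a library estimator such as scikit-learn's `LogisticRegression` would replace the hand-rolled fit. Fitting the PS models on the 80% training split only and then applying them to the 20% validation split would avoid leaking validation information into the features. The abstract does not state whether the authors did this.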