A Comprehensive Investigation of Machine Learning Practices in Predictive Modeling of Alzheimer’s Disease and Related Dementias using Multisite Real-world Electronic Health Records

Journal: medRxiv
Published Date:

Abstract

The irreversible progression and profound societal impact of Alzheimer’s disease and related dementias (AD/ADRD) underscore the pressing need for early risk prediction. Machine learning (ML) models integrated with electronic health records (EHRs) offer a promising means of generating real-world evidence, yet their clinical utility for AD/ADRD remains unclear, both in whether they can be applied reliably and in how they can be integrated broadly into healthcare practice. In this study, we systematically investigated widely used ML models for early AD/ADRD prediction, using the nationwide All of Us EHRs as the primary cohort. We explored multiple analytic dimensions, including cohort construction, feature engineering strategies, and subtype-specific modeling, across prediction windows ranging from 10 years to 1 day. To assess clinical applicability in healthcare practice and to provide actionable insights for cross-cohort model adaptation, we transferred models pretrained on All of Us to two external EHR repositories (INSIGHT CRN and OHSU RDW) under different transfer paradigms. Collectively, the findings delineate the strengths and limitations of EHR-based ML models: they demonstrate capability for near-term prediction and offer interpretability, but face constraints in long-term estimation. The findings also provide empirical insights that support practical model use with multisite EHRs.
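A prediction window determines how much of a patient's record a model is allowed to see. The sketch below is illustrative only and is not the authors' pipeline. It assumes a hypothetical event format of (date, code) pairs, an index date such as a first AD/ADRD diagnosis, and a window measured in days. Only events recorded before the index date minus the window are kept as features. The function name `observable_events` and the placeholder codes are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical EHR event: (event_date, concept_code). Codes are placeholders,
# not real All of Us / OMOP concept identifiers.
Event = tuple[date, str]


def observable_events(events: list[Event], index_date: date, window_days: int) -> list[Event]:
    """Return events recorded strictly before (index_date - window_days).

    Assumption: index_date is the diagnosis date for cases (or a reference
    date for controls), and a model predicting window_days ahead may only
    use data that existed at the cutoff.
    """
    cutoff = index_date - timedelta(days=window_days)
    return [(d, c) for d, c in events if d < cutoff]


events = [
    (date(2008, 3, 1), "code_A"),
    (date(2015, 6, 10), "code_B"),
    (date(2020, 1, 5), "code_C"),
    (date(2021, 12, 30), "code_D"),
]
index = date(2022, 1, 1)

# Windows in days; 3650 days is used as an approximation of 10 years.
for label, days in [("~10 years", 3650), ("1 year", 365), ("1 day", 1)]:
    kept = observable_events(events, index, days)
    print(label, [c for _, c in kept])
```

Longer windows leave less history available for prediction. This is one intuitive reason why long-horizon estimates can be harder than near-term ones.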

Authors

  • Qiannan Zhang; Zhenxing Xu; Weishen Pan; Hiroko H. Dodge; Jiayu Zhou; Chang Su; Fei Wang