Challenges and recommendations for Electronic Health Records data extraction and preparation for dynamic prediction modelling in hospitalized patients -- a practical guide
Journal: arXiv
Published Date: Jan 17, 2025
Abstract
Dynamic predictive modelling using electronic health record (EHR) data has
gained significant attention in recent years. The reliability and
trustworthiness of such models depend heavily on the quality of the underlying
data, which is determined in part by the stages that precede model
development: data extraction from EHR systems and data preparation. In this
article, we identify over forty challenges encountered during these stages
and provide actionable recommendations for addressing them. These challenges
are organized into four categories: cohort definition, outcome definition,
feature engineering, and data cleaning. This comprehensive list serves as a
practical guide for data extraction engineers and researchers, promoting best
practices and improving the quality and real-world applicability of dynamic
prediction models in clinical settings.