Federated Multiple Imputation for Variables that Are Missing Not At Random in Distributed Electronic Health Records.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

Published Date: May 22, 2025

Abstract

Large electronic health record (EHR) systems have been widely implemented, and their data are available for research. The size of such databases often requires storage and computing infrastructure that is distributed across different sites. Restrictions on data sharing due to privacy concerns have been another driving force behind the development of a large class of distributed and/or federated machine learning methods. Although missing data are also a problem in distributed EHRs, and potentially a more complex one, distributed multiple imputation (MI) methods have received less attention. An important advantage of distributed MI, as of distributed analysis generally, is that it allows researchers to borrow information across data sites, mitigating potential fairness issues for minority groups that have too few records at individual sites. In this paper, we propose a communication-efficient and privacy-preserving distributed MI algorithm for variables that are missing not at random (MNAR).
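The abstract does not spell out the algorithm, so the Python sketch below only illustrates the general shape of federated MI and is not the authors' method. It assumes a single continuous variable y, missing at some records, imputed from a linear regression on a fully observed covariate x. Sites send only summed statistics; a coordinator draws imputation-model parameters from their posterior and broadcasts them; each site imputes its own rows. The MNAR aspect is handled by a swapped-in delta adjustment (a fixed shift `delta` added to every imputed value, a common pattern-mixture sensitivity analysis; `delta = 0` is ordinary MAR imputation), and the results are combined with Rubin's rules. All names (`Stats`, `local_stats`, `draw_params`, `impute_site`, `site_moments`, `federated_mi_mean`, `delta`) are hypothetical.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass
class Stats:
    """Sums one site shares for the model y = b0 + b1*x + e (no row-level data)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0

    def __add__(self, o):
        return Stats(self.n + o.n, self.sx + o.sx, self.sy + o.sy,
                     self.sxx + o.sxx, self.sxy + o.sxy, self.syy + o.syy)


def local_stats(rows):
    """Run at each site: summarise the rows whose y is observed (y is None if missing)."""
    s = Stats()
    for x, y in rows:
        if y is not None:
            s = s + Stats(1, x, y, x * x, x * y, y * y)
    return s


def draw_params(s, rng):
    """Run centrally: one posterior draw (b0, b1, sigma) under the prior p ~ 1/sigma^2."""
    det = s.n * s.sxx - s.sx ** 2
    b1 = (s.n * s.sxy - s.sx * s.sy) / det
    b0 = (s.sy - b1 * s.sx) / s.n
    sse = s.syy - b0 * s.sy - b1 * s.sxy  # residual sum of squares from the sums
    # sigma^2 ~ SSE / chi^2_{n-2}; chi^2_k is Gamma(shape=k/2, scale=2)
    sigma2 = sse / rng.gammavariate((s.n - 2) / 2, 2.0)
    # (b0, b1) ~ N(bhat, sigma^2 (X'X)^{-1}), sampled through a 2x2 Cholesky factor
    a, b, c = sigma2 * s.sxx / det, -sigma2 * s.sx / det, sigma2 * s.n / det
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(max(c - l21 ** 2, 0.0))
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return b0 + l11 * z1, b1 + l21 * z1 + l22 * z2, math.sqrt(sigma2)


def impute_site(rows, params, delta, rng):
    """Run at each site: fill missing y locally; delta shifts imputations (MNAR)."""
    b0, b1, sigma = params
    return [(x, y if y is not None else b0 + b1 * x + rng.gauss(0, sigma) + delta)
            for x, y in rows]


def site_moments(rows):
    """Run at each site: (count, sum, sum of squares) of the completed y values."""
    ys = [y for _, y in rows]
    return len(ys), sum(ys), sum(y * y for y in ys)


def federated_mi_mean(sites, m=20, delta=0.0, seed=0):
    """Estimate the overall mean of y with m imputations; return (estimate, total variance)."""
    rng = random.Random(seed)
    pooled = Stats()
    for rows in sites:
        pooled = pooled + local_stats(rows)  # only sums are communicated
    estimates, variances = [], []
    for _ in range(m):
        params = draw_params(pooled, rng)  # broadcast to every site
        moments = [site_moments(impute_site(r, params, delta, rng)) for r in sites]
        n = sum(k for k, _, _ in moments)
        total = sum(t for _, t, _ in moments)
        sq = sum(q for _, _, q in moments)
        mean = total / n
        var_y = (sq - n * mean ** 2) / (n - 1)
        estimates.append(mean)
        variances.append(var_y / n)  # within-imputation variance of the mean
    qbar = statistics.fmean(estimates)
    ubar = statistics.fmean(variances)
    b = statistics.variance(estimates)  # between-imputation variance
    return qbar, ubar + (1 + 1 / m) * b  # Rubin's rules
```

Re-running `federated_mi_mean` over a grid of `delta` values shows how sensitive the pooled estimate is to the assumed departure from MAR. Exchanging aggregates reduces what each site reveals but is not by itself a formal privacy guarantee, and the paper's actual communication and privacy design may differ from this sketch.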

Authors

Yi Lian

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada.

Xiaoqian Jiang

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

Qi Long

Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, USA.

Keywords

Algorithms, Confidentiality, Electronic Health Records, Humans, Machine Learning, Privacy

External Resources

PubMed ID (PMID): 40417515