Representation Learning to Advance Multi-institutional Studies with Electronic Health Record Data

Journal: arXiv

Published Date: Feb 12, 2025

Abstract

The adoption of electronic health records (EHRs) has expanded opportunities to leverage data-driven algorithms in clinical care and research. A major bottleneck in effectively conducting multi-institutional EHR studies is the data heterogeneity across systems, with numerous codes that either do not exist or represent different clinical concepts across institutions. The need for data privacy further limits the feasibility of including the multi-institutional patient-level data required to study similarities and differences across patient subgroups. To address these challenges, we developed the GAME algorithm. Tested and validated across 7 institutions and 2 languages, GAME integrates data at several levels: (1) at the institutional level, using knowledge graphs to establish relationships between codes and existing knowledge sources, providing the medical context for standard codes and their relationships to each other; (2) between institutions, leveraging language models to determine the relationships between institution-specific codes and established standard codes; and (3) quantifying the strength of the relationships between codes using a graph attention network. Jointly trained embeddings are created using transfer and federated learning to preserve data privacy. In this study, we demonstrate the applicability of GAME in selecting relevant features as inputs for AI-driven algorithms in a range of conditions, such as heart failure and rheumatoid arthritis. We then highlight the application of GAME-harmonized multi-institutional EHR data in a study of Alzheimer's disease outcomes and suicide risk among patients with mental health disorders, without sharing patient-level data outside individual institutions.
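
As a rough illustration of step (2) in the abstract, the sketch below maps institution-specific codes to standard codes by cosine similarity between embedding vectors. It is not the GAME implementation: the code names, the toy vectors, the 0.8 threshold, and the `map_local_codes` helper are all hypothetical placeholders. In the paper the embeddings come from language models and relationship strengths from a graph attention network, neither of which is reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def map_local_codes(local_emb, standard_emb, threshold=0.8):
    """Map each local code to its most similar standard code.

    Hypothetical helper: a local code is mapped to None when even its
    best match falls below the (assumed) similarity threshold.
    """
    mapping = {}
    for local_code, u in local_emb.items():
        best_code, best_sim = max(
            ((std, cosine(u, v)) for std, v in standard_emb.items()),
            key=lambda pair: pair[1],
        )
        mapping[local_code] = best_code if best_sim >= threshold else None
    return mapping


# Toy embeddings, invented for illustration only.
standard_emb = {
    "STD:heart_failure": [0.9, 0.1, 0.0],
    "STD:rheumatoid_arthritis": [0.0, 0.2, 0.9],
}
local_emb = {
    "LOCAL:HF_001": [0.8, 0.2, 0.1],   # close to the heart failure vector
    "LOCAL:misc_42": [0.5, 0.5, 0.5],  # not close to any standard code
}

mapping = map_local_codes(local_emb, standard_emb)
```

Here `LOCAL:HF_001` is assigned to `STD:heart_failure`, while `LOCAL:misc_42` stays unmapped because no standard code clears the threshold. A real pipeline would need a principled way to choose that cutoff.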

Authors

Doudou Zhou
Han Tong
Linshanshan Wang
Suqi Liu
Xin Xiong
Ziming Gan
Romain Griffier
Boris Hejblum
Yun-Chung Liu
Chuan Hong
Clara-Lea Bonzel
Tianrun Cai
Kevin Pan
Yuk-Lam Ho
Lauren Costa
Vidul A. Panickan
J. Michael Gaziano
Kenneth Mandl
Vianney Jouhet
Rodolphe Thiebaut
Zongqi Xia
Kelly Cho
Katherine Liao
Tianxi Cai

External Resources

View on arXiv (http://arxiv.org/abs/2502.08547v1)
