Salvaging Forbidden Treasure in Medical Data: Utilizing Surrogate Outcomes and Single Records for Rare Event Modeling
Journal: arXiv
Published Date: Jan 25, 2025
Abstract
The vast repositories of Electronic Health Records (EHR) and medical claims
hold untapped potential for studying rare but critical events, such as suicide
attempts. Conventional setups often model suicide attempt as a univariate
outcome and exclude "single-record" patients, those with only one
documented encounter, because they lack historical information. However, patients
whose suicide attempt was diagnosed at their only encounter can, perhaps
surprisingly, represent a substantial proportion of all attempt cases in the data,
as high as 70-80%. We develop a hybrid and integrative learning framework to
leverage concurrent outcomes as surrogates and harness the forbidden yet
precious information from single-record data. Our approach employs a supervised
learning component to learn the latent variables that connect primary outcomes (e.g.,
suicide) and surrogate outcomes (e.g., mental disorders) to historical
information. It simultaneously employs an unsupervised learning component to
utilize the single-record data through the shared latent variables. As such,
our approach offers a general strategy for information integration that is
crucial to modeling rare conditions and events. With hospital inpatient data
from Connecticut, we demonstrate that single-record data and concurrent
diagnoses indeed carry valuable information, and utilizing them can
substantially improve suicide risk modeling.
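
To make the hybrid idea concrete, the following is a minimal sketch of one way a supervised term and an unsupervised term could share latent variables. It is not the authors' model: the abstract does not specify functional forms, so the linear encoder `A`, the logistic outcome models (`W`, `b`), the standard normal prior on the latent variables, the Monte Carlo marginalization, the `weight` parameter, and all function names are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def outcome_loglik(y, z, W, b, eps=1e-12):
    """Log-likelihood of binary outcomes y (primary + surrogates) given latent z.

    Assumption: outcomes are independent Bernoulli given z, with logit b[j] + z . W[:, j].
    """
    total = 0.0
    for j, y_j in enumerate(y):
        logit = b[j] + sum(z[i] * W[i][j] for i in range(len(z)))
        p = min(max(sigmoid(logit), eps), 1.0 - eps)
        total += math.log(p) if y_j == 1 else math.log(1.0 - p)
    return total


def encode(x, A):
    """Hypothetical linear map from historical features x to latent variables z."""
    k = len(A[0])
    return [sum(x[i] * A[i][l] for i in range(len(x))) for l in range(k)]


def supervised_loglik(X, Y, A, W, b):
    """Patients with history: latent variables are driven by historical features."""
    return sum(outcome_loglik(y, encode(x, A), W, b) for x, y in zip(X, Y))


def unsupervised_loglik(Y_single, W, b, n_draws=500, seed=0):
    """Single-record patients: no history, so average p(y | z) over z ~ N(0, I)."""
    rng = random.Random(seed)
    k = len(W)
    draws = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_draws)]
    total = 0.0
    for y in Y_single:
        lls = [outcome_loglik(y, z, W, b) for z in draws]
        m = max(lls)  # log-mean-exp for numerical stability
        total += m + math.log(sum(math.exp(v - m) for v in lls) / n_draws)
    return total


def hybrid_objective(X, Y, Y_single, A, W, b, weight=1.0):
    """Negative joint log-likelihood; W and b are shared by both components."""
    sup = supervised_loglik(X, Y, A, W, b)
    unsup = unsupervised_loglik(Y_single, W, b)
    return -(sup + weight * unsup)


if __name__ == "__main__":
    # Toy data: 3 patients with 2 historical features, 2 single-record patients,
    # and 2 binary outcomes (say, one primary and one surrogate).
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    Y = [[1, 0], [0, 0], [1, 1]]
    Y_single = [[1, 1], [0, 1]]
    A = [[0.5, -0.2], [0.1, 0.3]]
    W = [[1.0, 0.4], [-0.3, 0.8]]
    b = [-1.0, 0.0]
    print(hybrid_objective(X, Y, Y_single, A, W, b))
```

In this sketch the single-record patients contribute no information about the encoder `A`, but they do inform the shared outcome parameters `W` and `b`, which is one plausible way such records could still help a model of a rare primary outcome. Minimizing `hybrid_objective` with any gradient-based or derivative-free optimizer would fit both components jointly.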