Cohort Retrieval using Dense Passage Retrieval
Journal:
arXiv
Published Date:
Jun 26, 2025
Abstract
Patient cohort retrieval is a pivotal task in medical research and clinical
practice, enabling the identification of specific patient groups from extensive
electronic health records (EHRs). In this work, we address the challenge of
cohort retrieval in the echocardiography domain by applying Dense Passage
Retrieval (DPR), a prominent methodology in semantic search. We propose a
systematic approach to transform an echocardiographic EHR dataset of
unstructured nature into a Query-Passage dataset, framing the problem as a
Cohort Retrieval task. Additionally, we design and implement evaluation metrics
inspired by real-world clinical scenarios to rigorously test the models across
diverse retrieval tasks. Furthermore, we present a custom-trained DPR embedding
model that demonstrates superior performance compared to traditional and
off-the-shelf SOTA methods.To our knowledge, this is the first work to apply
DPR for patient cohort retrieval in the echocardiography domain, establishing a
framework that can be adapted to other medical domains.