Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models
Journal:
arXiv
Published Date:
May 30, 2025
Abstract
Cardiovascular disease (CVD) risk prediction models are essential for
identifying high-risk individuals and guiding preventive actions. However,
existing models struggle with the challenges of real-world clinical practice as
they oversimplify patient profiles, rely on rigid input schemas, and are
sensitive to distribution shifts. We developed AdaCVD, an adaptable CVD risk
prediction framework built on large language models extensively fine-tuned on
over half a million participants from the UK Biobank. In benchmark comparisons,
AdaCVD surpasses established risk scores and standard machine learning
approaches, achieving state-of-the-art performance. Crucially, for the first
time, it addresses key clinical challenges across three dimensions: it flexibly
incorporates comprehensive yet variable patient information; it seamlessly
integrates both structured data and unstructured text; and it rapidly adapts to
new patient populations using minimal additional data. In stratified analyses,
it demonstrates robust performance across demographic, socioeconomic, and
clinical subgroups, including underrepresented cohorts. AdaCVD offers a
promising path toward more flexible, AI-driven clinical decision support tools
suited to the realities of heterogeneous and dynamic healthcare environments.