CLIN-SUMM: Temporal Summarization of Longitudinal Clinical Notes
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Electronic health records (EHRs) contain years of longitudinal clinical notes that capture evolving patient health, treatments, and outcomes. However, their value is difficult to realize in practice: notes are fragmented across encounters, highly redundant, and inconsistent, so their temporal patterns remain underutilized for downstream modeling. We present CLIN-SUMM (Clinical Longitudinal Insight from Notes using Summarization), a framework that uses a large language model (LLM) to consolidate multi-visit EHR narratives into a concise, date-partitioned patient summary. CLIN-SUMM combines a two-prompt architecture that summarizes the first clinical note and then incremental updates from subsequent notes, a redundancy filter that skips near-identical notes, and a sliding-window approach for managing long patient histories. In a dementia prediction case study (N=1,500; 120,213 visit notes), CLIN-SUMM achieved up to 85% token reduction while preserving clinical fidelity in an initial clinician review (correctness: 4.84/5; completeness: 4.79/5). Fine-tuning Clinical-ModernBERT on these summaries yielded an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.79 for classification up to 30 days before diagnosis and 0.75 for prediction three years before diagnosis. Analyses of prediction trajectories and saliency patterns revealed clinically meaningful temporal trends and risk markers. CLIN-SUMM can help bridge the gap between unstructured multi-visit narratives and predictive modeling: it improves efficiency while surfacing rich temporal patterns and early risk signals, supporting disease modeling, risk prediction, and other longitudinal reasoning tasks.
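The abstract names three mechanisms (two prompts, a redundancy filter, and a sliding window) without specifying them. The sketch below shows one plausible way they could fit together; it is an illustration under stated assumptions, not the authors' implementation. Everything here is hypothetical: the prompt templates, the `Note`, `is_near_duplicate`, and `summarize_patient` names, the `llm` callable standing in for any LLM client, the use of Python's `difflib.SequenceMatcher` ratio with a 0.95 threshold as the redundancy test, and a window of the 5 most recent date sections as the update context.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical prompt templates; the actual CLIN-SUMM prompts are not given in the abstract.
INITIAL_PROMPT = "Summarize this first clinical note:\n{note}"
UPDATE_PROMPT = (
    "Existing summary (most recent visits):\n{context}\n\n"
    "Summarize only what is new in this note:\n{note}"
)


@dataclass
class Note:
    date: str  # ISO date string, e.g. "2021-03-04", so it sorts chronologically
    text: str


def is_near_duplicate(a: str, b: str, threshold: float = 0.95) -> bool:
    """Assumed redundancy filter: difflib similarity ratio at or above a threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def summarize_patient(
    notes: list[Note],
    llm: Callable[[str], str],
    window: int = 5,
    threshold: float = 0.95,
) -> dict[str, str]:
    """Build a date-partitioned summary (date -> section) from a patient's notes."""
    summary: dict[str, str] = {}  # dicts keep insertion order, so sections stay chronological
    last_kept = None  # text of the most recent note that was not skipped
    for note in sorted(notes, key=lambda n: n.date):
        # Redundancy filter: skip notes nearly identical to the last kept note.
        if last_kept is not None and is_near_duplicate(last_kept, note.text, threshold):
            continue
        if not summary:
            # First prompt: summarize the first (non-redundant) note on its own.
            section = llm(INITIAL_PROMPT.format(note=note.text))
        else:
            # Second prompt with a sliding window: only the last `window`
            # date sections are passed as context, bounding prompt length.
            recent = list(summary.items())[-window:]
            context = "\n".join(f"[{d}] {s}" for d, s in recent)
            section = llm(UPDATE_PROMPT.format(context=context, note=note.text))
        # Several notes on the same date are merged into one section.
        summary[note.date] = (summary.get(note.date, "") + " " + section).strip()
        last_kept = note.text
    return summary
```

For a quick check without a model, pass a stub such as `lambda p: p.splitlines()[-1]`, which simply echoes the note text; a repeated note is then dropped by the filter. Tying the window to date sections rather than raw tokens keeps the context aligned with the date-partitioned summary format, but a token budget would be an equally reasonable choice.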