An agentic framework turns patient-sourced records into a multimodal map of ALS heterogeneity

Journal: bioRxiv

Abstract

Amyotrophic lateral sclerosis (ALS) shows marked clinical heterogeneity, yet much real-world evidence remains trapped in unstructured reports. Here we introduce MEDSTREM, a large-language-model (LLM)-based agent that converts patient-sourced document images into standardized longitudinal electronic health records, enabling bottom-up cohort building and linkage to trials and multi-omics. By applying MEDSTREM to clinical report images from 8,298 individuals collected via AskHelpU and harmonizing the output with PRO-ACT and Answer ALS, we generated 17,602 standardized records, alongside multi-omics profiles from 940 induced motor neuron lines. Progression modelling resolved five subtypes and a continuous degeneration score with interpretable anchors: hand-grip strength and forced vital capacity tracked functional loss, and malnutrition emerged as a modifiable correlate. Across RNA-seq and ATAC-seq, clinical severity aligned with suppression of cell-cycle programmes, declining histone-gene activity and genome-wide chromatin opening, suggesting distinct epigenetic trajectories. These findings establish an agentic AI framework that turns unstructured clinical records into mechanistic insight by linking them to multi-omics, reframing ALS studies from top-down, trial-centric analyses to a bottom-up, patient-sourced approach that reveals actionable heterogeneity.
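To make "standardized longitudinal electronic health records" concrete, the sketch below shows one generic way extracted report fields could be mapped onto a fixed vocabulary and ordered into per-patient timelines. It is a hypothetical illustration only, not MEDSTREM's schema or code: the `SYNONYMS` table, the `Observation` fields, the variable names (`fvc_percent_predicted`, `grip_strength_kg`, `bmi_kg_m2`), and the input dictionary layout are all assumptions, and the LLM extraction step is assumed to have already produced label/value pairs.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical map from free-text labels (as an extractor might return them)
# to standardized variable names; NOT the vocabulary used by MEDSTREM.
SYNONYMS = {
    "fvc": "fvc_percent_predicted",
    "forced vital capacity": "fvc_percent_predicted",
    "grip strength": "grip_strength_kg",
    "hand grip": "grip_strength_kg",
    "bmi": "bmi_kg_m2",
}


@dataclass(frozen=True)
class Observation:
    patient_id: str
    visit_date: date
    variable: str
    value: float
    source: str  # provenance tag, e.g. "patient_upload" (assumed)


def standardize(raw: dict, source: str) -> list[Observation]:
    """Map one extracted report (label -> value) onto standardized observations."""
    visit = date.fromisoformat(raw["date"])
    out = []
    for label, value in raw["fields"].items():
        variable = SYNONYMS.get(label.strip().lower())
        if variable is None:
            continue  # this sketch simply drops labels it cannot map
        out.append(Observation(raw["patient_id"], visit, variable, float(value), source))
    return out


def build_timelines(observations: list[Observation]) -> dict[str, list[Observation]]:
    """Group observations by patient and sort each group chronologically."""
    timelines = defaultdict(list)
    for obs in observations:
        timelines[obs.patient_id].append(obs)
    for obs_list in timelines.values():
        obs_list.sort(key=lambda o: (o.visit_date, o.variable))
    return dict(timelines)


# Usage with two made-up reports for one hypothetical patient, uploaded out of order.
reports = [
    {"patient_id": "P001", "date": "2023-05-02", "fields": {"FVC": "71", "Hand grip": "18.5"}},
    {"patient_id": "P001", "date": "2022-11-20", "fields": {"Forced Vital Capacity": "84"}},
]
observations = [o for r in reports for o in standardize(r, "patient_upload")]
timelines = build_timelines(observations)
```

Keying every value to a (patient, date, variable) triple is what allows records from different sources to be pooled into one cohort and sorted into trajectories; a real pipeline would also need unit conversion, deduplication and validation, which this sketch omits.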

Authors

  • Li, Z.
  • Gao, C.
  • Kong, J.
  • Fu, Y.
  • Wen, S.
  • Li, G.
  • Cao, Y.
  • Fu, Y.
  • Zhang, H.
  • Jia, S.
  • Liu, X.
  • Cai, L.
  • Yan, F.
  • Liu, X.
  • Tian, L.

Categories