An agentic framework turns patient-sourced records into a multimodal map of ALS heterogeneity
Journal: bioRxiv
Published Date: Feb 3, 2026
Abstract
Amyotrophic lateral sclerosis (ALS) shows marked clinical heterogeneity, yet much real-world evidence remains trapped in unstructured reports. Here we introduce MEDSTREM, a large-language-model (LLM)-based agent that converts patient-sourced document images into standardized longitudinal electronic health records, enabling bottom-up cohort building and linkage to trials and multi-omics. By applying MEDSTREM to clinical report images collected via AskHelpU from 8,298 individuals and harmonizing the output with PRO-ACT and Answer ALS, we generated 17,602 standardized records and multi-omics profiles from 940 induced motor neuron lines. Progression modelling resolved five subtypes and a continuous degeneration score with interpretable anchors: hand-grip strength and forced vital capacity tracked functional loss, and malnutrition emerged as a modifiable correlate. Across RNA-seq and ATAC-seq, clinical severity aligned with suppression of cell-cycle programmes, declining histone-gene activity and genome-wide chromatin opening, suggesting distinct epigenetic trajectories. These findings establish an agentic AI framework that turns unstructured clinical records into mechanistic insight and links them to multi-omics, reframing ALS studies from top-down, trial-centric analyses to a bottom-up, patient-sourced approach that reveals actionable heterogeneity.