Genosolver: Rare Disease Diagnosis through Holistic Integration of Unstructured Clinical Narratives Using Large Language and Reasoning Models

Journal: medRxiv
Published Date:

Abstract

Background: Molecular medicine has made genetic diagnostics crucial for rare diseases, but most patients remain undiagnosed even after state-of-the-art assessment. Standardized systems for integrating clinical features, such as the Human Phenotype Ontology (HPO), help, but they are often insufficiently detailed and fail to capture crucial clinical parameters such as age at onset, longitudinal changes in symptoms, detailed characteristics of a clinical symptom, or the absence of a feature.

Results: We present Genosolver, an integrated workflow that uses machine learning to address this bottleneck. Applying Large Language Models (LLMs) and Large Reasoning Models (LRMs) to unstructured clinical notes and electronic health care data, the workflow unifies phenotype extraction, differential diagnosis generation, and prioritization of genetic variants from genome data. We evaluated performance on 233 previously genetically solved cases: Genosolver ranked the causative gene first in 72% of cases and within the top 10 genes in 94% of cases, outperforming the existing benchmarking tool Exomiser by 9%. Semi-automated reanalysis of 1,875 unsolved rare disease cases yielded an additional diagnostic rate of 1.7%. Incorporating rich, unstandardized clinical narratives substantially enhanced model performance beyond HPO-only inputs, and local models that comply with data security requirements achieved competitive results.

Conclusion: Integrating unstandardized clinical data with local LLMs and reasoning offers a scalable, data-secure workflow that increases molecular diagnoses in rare diseases.
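The top-1 and top-10 figures in the Results are top-k ranking rates: the share of solved cases in which the known causative gene appears among the first k genes of the ranked list. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `top_k_rate`, the case identifiers, and the gene labels are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List


def top_k_rate(rankings: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Return the fraction of cases whose causative gene is within the first k ranked genes."""
    if not truth:
        raise ValueError("no solved cases provided")
    hits = sum(
        1
        for case_id, gene in truth.items()
        # A case with no ranking at all counts as a miss.
        if gene in rankings.get(case_id, [])[:k]
    )
    return hits / len(truth)


# Toy data (hypothetical): ranked gene lists per case, best candidate first.
rankings = {
    "case_a": ["GENE1", "GENE2", "GENE3"],
    "case_b": ["GENE4", "GENE5", "GENE6"],
    "case_c": ["GENE7", "GENE8"],
    "case_d": ["GENE9"],
}
# Known causative gene per solved case (hypothetical).
truth = {"case_a": "GENE1", "case_b": "GENE5", "case_c": "GENE0", "case_d": "GENE9"}

top1 = top_k_rate(rankings, truth, 1)
top2 = top_k_rate(rankings, truth, 2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Dividing by the number of solved cases rather than the number of returned rankings means a case for which the tool produces no candidate list still counts against the rate, which is the stricter convention for this kind of benchmark.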

Authors

  • Islam, T.
  • Danner, M.
  • Ziad, Z.
  • Begemann, M.
  • Beijer, D.
  • Lischka, A.
  • Lausberg, E.
  • Mattern, L.
  • Suh, J.
  • Wittig, P.
  • Guezel, N.
  • Schlaich, E.
  • Karaivanova, R.
  • D'Augello, S.
  • Franken, L.
  • Ruedebusch, J.
  • Mueller, R.
  • Perchalla, E.
  • Zempel, H.
  • Haag, N.
  • Eggermann, K.
  • Eggermann, T.
  • Meyer, R.
  • Kraft, F.
  • Elbracht, M.
  • Kurth, I.
  • Krause, J.