An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontology-Enhanced Large Language Models: Development Study.

Journal: JMIR medical informatics
PMID:

Abstract

BACKGROUND: Rare diseases affect millions worldwide but sometimes face limited research focus individually due to low prevalence. Many rare diseases do not have specific International Classification of Diseases, Ninth Edition (ICD-9) and Tenth Edition (ICD-10), codes and therefore cannot be reliably extracted from granular fields like "Diagnosis" and "Problem List" entries. This complicates tasks that require identifying patients with these conditions, including clinical trial recruitment and research efforts. Recent advancements in large language models (LLMs) have shown promise in automating the extraction of medical information, offering the potential to improve medical research, diagnosis, and management. However, most LLMs lack professional medical knowledge, especially concerning specific rare diseases, and cannot effectively manage rare disease data in its various ontological forms, making them unsuitable for these tasks.

Authors

  • Lang Cao
    Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, United States.
  • Jimeng Sun
    College of Computing, Georgia Institute of Technology, Atlanta, GA, United States.
  • Adam Cross
    Department of Pediatrics, University of Illinois College of Medicine Peoria, Peoria, IL, United States.