Human Phenotype Ontology (HPO) Mapper: Semantic Mapping of Clinical Findings to the Human Phenotype Ontology Using AI-Powered Embeddings and LLM-Based Quality Control

Journal: medRxiv
Published Date:

Abstract

Structured phenotypic annotations linked to genetic data can drive diagnostic insight and therapeutic discovery in complex diseases. However, poor research access to the rich clinical data trapped in unstructured clinical records remains a significant barrier to phenotype–genotype integration. Here, we present Human Phenotype Ontology (HPO) Mapper, a scalable AI-assisted tool designed to ingest semantically structured clinical findings, each paired with an anatomical region, and accurately map them to HPO terms and associated genes. We applied HPO Mapper to two forms of standardised clinical input extracted from inflammatory bowel disease (IBD) patient records. The first data type consisted of paired ‘clinical finding + anatomical region’ entries derived from unstructured clinical reports; the second consisted of standardised ICD-10 code-derived phenotypes. HPO Mapper achieved high semantic alignment and mapping accuracy for both data types (F1 = 0.85 ± 0.05 and 0.84 ± 0.03, respectively). Our publicly available tool enables real-time HPO mapping for clinical applications, providing a foundation for scalable AI-driven phenotyping across diseases.
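The mapping idea described above, in which a paired finding and anatomical region is matched to the closest HPO term by embedding similarity, can be illustrated with a minimal sketch. This is not the HPO Mapper implementation: the `embed` function is a hypothetical bag-of-words stand-in for the AI-powered embeddings, the three-term `HPO_TERMS` dictionary is a toy subset (identifiers should be checked against the current HPO release), and the `min_score` threshold and all function names are assumptions made for illustration. The LLM-based quality control step is not modelled.

```python
import math
from collections import Counter

# Toy subset of HPO labels; a real run would load the full ontology.
HPO_TERMS = {
    "HP:0002027": "abdominal pain",
    "HP:0002014": "diarrhea",
    "HP:0001903": "anemia",
}

def embed(text):
    """Hypothetical stand-in for a learned embedding: word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def map_finding(finding, region, min_score=0.3):
    """Return (hpo_id, score) for the best match, or (None, score) if too weak."""
    query = embed(f"{region} {finding}")
    best_id, best_score = None, 0.0
    for hpo_id, label in HPO_TERMS.items():
        score = cosine(query, embed(label))
        if score > best_score:
            best_id, best_score = hpo_id, score
    if best_score < min_score:
        return None, best_score  # assumed behaviour: leave weak matches unmapped
    return best_id, best_score

if __name__ == "__main__":
    print(map_finding("pain", "abdominal"))
    print(map_finding("chronic diarrhea", "colon"))
    print(map_finding("rash", "skin"))
```

In this sketch a finding with no sufficiently similar term is left unmapped rather than forced onto the nearest label; whether and how the actual tool applies such a cut-off is not stated in the abstract.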

Authors

  • Alex Z Kadhim; Zachary Green; Alister Boags; Michael George; Ashley Heinson; Matt Stammers; Christopher M Kipps; R Mark Beattie; Peter N Robinson; James J Ashton; Sarah Ennis