OntologyRAG: Better and Faster Biomedical Code Mapping with Retrieval-Augmented Generation (RAG) Leveraging Ontology Knowledge Graphs and Large Language Models
Journal:
arXiv
Published Date:
Feb 26, 2025
Abstract
Biomedical ontologies, which comprehensively define concepts and relations
for biomedical entities, are crucial for structuring and formalizing
domain-specific information representations. Biomedical code mapping identifies
similarity or equivalence between concepts from different ontologies. Obtaining
high-quality mappings usually relies on automatic generation of unrefined
mappings with ontology-domain fine-tuned language models (LMs), followed by
manual selections or corrections by coding experts who have extensive domain
expertise and familiarity with ontology schemas. The LMs usually provide
unrefined code mapping suggestions as a list of candidates without reasoning or
supporting evidence, hence coding experts still need to verify each suggested
candidate against ontology sources to pick the best matches. This is also a
recurring task as ontology sources are updated regularly to incorporate new
research findings. Consequently, the need for regular LM retraining and manual
refinement makes code mapping time-consuming and labour-intensive. In this work,
we created OntologyRAG, an ontology-enhanced retrieval-augmented generation
(RAG) method that leverages the inductive biases from ontological knowledge
graphs for in-context learning (ICL) in large language models (LLMs). Our
solution grounds LLMs to knowledge graphs with unrefined mappings between
ontologies and processes questions by generating an interpretable set of
results that include a prediction rationale with a mapping proximity assessment. Our
solution does not require retraining LMs, as all ontology updates can be
reflected by updating the knowledge graphs with a standard process. Evaluation
results on a self-curated gold dataset show the promise of our method for
enabling coding experts to achieve better and faster code mapping. The code is
available at https://github.com/iqvianlp/ontologyRAG.
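
The retrieval-and-grounding idea described above can be illustrated with a minimal sketch. It is not the OntologyRAG implementation: the graph layout, the relation names (`has_label`, `is_a`, `candidate_map`), the helper functions, and the codes `SRC:001` and `TGT:A10` are all hypothetical placeholders chosen for illustration, and no LLM is called. The sketch only shows how facts retrieved from a knowledge graph holding unrefined candidate mappings could be assembled into an in-context prompt that asks for a best match, a proximity assessment, and a rationale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One directed, labelled edge of a toy ontology knowledge graph."""
    source: str
    relation: str
    target: str


# Hypothetical toy graph. The codes and relation names are illustrative
# assumptions, not entries from any real ontology or from OntologyRAG.
EDGES = [
    Edge("SRC:001", "has_label", "Type 2 diabetes"),
    Edge("SRC:001", "is_a", "SRC:000"),
    Edge("SRC:000", "has_label", "Diabetes mellitus"),
    Edge("SRC:001", "candidate_map", "TGT:A10"),
    Edge("SRC:001", "candidate_map", "TGT:A11"),
    Edge("TGT:A10", "has_label", "Diabetes mellitus, type 2"),
    Edge("TGT:A11", "has_label", "Diabetes mellitus, unspecified"),
]


def neighbours(node: str, relation: str) -> list[str]:
    """Return the targets reachable from `node` via `relation`, in graph order."""
    return [e.target for e in EDGES if e.source == node and e.relation == relation]


def label(node: str) -> str:
    """Return the first label of `node`, or a placeholder if it has none."""
    labels = neighbours(node, "has_label")
    return labels[0] if labels else "(no label)"


def retrieve_context(code: str) -> list[str]:
    """Collect the facts to ground the LLM: the code itself, its parents,
    and its unrefined candidate mappings."""
    lines = [f"{code}: {label(code)}"]
    for parent in neighbours(code, "is_a"):
        lines.append(f"{code} is_a {parent}: {label(parent)}")
    for candidate in neighbours(code, "candidate_map"):
        lines.append(f"candidate {candidate}: {label(candidate)}")
    return lines


def build_prompt(code: str) -> str:
    """Assemble an in-context prompt from the retrieved graph facts."""
    context = "\n".join(retrieve_context(code))
    return (
        "Using only the ontology facts below, pick the best target code, "
        "rate the mapping proximity, and explain why.\n"
        f"Facts:\n{context}\n"
        f"Question: which target code best matches {code}?"
    )


if __name__ == "__main__":
    print(build_prompt("SRC:001"))
```

In this sketch an ontology update only changes the contents of `EDGES`. The prompt-building code stays the same, which mirrors the claim that updating the knowledge graph removes the need for LM retraining.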