Investigations on using Evidence-Based GraphRag Pipeline using LLM Tailored for USMLE Style Questions

Journal: medRxiv

Abstract

The integration of evidence-based reasoning with graph-based retrieval-augmented generation (GraphRAG) holds great promise for enhancing the question-answering (QA) capabilities of large language models (LLMs). This research proposes a GraphRAG framework that improves the interpretability and reliability of LLM-generated answers in the medical domain. Our approach uses Neo4j to construct a knowledge graph of UMLS medical entities and their relationships, and complements it with a vector store of textbook embeddings for dense passage retrieval. The system is designed to combine symbolic reasoning over the graph with semantic search over textbook passages to produce more context-aware, evidence-grounded responses. As a proof of concept, we evaluate our system on United States Medical Licensing Examination (USMLE)-style questions, which require clinical reasoning across multiple domains. While overall answer accuracy remains comparable to that of an LLM-only baseline, our system consistently outperforms the baseline in citation fidelity, providing richer, more traceable justifications by explicitly linking answers to graph paths and textbook passages. These findings suggest that even when answer correctness varies, graph-informed retrieval improves transparency and auditability, which are critical in high-stakes domains such as medicine. Our results motivate further refinement of hybrid GraphRAG systems to improve both factual accuracy and clinical trustworthiness in QA applications.
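The abstract describes the hybrid retrieval only at the architectural level. The following is a minimal Python sketch, not the authors' implementation, of one way graph-path retrieval and dense passage retrieval could be merged into a numbered evidence list that an LLM is asked to cite, so that each claim in an answer can be traced back to a graph path or a textbook passage. Assumptions: an in-memory list of triples stands in for the Neo4j/UMLS graph, hand-written 3-dimensional vectors stand in for textbook embeddings, and every name (`graph_paths`, `dense_top_k`, `build_evidence`, `build_prompt`, `resolve_citations`, and the toy data) is hypothetical.

```python
import math
import re
from collections import deque

# Hypothetical toy stand-in for the Neo4j/UMLS knowledge graph: (subject, relation, object) triples.
TOY_EDGES = [
    ("Metformin", "treats", "Type 2 diabetes"),
    ("Metformin", "has_adverse_effect", "Lactic acidosis"),
    ("Lactic acidosis", "associated_with", "Renal failure"),
]

# Hypothetical toy stand-in for the textbook vector store: id -> (passage, embedding).
TOY_PASSAGES = {
    "p1": ("Metformin is first-line therapy for type 2 diabetes.", [0.9, 0.1, 0.0]),
    "p2": ("Metformin is contraindicated in severe renal impairment.", [0.2, 0.8, 0.1]),
    "p3": ("Beta blockers reduce mortality after myocardial infarction.", [0.0, 0.1, 0.9]),
}


def graph_paths(edges, start, max_hops=2):
    """Breadth-first search: return every outgoing path of 1..max_hops triples from `start`."""
    adjacency = {}
    for s, r, o in edges:
        adjacency.setdefault(s, []).append((s, r, o))
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # hop limit reached; this also guarantees termination on cyclic graphs
        for triple in adjacency.get(node, []):
            extended = path + [triple]
            paths.append(extended)
            queue.append((triple[2], extended))
    return paths


def render_path(path):
    """Format a path as 'A -[rel]-> B -[rel]-> C' so it can be quoted as evidence."""
    text = path[0][0]
    for _s, r, o in path:
        text += f" -[{r}]-> {o}"
    return text


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_top_k(passages, query_vec, k=2):
    """Rank passages by cosine similarity to the query embedding and keep the top k."""
    ranked = sorted(passages.items(), key=lambda kv: cosine(query_vec, kv[1][1]), reverse=True)
    return [(pid, text) for pid, (text, _vec) in ranked[:k]]


def build_evidence(seed_concept, query_vec, k=2, max_hops=2):
    """Merge graph paths and dense passages into one ordered evidence list of (source, text)."""
    evidence = [("graph", render_path(p)) for p in graph_paths(TOY_EDGES, seed_concept, max_hops)]
    evidence += [(f"textbook:{pid}", text) for pid, text in dense_top_k(TOY_PASSAGES, query_vec, k)]
    return evidence


def build_prompt(question, evidence):
    """Number each evidence item so the LLM can cite it as [n]."""
    lines = [f"[{i}] ({src}) {text}" for i, (src, text) in enumerate(evidence, start=1)]
    return (f"Question: {question}\nEvidence:\n" + "\n".join(lines)
            + "\nAnswer, citing evidence numbers in brackets.")


def resolve_citations(answer, evidence):
    """Map bracketed citation numbers in an answer back to their evidence items; ignore invalid ones."""
    cited = []
    for num in re.findall(r"\[(\d+)\]", answer):
        idx = int(num) - 1
        if 0 <= idx < len(evidence):
            cited.append(evidence[idx])
    return cited


if __name__ == "__main__":
    # A real system would embed the question text; this query vector is a hand-picked placeholder.
    ev = build_evidence("Metformin", query_vec=[0.5, 0.5, 0.0])
    print(build_prompt("Why withhold metformin in a patient with renal failure?", ev))
    print(resolve_citations("Risk of lactic acidosis [3]; renal contraindication [4].", ev))
```

In a real deployment the adjacency walk would presumably be replaced by a variable-length Cypher path query against Neo4j and the toy vectors by an embedding model. The numbered-evidence format is simply one straightforward way to make every cited claim resolvable to a specific graph path or passage, which is the kind of traceability the abstract attributes to the system.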

Authors

  • Tharun Sekar; Kushal; Supprethaa Shankar; Sabah Mohammed; Jinan Fiaidhi