From Knowledge Generation to Knowledge Verification: Examining the BioMedical Generative Capabilities of ChatGPT
Journal: arXiv
Published Date: Feb 20, 2025
Abstract
The generative capabilities of large language models (LLMs) offer opportunities for
accelerating tasks but raise concerns about the authenticity of the knowledge
they produce. To address these concerns, we present a computational approach
that evaluates the factual accuracy of biomedical knowledge generated by an
LLM. Our approach consists of two processes: generating disease-centric
associations and verifying these associations using the semantic framework of
biomedical ontologies. Using ChatGPT as the selected LLM, we designed
prompt-engineering processes to establish linkages between diseases and related
drugs, symptoms, and genes, and assessed consistency across multiple ChatGPT
models (e.g., GPT-turbo and GPT-4). Experimental results demonstrate high
accuracy in identifying disease terms (88%-97%), drug names (90%-91%), and
genetic information (88%-98%). However, accuracy for symptom terms was notably
lower (49%-61%) because of the informal and verbose nature of symptom
descriptions, which hindered effective semantic matching with the formal
language of specialized ontologies. Verification of associations reveals
literature coverage rates of 89%-91% for disease-drug and disease-gene pairs,
while symptom-related associations exhibit lower coverage (49%-62%).
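
The gap between drug and symptom accuracy can be illustrated with a minimal sketch of term identification. It assumes that a generated term counts as identified when its normalized string matches an ontology label exactly. This matching rule, the function names normalize and match_rate, and the toy term lists are all hypothetical and are not taken from the paper, whose semantic matching against biomedical ontologies may differ.

```python
import re

def normalize(term: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", term.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def match_rate(generated, ontology_labels):
    """Fraction of generated terms whose normalized form is an ontology label.

    Exact matching is an assumption made for illustration only.
    """
    if not generated:
        return 0.0
    vocab = {normalize(label) for label in ontology_labels}
    hits = sum(1 for term in generated if normalize(term) in vocab)
    return hits / len(generated)

# Toy, hypothetical data (not from the paper).
ontology_drugs = ["Metformin", "Insulin glargine"]
ontology_symptoms = ["Polyuria", "Fatigue"]

generated_drugs = ["metformin", "insulin glargine"]
generated_symptoms = ["frequent urination, especially at night", "fatigue"]

drug_rate = match_rate(generated_drugs, ontology_drugs)
symptom_rate = match_rate(generated_symptoms, ontology_symptoms)
print(f"drugs: {drug_rate:.0%}, symptoms: {symptom_rate:.0%}")
```

In this toy setting the concise drug names all match, while the verbose symptom phrase has no exact counterpart among the formal labels even though "Polyuria" describes the same concept, which mirrors the mismatch described above.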