Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics
Journal:
arXiv
Published Date:
Feb 7, 2025
Abstract
Recent advancements in large language models have demonstrated significant
potential in the automated construction of knowledge graphs from unstructured
text. This paper builds upon our previous work [16], which evaluated various
models using metrics like precision, recall, F1 score, triple matching, and
graph matching, and introduces a refined approach to address the critical
issues of hallucination and omission. We propose an enhanced evaluation
framework incorporating BERTScore for graph similarity, setting a practical
threshold of 95% for graph matching. Our experiments focus on the Mistral
model, comparing its original and fine-tuned versions in zero-shot and few-shot
settings. We further extend our experiments using examples from the KELM-sub
training dataset, illustrating that the fine-tuned model significantly improves
knowledge graph construction accuracy while reducing the exact hallucination
and omission. However, our findings also reveal that the fine-tuned models
perform worse in generalization tasks on the KELM-sub dataset. This study
underscores the importance of comprehensive evaluation metrics in advancing the
state-of-the-art in knowledge graph construction from textual data.