Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings
Journal: arXiv
Published Date: May 3, 2025
Abstract
Automated interpretation of chest X-rays (CXR) is a critical task with the
potential to significantly improve clinical workflow and patient care. While
recent advances in multimodal foundation models have shown promise, effectively
leveraging the full power of large language models (LLMs) for this visual task
remains underexplored. This paper introduces CXR-TextInter, a novel
framework that repurposes powerful text-centric LLMs for CXR interpretation by
operating solely on a rich, structured textual representation of the image
content, generated by an upstream image analysis pipeline. We augment this
LLM-centric approach with an integrated medical knowledge module to enhance
clinical reasoning. To facilitate training and evaluation, we developed the
MediInstruct-CXR dataset, containing structured image representations paired
with diverse, clinically relevant instruction-response examples, and the
CXR-ClinEval benchmark for comprehensive assessment across various
interpretation tasks. Extensive experiments on CXR-ClinEval demonstrate that
CXR-TextInter achieves state-of-the-art quantitative performance across
pathology detection, report generation, and visual question answering,
surpassing existing multimodal foundation models. Ablation studies confirm the
critical contribution of the knowledge integration module. Furthermore, blinded
human evaluation by board-certified radiologists shows a significant preference
for CXR-TextInter's outputs in terms of clinical quality. Our work
validates an alternative paradigm for medical image AI, showcasing the
potential of harnessing advanced LLM capabilities when visual information is
effectively structured and domain knowledge is integrated.