Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation
Journal:
arXiv
Published Date:
Jan 7, 2025
Abstract
X-ray image based medical report generation has achieved significant progress in
recent years with the help of large language models. However, these models
have not fully exploited the effective information in visual image regions,
resulting in reports that are linguistically sound but insufficient in
describing key diseases. In this paper, we propose a novel associative
memory-enhanced X-ray report generation model that effectively mimics the
process of professional doctors writing medical reports. It mines both global
and local visual information and associates historical report information to
better complete the writing of the current report. Specifically,
given an X-ray image, we first utilize a classification model along with its
activation maps to accomplish the mining of visual regions highly associated
with diseases and the learning of disease query tokens. Then, we employ a
visual Hopfield network to establish memory associations for disease-related
tokens, and a report Hopfield network to retrieve report memory information.
This process facilitates the generation of high-quality reports based on a
large language model and achieves state-of-the-art performance on multiple
benchmark datasets, including IU X-ray, MIMIC-CXR, and CheXpert Plus. The
source code of this work is released at
https://github.com/Event-AHU/Medical_Image_Analysis.
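
To make the memory-association step concrete, the following is a minimal sketch of associative retrieval in the style of a modern (continuous) Hopfield network, where a query token is replaced by a softmax-weighted combination of stored memory patterns. The abstract does not specify the exact formulation, so the update rule, the function name `hopfield_retrieve`, the inverse temperature `beta`, and the toy memory bank below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hopfield_retrieve(queries, memory, beta=1.0, steps=1):
    """Hypothetical helper: associative retrieval from a memory bank.

    queries: (n, d) array of query tokens (e.g. disease-related tokens).
    memory:  (m, d) array of stored patterns (e.g. historical tokens).
    Returns an (n, d) array of retrieved patterns.
    """
    state = queries
    for _ in range(steps):
        # Similarity of each query to each stored pattern, sharpened by beta.
        weights = softmax(beta * (state @ memory.T), axis=-1)  # (n, m)
        # Each query becomes a convex combination of stored patterns.
        state = weights @ memory  # (n, d)
    return state


# Toy memory bank (assumed): four orthonormal patterns in six dimensions.
memory = np.eye(4, 6)

# A slightly perturbed copy of stored pattern 2 acts as the query.
query = memory[2] + 0.1 * np.ones(6)

retrieved = hopfield_retrieve(query[None, :], memory, beta=8.0)
nearest = int(np.argmax(retrieved @ memory.T))
print("index of closest stored pattern:", nearest)
```

A larger `beta` makes retrieval closer to selecting a single stored pattern, while a smaller `beta` blends several patterns together. In the setting described above, the queries would stand in for disease-related vision tokens or report features, and the memory bank for tokens collected from historical data.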