A vision attention driven Language framework for medical report generation.

Journal: Scientific Reports

PMID: 40155699

Abstract

This study introduces the Medical Vision Attention Generation (MedVAG) model, a novel framework designed to facilitate the automated generation of medical reports. MedVAG integrates Vision Transformer (ViT)-based visual feature extraction and GPT-2 language modeling, enhanced by graph-based feature fusion and multiple attention mechanisms (co-attention, cross-attention, memory-guided attention), to ensure semantic coherence and diagnostic accuracy. Evaluated on IU X-Ray and COV-CTR datasets, the model achieved state-of-the-art performance across natural language generation metrics (BLEU, METEOR, ROUGE, CIDEr) and clinical effectiveness measures. Ablation studies highlighted the critical role of attention mechanisms and feature fusion in aligning visual and textual features. MedVAG demonstrates strong potential as an assistive technology, aiming to support radiologists by reducing workload and enhancing diagnostic accuracy.
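The cross-attention step named in the abstract, where text-side features attend over image-side features, can be illustrated with a minimal sketch. This is not the MedVAG implementation: the abstract gives no layer sizes, projection matrices, or fusion details, so the function name `cross_attention`, the toy vectors, and the omission of learned projections are all assumptions made for illustration. It shows plain scaled dot-product attention with text tokens as queries and image patches as keys and values.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_queries, image_keys, image_values):
    """Scaled dot-product cross-attention (hypothetical helper).

    Each text token (query) attends over all image patches (keys/values).
    Learned query/key/value projections are omitted for brevity.
    """
    scale = math.sqrt(len(image_keys[0]))
    value_dim = len(image_values[0])
    outputs, weights = [], []
    for q in text_queries:
        # Similarity of this text token to every image patch.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in image_keys]
        w = softmax(scores)
        # Weighted sum of patch values gives the attended visual context.
        out = [sum(wj * v[c] for wj, v in zip(w, image_values))
               for c in range(value_dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy data (assumed, not from the paper): 2 text tokens, 3 image patches.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
image_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attended, attn = cross_attention(text_queries, image_keys, image_values)
```

Each row of `attn` is a probability distribution over image patches for one text token, and each row of `attended` is the corresponding visual context vector that a language decoder could consume.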

Authors

Merve Varol Arısoy

Bucak Faculty of Computer and Informatics, Information Systems Engineering Department, Burdur Mehmet Akif Ersoy University, Burdur, Turkey. mvarisoy@mehmetakif.edu.tr.

Ayhan Arısoy

Bucak Faculty of Computer and Informatics, Information Systems Engineering Department, Burdur Mehmet Akif Ersoy University, Burdur, Turkey.

İlhan Uysal

Information Systems and Technologies Department, Burdur Mehmet Akif Ersoy University, Bucak Zeliha Tolunay School of Applied Technology and Business, Burdur, Turkey.

Keywords

Algorithms; Attention; Humans; Natural Language Processing

