Dual-modality visual feature flow for medical report generation.

Journal: Medical image analysis

Published Date: Dec 1, 2024

Abstract

Medical report generation, a cross-modal task of generating medical text, aims to provide professional descriptions of medical images in clinical language. Although some methods have made progress, limitations remain, including insufficient focus on lesion areas, omission of internal edge features, and difficulty in aligning cross-modal data. To address these issues, we propose Dual-Modality Visual Feature Flow (DMVF) for medical report generation. First, we introduce region-level features based on grid-level features to enhance the method's ability to identify lesions and key areas. Then, we enhance each of the two feature flows according to its attributes to prevent the loss of key information. Finally, we align the visual mappings from the different visual features with report textual embeddings through a feature fusion module to perform cross-modal learning. Extensive experiments on four benchmark datasets demonstrate that our approach outperforms state-of-the-art methods on both natural language generation and clinical efficacy metrics.
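The abstract does not specify how DMVF builds its region-level features or how its fusion module works. As a rough illustration of the general idea only, the hypothetical sketch below (plain Python, no learned weights) does three things. It treats every cell of a grid feature map as a grid-level token. It average-pools the cells inside assumed bounding boxes to form region-level tokens. It then lets each report-text embedding attend over both token sets with scaled dot-product attention. The function names, the box format, the average pooling, and the residual addition are all assumptions made for illustration, not the authors' design.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]   # H x W grid of C-dimensional vectors
Box = Tuple[int, int, int, int]   # hypothetical format: (row0, col0, row1, col1), end-exclusive


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def grid_tokens(feature_map: FeatureMap) -> List[Vector]:
    # Grid-level flow: every spatial cell becomes one visual token.
    return [cell for row in feature_map for cell in row]


def region_tokens(feature_map: FeatureMap, boxes: Sequence[Box]) -> List[Vector]:
    # Region-level flow (assumed): average-pool the grid cells inside each box.
    tokens = []
    for r0, c0, r1, c1 in boxes:
        cells = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        dim = len(cells[0])
        tokens.append([sum(cell[d] for cell in cells) / len(cells) for d in range(dim)])
    return tokens


def cross_attend(queries: List[Vector], keys_values: List[Vector]) -> List[Vector]:
    # Scaled dot-product attention with identity projections (no learned weights).
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys_values])
        out.append([sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)])
    return out


def fuse(text_embeddings: List[Vector], feature_map: FeatureMap,
         boxes: Sequence[Box]) -> List[Vector]:
    # Concatenate both visual flows, attend from text, add the result residually.
    visual = grid_tokens(feature_map) + region_tokens(feature_map, boxes)
    attended = cross_attend(text_embeddings, visual)
    return [[t + a for t, a in zip(te, at)] for te, at in zip(text_embeddings, attended)]


if __name__ == "__main__":
    fmap = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [0.0, 0.0]]]
    print(fuse([[0.0, 0.0]], fmap, [(0, 0, 1, 2)]))
```

With an all-zero text embedding, every visual token receives the same attention weight (1/5 for four grid tokens plus one region token), so the fused output is simply the mean visual token, about `[0.5, 0.5]`. In a trained model, queries, keys, and values would normally pass through learned projections; they are omitted here so that the arithmetic stays easy to check by hand.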

Authors

Quan Tang

School of Computer Science, China West Normal University, Nanchong, 637009, Sichuan, China.
Liming Xu

School of Computer Science, China West Normal University, Nanchong City 637009, China. Electronic address: xulm@cwnu.edu.cn.
Yongheng Wang

Xi'an Key Laboratory of Scientific Computation and Applied Statistics, Xi'an 710129, China.
Bochuan Zheng

School of Computer Science, China West Normal University, Nanchong, 637009, Sichuan, China.
Jiancheng Lv

Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China.
Xianhua Zeng

Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China. Electronic address: zengxh@cqupt.edu.cn.
Weisheng Li

Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.

Keywords

Algorithms; Humans; Machine Learning; Natural Language Processing

External Resources

PubMed ID: 39693718
