Dual-modality visual feature flow for medical report generation.

Journal: Medical Image Analysis
Published Date:

Abstract

Medical report generation is a cross-modal task that aims to provide professional descriptions of medical images in clinical language. Although existing methods have made progress, they still suffer from several limitations, including insufficient focus on lesion areas, omission of internal edge features, and difficulty in aligning cross-modal data. To address these issues, we propose Dual-Modality Visual Feature Flow (DMVF) for medical report generation. First, we introduce region-level features based on grid-level features to enhance the method's ability to identify lesions and key areas. Then, we enhance each of the two feature flows according to its attributes to prevent the loss of key information. Finally, we align the visual mappings from the different visual features with the report's textual embeddings through a feature fusion module to perform cross-modal learning. Extensive experiments on four benchmark datasets demonstrate that our approach outperforms state-of-the-art methods on both natural language generation and clinical efficacy metrics.
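
The abstract gives no architectural details, so the following is only a minimal sketch of the general idea of combining two visual feature streams and aligning them with text embeddings. It is not the DMVF implementation. The function name `fuse_and_attend`, the concatenation-based fusion, the single scaled dot-product attention step, and all array shapes are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_and_attend(grid_feats, region_feats, text_emb):
    """Hypothetical fusion step (an assumption, not the paper's module).

    grid_feats:   (n_grid, d)   grid-level visual features
    region_feats: (n_region, d) region-level visual features
    text_emb:     (n_tokens, d) report token embeddings
    """
    # Assumed fusion: stack both visual streams into one token sequence.
    visual = np.concatenate([grid_feats, region_feats], axis=0)
    d = visual.shape[-1]
    # Scaled dot-product attention: each text token attends to all visual tokens.
    scores = text_emb @ visual.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Visual context vector for every text token.
    context = weights @ visual
    return context, weights


# Toy inputs with made-up sizes: 49 grid cells, 10 regions, 20 report tokens.
rng = np.random.default_rng(0)
grid = rng.normal(size=(49, 64))
region = rng.normal(size=(10, 64))
text = rng.normal(size=(20, 64))
context, weights = fuse_and_attend(grid, region, text)
```

In this sketch each report token receives a weighted mixture of grid and region features, which is one simple way a decoder could draw on both streams. The actual feature enhancement and fusion modules of the paper may differ substantially.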

Authors

  • Quan Tang
    School of Computer Science, China West Normal University, Nanchong, 637009, Sichuan, China.
  • Liming Xu
    School of Computer Science, China West Normal University, Nanchong City 637009, China. Electronic address: xulm@cwnu.edu.cn.
  • Yongheng Wang
    Xi'an Key Laboratory of Scientific Computation and Applied Statistics, Xi'an 710129, China.
  • Bochuan Zheng
    School of Computer Science, China West Normal University, Nanchong, 637009, Sichuan, China.
  • Jiancheng Lv
    Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, P. R. China.
  • Xianhua Zeng
    Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China. Electronic address: zengxh@cqupt.edu.cn.
  • Weisheng Li
    Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.