A Global Visual Information Intervention Model for Medical Visual Question Answering.
Journal:
Computers in Biology and Medicine
Published Date:
Apr 28, 2025
Abstract
Medical Visual Question Answering (Med-VQA) aims to provide accurate answers to clinical questions about medical images. Despite its clear potential in healthcare, current solutions remain immature and have yet to see widespread clinical adoption. Med-VQA is more challenging than standard visual question answering (VQA) because of the wide variety of clinical scenarios and the scarcity of labeled medical images, which often lead to language bias and overfitting. To address these challenges, this study introduces Global Visual Information Intervention (GVII), a Med-VQA model designed to mitigate language bias and improve generalizability. GVII is built around two branches: the Global Visual Information Branch (GVIB), which extracts and filters holistic visual information to strengthen the image's contribution and reduce the dominance of the question, and the Forward Compensation Branch (FCB), which refines multimodal features to counterbalance the disturbance introduced by GVIB. The two branches work in tandem to improve predictive accuracy and robustness, and a multi-branch fusion mechanism integrates the features and losses across the branches. Experimental results show that the proposed model outperforms existing state-of-the-art models, improving accuracy on the PathVQA dataset by 2.6%. In conclusion, the GVII-based Med-VQA model mitigates prevalent language bias and overfitting while significantly improving diagnostic precision, marking a substantial step toward robust, clinically applicable VQA systems.
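The abstract does not give implementation details, so the following is only a minimal PyTorch-style sketch of the two-branch idea it describes. The class name GVIIHead, the sigmoid gate used to "filter" global visual features, the layer sizes, and the learnable fusion weight are illustrative assumptions, not the paper's actual GVIB/FCB design or loss formulation.

```python
# Minimal sketch of a two-branch head with multi-branch fusion (assumed design).
import torch
import torch.nn as nn


class GVIIHead(nn.Module):
    """Hypothetical head: a global-visual branch plus a compensation branch."""

    def __init__(self, img_dim=768, txt_dim=768, hidden=512, num_answers=500):
        super().__init__()
        # Global Visual Information Branch (GVIB): gate holistic image features
        # so the image contributes more to the prediction than the question.
        self.visual_gate = nn.Sequential(nn.Linear(img_dim, img_dim), nn.Sigmoid())
        self.gvib_cls = nn.Linear(img_dim, num_answers)

        # Forward Compensation Branch (FCB): refine fused multimodal features to
        # offset the perturbation introduced by the visual intervention.
        self.multimodal = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.fcb_cls = nn.Linear(hidden, num_answers)

        # Learnable weight for fusing the two branch predictions.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, img_feat, txt_feat):
        # GVIB: gated global visual features -> answer logits.
        gated_img = img_feat * self.visual_gate(img_feat)
        gvib_logits = self.gvib_cls(gated_img)

        # FCB: refined joint image-question features -> answer logits.
        joint = self.multimodal(torch.cat([gated_img, txt_feat], dim=-1))
        fcb_logits = self.fcb_cls(joint)

        # Multi-branch fusion of logits; per-branch losses could be combined analogously.
        fused = self.alpha * gvib_logits + (1.0 - self.alpha) * fcb_logits
        return fused, gvib_logits, fcb_logits


if __name__ == "__main__":
    head = GVIIHead()
    img = torch.randn(4, 768)   # pooled image features (e.g., from a vision encoder)
    txt = torch.randn(4, 768)   # pooled question features (e.g., from a text encoder)
    fused, gvib_logits, fcb_logits = head(img, txt)
    print(fused.shape)          # torch.Size([4, 500])
```

In this sketch, each branch produces its own answer logits so that branch-specific losses could be supervised separately and then fused, which is one plausible reading of the "multi-branch fusion mechanism" mentioned in the abstract.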