Integrating language into medical visual recognition and reasoning: A survey.

Journal: Medical Image Analysis

Abstract

Vision-Language Models (VLMs) are regarded as effective paradigms that bridge visual perception and textual interpretation. In medical visual tasks, models can benefit from expert observations and physician knowledge extracted from accompanying text, thereby improving their visual understanding. Motivated by the fact that detailed medical reports commonly accompany medical images, medical VLMs have attracted growing interest, serving not only as a form of self-supervised learning during pretraining but also as a means of introducing auxiliary information into medical visual perception. To strengthen understanding of this promising direction, this survey provides an in-depth exploration and review of medical VLMs for various visual recognition and reasoning tasks. First, we present an introduction to medical VLMs. We then provide preliminaries and examine how language can be exploited in medical visual tasks from diverse perspectives. Further, we investigate publicly available VLM datasets and discuss challenges and future perspectives. We expect this comprehensive discussion of state-of-the-art medical VLMs to help researchers recognize their significant potential.

Authors

  • Yinbin Lu
    The School of Computer Science and Technology, East China Normal University, Shanghai 200062, China.
  • Alan Wang
    DeepCyto LLC, West Linn, Oregon, United States.