Integrating language into medical visual recognition and reasoning: A survey.

Journal: Medical Image Analysis

Abstract

Vision-Language Models (VLMs) are regarded as effective paradigms that bridge visual perception and textual interpretation. In medical visual tasks, models can benefit from expert observations and physician knowledge extracted from accompanying text, thereby improving their visual understanding. Motivated by the fact that detailed medical reports commonly accompany medical images, medical VLMs have attracted growing interest, serving not only as a form of self-supervised learning during pretraining but also as a means of introducing auxiliary information into medical visual perception. To strengthen understanding of this promising direction, this survey provides an in-depth exploration and review of medical VLMs for various visual recognition and reasoning tasks. First, we present an introduction to medical VLMs. We then provide preliminaries and examine how language can be exploited in medical visual tasks from diverse perspectives. Further, we investigate publicly available VLM datasets and discuss challenges and future perspectives. We expect this comprehensive discussion of state-of-the-art medical VLMs to help researchers recognize their significant potential.

Authors

  • Yinbin Lu
    The School of Computer Science and Technology, East China Normal University, Shanghai 200062, China.
  • Alan Wang
    DeepCyto LLC, West Linn, Oregon, United States.