CLIP in medical imaging: A survey.

Journal: Medical Image Analysis

Published Date: Mar 22, 2025

Abstract

Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training paradigm, successfully introduces text supervision to vision models. Owing to its generalizability and interpretability, it has shown promising results across a wide range of tasks. CLIP has recently attracted growing interest in the medical imaging domain, where it serves either as a pre-training paradigm for image-text alignment or as a critical component in diverse clinical tasks. To facilitate a deeper understanding of this promising direction, this survey offers an in-depth exploration of CLIP in medical imaging, covering both refined CLIP pre-training and CLIP-driven applications. In this paper, we (1) start with a brief introduction to the fundamentals of the CLIP methodology; (2) investigate the adaptation of CLIP pre-training to the medical imaging domain, focusing on how to optimize CLIP given the characteristics of medical images and reports; (3) explore the practical use of CLIP pre-trained models in various tasks, including classification, dense prediction, and cross-modal tasks; and (4) discuss the existing limitations of CLIP in the context of medical imaging and propose forward-looking directions to address the demands of the domain. Studies of both technical and practical value are examined. We expect that this survey will provide researchers with a holistic understanding of the CLIP paradigm and its potential implications. The project page of this survey is available on GitHub.
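The image-text alignment objective behind CLIP pre-training can be illustrated with a minimal sketch. It assumes the symmetric contrastive (InfoNCE-style) loss of the original CLIP formulation, applied to a batch of N precomputed image and report embeddings in which row i of each matrix forms a matched pair. The encoders are omitted, and the names `clip_loss` and `l2_normalize` and the default temperature of 0.07 are illustrative assumptions, not details taken from this survey.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each embedding to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Hypothetical sketch of a symmetric contrastive loss over N paired embeddings.

    Row i of image_emb and row i of text_emb are assumed to be a matched
    image/report pair; every other row in the batch acts as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (N, N) scaled cosine-similarity matrix
    idx = np.arange(logits.shape[0])
    # Image-to-text: each row (image) should assign most probability to its own report.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each column (report) should assign most probability to its own image.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    # Average the two directions so the objective treats both modalities alike.
    return (loss_i2t + loss_t2i) / 2
```

With perfectly matched one-hot embeddings, for example clip_loss(np.eye(4, 8), np.eye(4, 8)), the loss is close to 0; reversing the order of the text rows so that every pair is mismatched raises it to roughly 1/temperature (about 14.3). Averaging the image-to-text and text-to-image terms keeps the objective symmetric in the two modalities.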

Authors

Zihao Zhao

School of Information and Computer, Anhui Agricultural University, Hefei 230036, China.
Yuxiao Liu

School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China.
Han Wu

Department of Thoracic Surgery, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China.
Mei Wang

Natural Products Utilization Research Unit, Agricultural Research Service, U.S. Department of Agriculture, Oxford, MS, 38677, USA.
Yonghao Li

Department of Neuroscience, University of British Columbia, Vancouver, BC, Canada.
Sheng Wang

Intensive Care Medical Center, Tongji Hospital, School of Medicine, Tongji University, Shanghai, 200065, People's Republic of China.
Lin Teng

Software College, Shenyang Normal University, Shenyang 110034, China.
Disheng Liu

School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China.
Zhiming Cui

The Institute of Information Processing and Application, Soochow University, Suzhou 215006, China.
Qian Wang

Department of Radiation Oncology, China-Japan Union Hospital of Jilin University, Changchun, China.
Dinggang Shen

School of Biomedical Engineering, ShanghaiTech University, Shanghai, China.

Keywords

Diagnostic Imaging; Humans; Natural Language Processing; Surveys and Questionnaires

External Resources

PubMed (PMID: 40127590)
