Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records

Journal: arXiv

Published Date: May 14, 2025

Abstract

Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis, but faces challenges including non-informative background images, complex medical terminology, and ambiguous multi-lesion descriptions. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. Endo-CLIP's three-stage framework (cleansing, attunement, and unification) addresses these challenges by (1) removing background frames, (2) leveraging large language models to extract clinical attributes for fine-grained contrastive learning, and (3) employing patient-level cross-attention to resolve multi-polyp ambiguities. Extensive experiments demonstrate that Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification, paving the way for more accurate and clinically relevant endoscopic analysis.
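
For readers unfamiliar with the CLIP objective that Endo-CLIP builds on, the following is a minimal sketch of a generic symmetric image-text contrastive loss. It is not the Endo-CLIP implementation: the function name `clip_contrastive_loss`, the temperature value of 0.07, and the use of NumPy are assumptions made for illustration, and none of the three stages described in the abstract (cleansing, attunement, unification) are modeled here.

```python
import numpy as np


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Generic symmetric contrastive loss over a batch of N image-text pairs.

    Hypothetical helper for illustration only. Row i of image_emb is assumed
    to be paired with row i of text_emb; all other rows act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The correct match for row i is column i.
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    loss_i2t = cross_entropy_on_diagonal(logits)
    loss_t2i = cross_entropy_on_diagonal(logits.T)
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    aligned_images = np.eye(4)
    aligned_texts = np.eye(4)
    shuffled_texts = np.roll(aligned_texts, 1, axis=0)
    print("aligned pairs: ", clip_contrastive_loss(aligned_images, aligned_texts))
    print("shuffled pairs:", clip_contrastive_loss(aligned_images, shuffled_texts))
```

The loss is low when each image embedding is closest to its own text embedding and high when the pairing is broken, which is the behaviour that contrastive pre-training on image-report pairs relies on.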

Authors

Yili He
Yan Zhu
Peiyao Fu
Ruijie Yang
Tianyi Chen
Zhihua Wang
Quanlin Li
Pinghong Zhou
Xian Yang
Shuo Wang

External Resources

View on arXiv: http://arxiv.org/abs/2505.09435v1
