EndoViT: pretraining vision transformers on a large collection of endoscopic images.

Journal: International journal of computer assisted radiology and surgery

Published Date: Apr 3, 2024

Abstract

PURPOSE: Automated endoscopy video analysis is essential for assisting surgeons during medical procedures, but it is hindered by complex surgical scenes and limited annotated data. In recent years, large-scale pretraining has shown great success in the natural language processing and computer vision communities. Such approaches reduce the need for annotated data, which is of great interest in the medical domain. In this work, we investigate endoscopy domain-specific self-supervised pretraining on large collections of data.

Authors

Dominik Batić
Chair for Computer Aided Medical Procedures, Technical University Munich, Munich, Germany.

Felix Holm
Technical University Munich, Germany.

Ege Özsoy
Technical University Munich, Germany.

Tobias Czempiel
Technical University Munich, Germany.

Nassir Navab
Chair for Computer Aided Medical Procedures & Augmented Reality, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.

Keywords

Automation; Deep Learning; Diagnostic Imaging; Endoscopy; Humans; Large Language Models; Minimally Invasive Surgical Procedures; Natural Language Processing; Thoracic Surgery, Video-Assisted

External Resources

PubMed ID: 38570373
