Will Transformers change gastrointestinal endoscopic image analysis? A comparative analysis between CNNs and Transformers, in terms of performance, robustness and generalization.

Journal: Medical Image Analysis

Published Date: Sep 16, 2024

Abstract

Gastrointestinal endoscopic image analysis presents significant challenges: considerable variation in image quality caused by the difficult in-body imaging environment, abnormalities that are often subtle and show low interobserver agreement, and the need for real-time processing. These challenges place strong requirements on the performance, generalization, robustness and complexity of deep learning-based techniques in such safety-critical applications. While Convolutional Neural Networks (CNNs) have been the go-to architecture for endoscopic image analysis, recent successes of the Transformer architecture in computer vision raise the question of whether this default should be revisited. To this end, we evaluate and compare the clinically relevant performance, generalization and robustness of state-of-the-art CNNs and Transformers for neoplasia detection in Barrett's esophagus. We trained and validated several top-performing CNNs and Transformers on a total of 10,208 images (2,079 patients), and tested them on a total of 7,118 images (998 patients) across multiple test sets: a high-quality test set, two internal and two external generalization test sets, and a robustness test set. Furthermore, to expand the scope of the study, we conducted the performance and robustness comparisons for colonic polyp segmentation (Kvasir-SEG) and angiodysplasia detection (Giana). The results obtained for the featured models across a wide range of training set sizes demonstrate that Transformers achieve performance comparable to CNNs on various applications, show comparable or slightly improved generalization capabilities, and offer equally strong resilience and robustness against common image corruptions and perturbations. These findings confirm the viability of the Transformer architecture, which is particularly suited to the dynamic nature of endoscopic video analysis, characterized by image quality, appearance and equipment configurations that fluctuate from hospital to hospital.
The code is made publicly available at: https://github.com/BONS-AI-VCA-AMC/Endoscopy-CNNs-vs-Transformers.

Authors

Carolus H J Kusters
Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.

Tim J M Jaspers
Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands. Electronic address: t.j.m.jaspers@tue.nl.

Tim G W Boers
Department of Electrical Engineering, Video Coding & Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands.

Martijn R Jong
Department of Gastroenterology and Hepatology, Amsterdam Gastroenterology, Endocrinology and Metabolism, University of Amsterdam, Amsterdam, the Netherlands.

Jelmer B Jukema
Department of Gastroenterology and Hepatology, Amsterdam Gastroenterology, Endocrinology and Metabolism, University of Amsterdam, Amsterdam, the Netherlands.

Kiki N Fockens
Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands.

Albert J de Groof
Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands.

Jacques J Bergman
Department of Gastroenterology and Hepatology, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105 AZ Amsterdam, the Netherlands.

Fons van der Sommen
VCA Research Group, Eindhoven University of Technology, Eindhoven, The Netherlands.

Peter H N de With
Eindhoven University of Technology, 5612 AJ, Eindhoven, The Netherlands.

Keywords

Barrett Esophagus; Deep Learning; Endoscopy, Gastrointestinal; Esophageal Neoplasms; Humans; Image Interpretation, Computer-Assisted; Image Processing, Computer-Assisted; Neural Networks, Computer

External Resources

PubMed ID: 39298861
