A Temporal Convolutional Network-Based Approach and a Benchmark Dataset for Colonoscopy Video Temporal Segmentation
Journal: arXiv
Published Date: Feb 5, 2025
Abstract
Following recent advancements in computer-aided detection and diagnosis
systems for colonoscopy, the automated reporting of colonoscopy procedures is
set to further revolutionize clinical practice. A crucial yet underexplored
aspect in the development of these systems is the creation of computer vision
models capable of autonomously segmenting full-procedure colonoscopy videos
into anatomical sections and procedural phases. In this work, we aim to create
the first open-access dataset for this task and propose a state-of-the-art
approach, benchmarked against competitive models. We annotated the publicly
available REAL-Colon dataset, consisting of 2.7 million frames from 60 complete
colonoscopy videos, with frame-level labels for anatomical locations and
colonoscopy phases across nine categories. We then present ColonTCN, a
learning-based architecture that employs custom temporal convolutional blocks
designed to efficiently capture long temporal dependencies for the temporal
segmentation of colonoscopy videos. We also propose a dual k-fold
cross-validation evaluation protocol for this benchmark, which includes model
assessment on unseen, multi-center data. ColonTCN achieves state-of-the-art
performance in classification accuracy while maintaining a low parameter count
when evaluated using the two proposed k-fold cross-validation settings,
outperforming competitive models. We report ablation studies to provide
insights into the challenges of this task and highlight the benefits of the
custom temporal convolutional blocks, which enhance learning and improve model
efficiency. We believe that the proposed open-access benchmark and the ColonTCN
approach represent a significant advancement in the temporal segmentation of
colonoscopy procedures, fostering further open-access research to address this
clinical need.
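
As a rough illustration of why temporal convolutions suit long videos, the sketch below computes the receptive field of a generic stack of dilated 1-D convolutions. It is not the ColonTCN block design, which the abstract does not specify; the function name, the kernel size, and the dilation schedule are assumptions chosen for illustration only.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field, in frames, of stacked 1-D dilated convolutions.

    Assumes one convolution per layer and stride 1 (a generic TCN layout,
    not the custom blocks described in the paper).
    """
    rf = 1
    for d in dilations:
        # Each layer widens the window by (kernel_size - 1) * dilation frames.
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical configuration: kernel size 3, dilation doubling over 10 layers.
dilations = [2 ** i for i in range(10)]
print(receptive_field(3, dilations))  # 2047
```

With 2.7 million frames across 60 videos, a procedure averages about 45,000 frames, so the temporal context grows exponentially with depth under a doubling dilation schedule while the parameter count grows only linearly with the number of layers.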