SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis
Journal:
arXiv
Published Date:
Jun 9, 2025
Abstract
Surgical video understanding is pivotal for enabling automated intraoperative
decision-making, skill assessment, and postoperative quality improvement.
However, progress in developing surgical video foundation models (FMs) remains
hindered by the scarcity of large-scale, diverse datasets for pretraining and
systematic evaluation. In this paper, we introduce \textbf{SurgBench}, a
unified surgical video benchmarking framework comprising a pretraining dataset,
\textbf{SurgBench-P}, and an evaluation benchmark, \textbf{SurgBench-E}.
SurgBench offers extensive coverage of diverse surgical scenarios, with
SurgBench-P encompassing 53 million frames across 22 surgical procedures and 11
specialties, and SurgBench-E providing robust evaluation across six categories
(phase classification, camera motion, tool recognition, disease diagnosis,
action classification, and organ detection) spanning 72 fine-grained tasks.
Extensive experiments reveal that existing video FMs struggle to generalize
across varied surgical video analysis tasks, whereas pretraining on SurgBench-P
yields substantial performance improvements and superior cross-domain
generalization to unseen procedures and modalities. Our dataset and code are
available upon request.