Scaling up self-supervised learning for improved surgical foundation models.

Journal: Medical Image Analysis

Abstract

Foundation models have transformed computer vision: large-scale pretraining on extensive datasets yields substantially better performance across diverse downstream tasks. However, their application in surgical computer vision has been limited. This study addresses this gap by introducing SurgeNetXL, a surgical foundation model that sets a new benchmark in surgical computer vision. Trained on the largest surgical dataset reported to date, comprising over 4.7 million video frames, SurgeNetXL achieves consistently top-tier performance across six datasets spanning four surgical procedures and three tasks: semantic segmentation, surgical phase recognition, and critical view of safety (CVS) classification. Compared with the best-performing competing surgical foundation model, SurgeNetXL shows mean improvements of 4.0 %, 8.9 %, and 11.4 % for semantic segmentation, phase recognition, and CVS classification, respectively. It also outperforms ImageNet1k pretraining by 16.1 %, 8.0 %, and 4.3 % on the same three tasks. Beyond advancing model performance, this study provides key insights into scaling pretraining datasets, extending training durations, and optimizing model architectures specifically for surgical computer vision. These findings pave the way for improved generalization and robustness in data-scarce scenarios and offer a comprehensive framework for future research in this domain. All models and a subset of the SurgeNetXL dataset, containing over 2 million video frames, are publicly available at: https://github.com/TimJaspers0801/SurgeNet.
