Slot-BERT: Self-supervised object discovery in surgical video.

Journal: Medical image analysis

Published Date: Feb 3, 2026

Abstract

Object-centric slot attention is a powerful framework for unsupervised learning of structured and explainable representations that can support reasoning about objects and actions, including in surgical video. However, current object-centric models either fail to reliably capture object dependencies in seconds-long video episodes that encompass surgical actions and tasks, or are too computationally expensive for practical implementation. We introduce Slot-BERT, a slot attention model with a temporal slot transformer module to overcome these limitations. Our core contributions are: 1) a bidirectional transformer module that processes object-centric slot representations, enabling longer-range temporal coherence; 2) a slot-contrastive loss that further improves the representation by enforcing slot dissimilarity; and 3) an evaluation of Slot-BERT on real-world surgical video datasets from abdominal, cholecystectomy, and thoracic procedures, and on real and synthetic videos with everyday objects. Under unsupervised training, our method outperforms state-of-the-art object-centric approaches across these domains. We also demonstrate efficient zero-shot domain adaptation to data from diverse surgical specialties and databases.
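
To make the idea of "enforcing slot dissimilarity" concrete, the sketch below computes one possible penalty: the mean pairwise cosine similarity between the slot vectors of a single frame. This is an illustrative assumption, not the loss defined in the paper, which the abstract does not specify. The function names `cosine` and `slot_dissimilarity_penalty` and the toy slot vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero slots.
    return dot / (norm_u * norm_v + 1e-8)


def slot_dissimilarity_penalty(slots):
    """Mean cosine similarity over all distinct slot pairs.

    Hypothetical penalty (an assumption, not the paper's formulation):
    the value is low when slots point in different directions and
    close to 1 when all slots have collapsed onto the same vector.
    """
    k = len(slots)
    if k < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(k):
        for j in range(i + 1, k):
            total += cosine(slots[i], slots[j])
            pairs += 1
    return total / pairs


# Toy slot sets: three mutually orthogonal slots vs. three identical slots.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

print(slot_dissimilarity_penalty(orthogonal))
print(slot_dissimilarity_penalty(collapsed))
```

Minimizing a term of this kind alongside a reconstruction objective would push slots apart so that each one tends to bind to a different object; a contrastive formulation would typically also pull together the same slot across neighboring frames, which this sketch omits.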
