EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
Journal: arXiv
Published Date: May 30, 2025
Abstract
Operating rooms (ORs) demand precise coordination among surgeons, nurses, and
equipment in a fast-paced, occlusion-heavy environment, necessitating advanced
perception models to enhance safety and efficiency. Existing datasets provide
either partial egocentric views or sparse exocentric multi-view context, but
none combines the two comprehensively. We introduce EgoExOR, the
first OR dataset and accompanying benchmark to fuse first-person and
third-person perspectives. Spanning 94 minutes (84,553 frames at 15 FPS) of two
emulated spine procedures, Ultrasound-Guided Needle Insertion and Minimally
Invasive Spine Surgery, EgoExOR integrates egocentric data (RGB, gaze, hand
tracking, audio) from wearable glasses, exocentric RGB and depth from RGB-D
cameras, and ultrasound imagery. Its detailed scene graph annotations, covering
36 entities and 22 relations (568,235 triplets), enable robust modeling of
clinical interactions, supporting tasks like action recognition and
human-centric perception. We evaluate the surgical scene graph generation
performance of two adapted state-of-the-art models and offer a new baseline
that explicitly leverages EgoExOR's multimodal and multi-perspective signals.
Together, the dataset and benchmark establish a foundation for OR perception
and offer a rich, multimodal resource for next-generation clinical models.
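
As a rough illustration of the figures quoted above, the following Python sketch checks that the stated frame count and frame rate agree with the stated duration, and shows one plausible way to hold scene graph triplets in memory. Only the three constants come from the abstract. The `Triplet` class, its field names, and the entity and relation labels are hypothetical and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass

# Figures quoted in the abstract.
TOTAL_FRAMES = 84_553
FPS = 15
TOTAL_TRIPLETS = 568_235


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one (subject, relation, object) annotation."""
    subject: str
    relation: str
    obj: str


# Duration implied by the frame count and frame rate.
duration_minutes = TOTAL_FRAMES / FPS / 60

# Average annotation density across the whole dataset.
triplets_per_frame = TOTAL_TRIPLETS / TOTAL_FRAMES

# Illustrative annotations for a single frame; these labels are made up
# for the example and may not match the dataset's 36 entities or 22 relations.
example_frame = [
    Triplet("surgeon", "holding", "needle"),
    Triplet("nurse", "assisting", "surgeon"),
]

if __name__ == "__main__":
    print(f"duration: {duration_minutes:.1f} min")
    print(f"triplets per frame: {triplets_per_frame:.2f}")
    for t in example_frame:
        print(t.subject, t.relation, t.obj)
```

Under these numbers the recording lasts just under 94 minutes, and each frame carries between six and seven triplets on average, which gives a sense of how densely the scenes are annotated.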