Next-generation surgical navigation: Marker-less multi-view 6DoF pose estimation of surgical instruments.
Journal:
Medical image analysis
Published Date:
Jul 1, 2025
Abstract
State-of-the-art research of traditional computer vision is increasingly leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with pure image-based 6DoF pose estimation using deep-learning methods. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room. Our contributions are threefold. First, we present a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured with static and head-mounted cameras and including rich annotations for surgeon, instruments, and patient anatomy. Second, we perform an extensive evaluation of three state-of-the-art single-view and multi-view pose estimation methods, analyzing the impact of camera quantities and positioning, limited real-world data, and static, hybrid, or fully mobile camera setups on the pose accuracy, occlusion robustness, and generalizability. Third, we design a multi-camera system for marker-less surgical instrument tracking, achieving an average position error of 1.01mm and orientation error of 0.89° for a surgical drill, and 2.79mm and 3.33° for a screwdriver under optimal conditions. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.