SRT-H: A Hierarchical Framework for Autonomous Surgery via Language Conditioned Imitation Learning
Journal: arXiv
Published Date: May 15, 2025
Abstract
Research on autonomous robotic surgery has largely focused on simple task
automation in controlled environments. However, real-world surgical
applications require dexterous manipulation over extended time scales while
demanding generalization across diverse variations in human tissue. These
challenges remain difficult to address using existing logic-based or
conventional end-to-end learning strategies. To bridge this gap, we propose a
hierarchical framework for dexterous, long-horizon surgical tasks. Our method
employs a high-level policy for task planning and a low-level policy for
generating task-space controls for the surgical robot. The high-level planner
plans tasks using language, producing task-specific or corrective instructions
that guide the robot at a coarse level. Leveraging language as a planning
modality offers an intuitive and generalizable interface, mirroring how
experienced surgeons instruct trainees during procedures. We validate our
framework in ex-vivo experiments on a complex minimally invasive procedure,
cholecystectomy, and conduct ablation studies to assess key design choices. Our
approach achieves a 100% success rate across n=8 different ex-vivo
gallbladders, operating fully autonomously without human intervention. The
hierarchical approach greatly improves the policy's ability to recover from
suboptimal states that are inevitable in the highly dynamic environment of
realistic surgical applications. This work represents the first demonstration
of step-level autonomy, marking a critical milestone toward autonomous surgical
systems for clinical studies. By advancing generalizable autonomy in surgical
robotics, our approach brings the field closer to real-world deployment.
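
The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name, the observation fields, the instruction strings, and the toy policies below are all hypothetical placeholders. They stand in for learned models and show only how a language instruction passes from the high-level policy to the low-level policy at each control step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical types for illustration; the paper does not specify these.
Observation = Dict[str, bool]
Action = Tuple[float, float, float]  # assumed task-space position delta


@dataclass
class HierarchicalController:
    """Chains a language-producing planner with a language-conditioned controller."""
    high_level: Callable[[Observation], str]          # observation -> instruction
    low_level: Callable[[Observation, str], Action]   # (observation, instruction) -> control

    def step(self, obs: Observation) -> Tuple[str, Action]:
        # The high-level policy plans in language space.
        instruction = self.high_level(obs)
        # The low-level policy turns the instruction into a task-space command.
        action = self.low_level(obs, instruction)
        return instruction, action


def toy_high_level(obs: Observation) -> str:
    # Stand-in for a learned planner: emit a corrective instruction when the
    # robot is in a suboptimal state, otherwise a task-specific instruction.
    if obs["off_target"]:
        return "move the gripper left"   # hypothetical corrective instruction
    return "apply the clip"              # hypothetical task-specific instruction


def toy_low_level(obs: Observation, instruction: str) -> Action:
    # Stand-in for a learned language-conditioned policy.
    if instruction == "move the gripper left":
        return (-1.0, 0.0, 0.0)
    return (0.0, 0.0, -1.0)


controller = HierarchicalController(toy_high_level, toy_low_level)
on_track = controller.step({"off_target": False})
recovery = controller.step({"off_target": True})
```

In this sketch, recovery from a suboptimal state is expressed purely through the instruction: the low-level policy does not need a separate recovery mode, because the corrective instruction changes the command it produces.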