Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusion.

Journal: Medical Image Analysis
Published Date:

Abstract

Endoscopic Submucosal Dissection (ESD) is a well-established endoscopic resection technique for removing epithelial lesions. Predicting dissection trajectories in ESD videos has the potential to strengthen and simplify surgical skills training. However, this approach has seldom been explored in previous research. While imitation learning has proven effective in learning skills from expert demonstrations, it struggles to predict uncertain future movements, learn geometric symmetries, and generalize to diverse surgical scenarios. This paper introduces imitation learning for the critical task of predicting dissection trajectories from expert video demonstrations. We propose a novel Implicit Diffusion Policy with Equivariant Representations for Imitation Learning (iDPOE) to address these challenges. Our method implicitly models expert behaviors using a joint state-action distribution, capturing the inherent stochasticity of future dissection trajectories and enabling robust visual representation learning across various endoscopic views. By incorporating a diffusion model into policy learning, our approach enables efficient training and sampling, resulting in more accurate predictions and improved generalization. Additionally, we integrate equivariance into the learning process to improve the model's ability to generalize across geometric symmetries in trajectory prediction. To enable conditional sampling from the implicit policy, we develop a forward-process guided action inference strategy that corrects state mismatches. We evaluated our method on a collected ESD video dataset of nearly 2000 clips. Experimental results demonstrate that our approach outperforms both explicit and implicit state-of-the-art methods in trajectory prediction.
To the best of our knowledge, this is the first attempt to apply imitation learning-based techniques to surgical skill learning in the form of dissection trajectory prediction.
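The abstract does not spell out how the forward-process guided action inference works. The sketch below assumes it behaves like replacement-based conditioning in DDPM sampling: the joint (state, action) vector is denoised step by step, and at every step the state slice is overwritten with the observed state noised to the current level by the forward process q(x_t | x_0), so the denoiser never sees a mismatched state. Everything here is hypothetical and for illustration only: the names `q_sample`, `toy_eps_model` and `sample_action`, the linear beta schedule, `T = 50`, and the state/action dimensions. `toy_eps_model` is a placeholder (the exact noise predictor for data concentrated at the origin), not the paper's network.

```python
import numpy as np

# Hypothetical linear beta schedule; the paper's schedule is not given here.
T = 50
BETAS = np.linspace(1e-4, 0.02, T)
ALPHAS = 1.0 - BETAS
ALPHA_BARS = np.cumprod(ALPHAS)


def q_sample(x0, t, rng):
    """Forward process: draw x_t ~ N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(ALPHA_BARS[t]) * x0 + np.sqrt(1.0 - ALPHA_BARS[t]) * noise


def toy_eps_model(x_t, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t).

    It is the exact noise predictor for data concentrated at the origin, so it
    only serves to make the sampling loop runnable.
    """
    return x_t / np.sqrt(1.0 - ALPHA_BARS[t])


def sample_action(observed_state, action_dim, eps_model=toy_eps_model, seed=0):
    """Sample an action from a joint (state, action) diffusion model.

    Assumed reading of forward-process guidance: at every reverse step the
    state slice of x_t is replaced by a forward-noised copy of the observed
    state, matching the current noise level t.
    """
    rng = np.random.default_rng(seed)
    state_dim = observed_state.shape[0]
    x = rng.standard_normal(state_dim + action_dim)  # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        # Forward-process guidance: noise the observed state to level t.
        x[:state_dim] = q_sample(observed_state, t, rng)
        eps = eps_model(x, t)
        # DDPM reverse-step mean in the epsilon parameterization.
        mean = (x - BETAS[t] / np.sqrt(1.0 - ALPHA_BARS[t]) * eps) / np.sqrt(ALPHAS[t])
        if t > 0:
            x = mean + np.sqrt(BETAS[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    x[:state_dim] = observed_state  # clamp the state slice to the observation
    return x[state_dim:]


state = np.array([0.3, -0.1, 0.5, 0.2])  # hypothetical 4-D state feature
action = sample_action(state, action_dim=2 * 8)  # e.g. 8 future (x, y) waypoints
print(action.shape)  # -> (16,)
```

Replacing the state slice, rather than steering it with gradients, needs no extra network or classifier: the same joint denoiser trained on (state, action) pairs is reused unchanged, and only the sampling loop differs.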

Authors

  • Hongyu Wang
    School of Information Science and Technology, Northwest University, Xi'an, Shaanxi, China.
  • Yonghao Long
    Department of Computer Science and Engineering, Ho Sin-Hang Engineering Building, The Chinese University of Hong Kong, Sha Tin, NT, Hong Kong.
  • Yueyao Chen
    Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
  • Hon-Chi Yip
    Division of Upper GI and Metabolic Surgery, Department of Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China.
  • Markus Scheppach
    Internal Medicine III - Gastroenterology, University Hospital of Augsburg, Augsburg, Germany.
  • Philip Wai-Yan Chiu
    Division of Upper GI and Metabolic Surgery, Department of Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China.
  • Yeung Yam
    Department of Medicine, Division of Cardiology, University of Ottawa Heart Institute, 40 Ruskin Street, Ottawa, ON K1Y 4W7, Canada.
  • Helen Mei-Ling Meng
    Centre for Perceptual and Interactive Intelligence and The Chinese University of Hong Kong, Hong Kong, China.
  • Qi Dou