Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition.

Journal: IEEE Transactions on Medical Imaging
Published Date:

Abstract

Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on a high volume of manually annotated data. This data is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step annotated videos. We introduce a step-phase dependency loss to exploit the weak supervision signal. We then employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos, for temporal activity segmentation and recognition. We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
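The abstract names two technical ingredients: a Single-Stage Temporal Convolutional Network operating on ResNet-50 frame features, and a step-phase dependency loss that exploits coarse phase labels as weak supervision. Below is a minimal sketch of how such a pipeline might look in PyTorch. The module layout, layer sizes, and the form of the dependency loss (penalizing step probability mass outside the annotated phase via a binary phase-to-step mapping) are illustrative assumptions, not the authors' published implementation.

    # A minimal sketch, assuming a PyTorch setup in which ResNet-50 frame
    # features feed a single-stage TCN with dilated temporal convolutions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SingleStageTCN(nn.Module):
        """Single-stage TCN: 1x1 bottleneck followed by stacked dilated residual conv layers."""
        def __init__(self, in_dim=2048, hidden=64, num_steps=11, num_layers=10):
            super().__init__()
            self.conv_in = nn.Conv1d(in_dim, hidden, kernel_size=1)
            self.layers = nn.ModuleList([
                nn.Conv1d(hidden, hidden, kernel_size=3, padding=2 ** i, dilation=2 ** i)
                for i in range(num_layers)
            ])
            self.conv_out = nn.Conv1d(hidden, num_steps, kernel_size=1)

        def forward(self, feats):          # feats: (B, in_dim, T) ResNet-50 frame features
            x = self.conv_in(feats)
            for layer in self.layers:
                x = x + F.relu(layer(x))   # residual dilated temporal block
            return self.conv_out(x)        # step logits: (B, num_steps, T)

    def step_phase_dependency_loss(step_logits, phase_labels, phase_to_steps):
        """Hypothetical weak-supervision term: penalize probability mass assigned
        to steps that cannot occur under the annotated phase.
        phase_to_steps: (num_phases, num_steps) binary matrix, 1 if the step belongs to the phase."""
        step_probs = F.softmax(step_logits, dim=1)           # (B, num_steps, T)
        allowed = phase_to_steps[phase_labels]                # (B, T, num_steps)
        disallowed_mass = (step_probs * (1 - allowed).transpose(1, 2)).sum(dim=1)
        return disallowed_mass.mean()

In this sketch, videos with only phase annotations would contribute solely through the dependency loss, while the smaller set of step-annotated videos would additionally use a standard per-frame cross-entropy on the step logits.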

Authors

  • Sanat Ramesh
    Altair Robotics Lab, Department of Computer Science, University of Verona, Verona, Italy. sanat.ramesh@univr.it.
  • Diego Dall'Alba
    University of Verona, Verona, Italy.
  • Cristians Gonzalez
    University Hospital of Strasbourg, IHU Strasbourg, France.
  • Tong Yu
  • Pietro Mascagni
    IHU Strasbourg, Strasbourg, France.
  • Didier Mutter
Institut Hospitalo-Universitaire, Institute of Image-Guided Surgery, University of Strasbourg, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France; Department of Digestive Surgery, Strasbourg University Hospital, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France.
  • Jacques Marescaux
  • Paolo Fiorini
    University of Verona, Verona, Italy.
  • Nicolas Padoy
    IHU Strasbourg, Strasbourg, France.