Meta-SurDiff: Classification Diffusion Model Optimized by Meta Learning is Reliable for Online Surgical Phase Recognition
Journal: arXiv
Published Date: Jun 17, 2025
Abstract
Online surgical phase recognition has recently drawn great attention due to
its potential downstream applications closely related to human life and
health. Although deep models have made significant advances in capturing the
discriminative long-term dependency of surgical videos to achieve improved
recognition, they rarely account for exploring and modeling the uncertainty in
surgical videos, which should be crucial for reliable online surgical phase
recognition. We categorize the sources of uncertainty into two types: frame
ambiguity in videos and an unbalanced distribution among surgical phases, both
of which are inevitable in surgical videos. To address this pivotal issue, we
introduce a meta-learning-optimized classification diffusion model
(Meta-SurDiff) that takes full advantage of deep generative modeling and
meta-learning to achieve precise frame-level distribution estimation for
reliable online surgical phase recognition. For coarse recognition caused by
ambiguous video frames, we employ a classification diffusion model to assess
the confidence of recognition results at the finer-grained level of individual
frames. For coarse recognition caused by an unbalanced phase distribution, we
use a meta-learning-based objective to train the diffusion model, thus
enhancing the robustness of classification boundaries for different surgical
phases. We establish the effectiveness of
Meta-SurDiff in online surgical phase recognition through extensive experiments
on five widely used datasets using more than four practical metrics. The
datasets include Cholec80, AutoLaparo, M2Cai16, OphNet, and NurViD, where
OphNet comes from ophthalmic surgeries, NurViD is a daily care dataset, and
the others come from laparoscopic surgeries. We will release the code upon
acceptance.
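
As a rough illustration of what assessing frame-level confidence with a generative classifier can look like, the sketch below assumes that a stochastic sampler (here simply a hypothetical array, not the Meta-SurDiff model itself) has already produced several class-score draws per frame. It then reports the majority-vote phase for each frame together with the fraction of draws that agree with it. The function name `frame_confidence` and the input shape are assumptions made for this example only and are not taken from the paper.

```python
import numpy as np

def frame_confidence(samples):
    """Majority-vote phase and agreement rate per frame.

    samples: array of shape (S, T, C) holding S stochastic class-score
    draws for each of T frames over C phases (hypothetical sampler output).
    Returns (pred, conf), each of shape (T,).
    """
    samples = np.asarray(samples, dtype=float)
    num_draws, _, num_classes = samples.shape
    # Class chosen by each draw for each frame: shape (S, T).
    votes = samples.argmax(axis=-1)
    # Number of draws voting for each class, per frame: shape (T, C).
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(num_classes)], axis=-1
    )
    pred = counts.argmax(axis=-1)              # most frequent phase
    conf = counts.max(axis=-1) / num_draws     # fraction of agreeing draws
    return pred, conf

# Toy input: 4 draws, 2 frames, 3 phases.
toy = np.zeros((4, 2, 3))
toy[:, 0, 1] = 1.0           # frame 0: every draw picks phase 1
toy[[0, 1, 3], 1, 0] = 1.0   # frame 1: three draws pick phase 0
toy[2, 1, 2] = 1.0           # frame 1: one draw picks phase 2
pred, conf = frame_confidence(toy)
```

In this toy input the draws for the second frame disagree, so its agreement rate is lower than that of the first frame. A low rate of this kind is one simple way to flag an ambiguous frame.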