Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement
Journal: arXiv
Published Date: May 3, 2025
Abstract
Surgical workflow recognition is vital for automating tasks, supporting
decision-making, and training novice surgeons, ultimately improving patient
safety and standardizing procedures. However, recognition performance can
degrade when the input data is corrupted, for example by occlusion from bleeding
or smoke in the surgical scene or by errors in data storage and transmission. In
this work, we explore a robust graph-based multimodal approach that integrates
vision and kinematic data to enhance accuracy and reliability. Vision data captures
dynamic surgical scenes, while kinematic data provides precise movement
information, overcoming limitations of visual recognition under adverse
conditions. We propose a multimodal Graph Representation network with
Adversarial feature Disentanglement (GRAD) for robust surgical workflow
recognition in challenging scenarios with domain shifts or corrupted data.
Specifically, we introduce a Multimodal Disentanglement Graph Network that
captures fine-grained visual information while explicitly modeling the complex
relationships between vision and kinematic embeddings through graph-based
message modeling. To align feature spaces across modalities, we propose a
Vision-Kinematic Adversarial framework that leverages adversarial training to
reduce modality gaps and improve feature consistency. Furthermore, we design a
Contextual Calibrated Decoder, incorporating temporal and contextual priors to
enhance robustness against domain shifts and corrupted data. Extensive
comparative and ablation experiments demonstrate the effectiveness of our model
and proposed modules. Moreover, our robustness experiments show that our method
effectively handles data corruption during storage and transmission, exhibiting
excellent stability and robustness. Our approach aims to advance automated
surgical workflow recognition, addressing the complexities and dynamism
inherent in surgical procedures.
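
As an illustration of the graph-based exchange between vision and kinematic embeddings described above, the following is a minimal sketch of one round of attention-weighted message passing. It is not the GRAD implementation: the node names (`vision`, `kin_left`, `kin_right`), the fully connected edge layout, the scaled dot-product attention, and the 0.5/0.5 residual update are all assumptions chosen for clarity, since the abstract does not specify these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def message_passing_step(nodes, edges):
    """One round of attention-weighted message passing.

    nodes: dict mapping node name -> embedding (list of floats, same length)
    edges: dict mapping node name -> list of neighbour names it receives from
    Returns a new dict of updated embeddings; the input is not modified.
    """
    updated = {}
    for name, emb in nodes.items():
        neighbours = edges.get(name, [])
        if not neighbours:
            # Isolated node: keep its embedding unchanged.
            updated[name] = list(emb)
            continue
        # Attention weights from scaled dot-product similarity (assumed form).
        scale = math.sqrt(len(emb))
        weights = softmax([dot(emb, nodes[n]) / scale for n in neighbours])
        # Aggregated message: weighted sum of neighbour embeddings.
        message = [
            sum(w * nodes[n][i] for w, n in zip(weights, neighbours))
            for i in range(len(emb))
        ]
        # Residual-style update mixing the node's own state with the message
        # (the equal 0.5/0.5 mix is a hypothetical choice).
        updated[name] = [0.5 * e + 0.5 * m for e, m in zip(emb, message)]
    return updated


# Hypothetical toy graph: one vision node and two kinematic nodes.
nodes = {
    "vision": [1.0, 0.0, 0.0, 0.0],
    "kin_left": [0.0, 1.0, 0.0, 0.0],
    "kin_right": [0.0, 0.0, 1.0, 0.0],
}
edges = {
    "vision": ["kin_left", "kin_right"],
    "kin_left": ["vision", "kin_right"],
    "kin_right": ["vision", "kin_left"],
}
updated = message_passing_step(nodes, edges)
```

In this toy setup each node ends up with half of its own embedding plus an equal share of its two neighbours, so the vision node carries kinematic information after a single step. This is the general mechanism by which a kinematic signal could compensate for a degraded visual embedding, under the assumptions stated above.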