One-Shot Medical Video Object Segmentation via Temporal Contrastive Memory Networks
Journal: arXiv
Published Date: Mar 19, 2025
Abstract
Video object segmentation is crucial for the efficient analysis of complex
medical video data, yet it faces significant challenges in data availability
and annotation. We introduce the task of one-shot medical video object
segmentation, which requires separating foreground and background pixels
throughout a video given only the mask annotation of the first frame. To
address this problem, we propose a temporal contrastive memory network with
three components: image and mask encoders that learn feature representations;
a temporal contrastive memory bank that stores these features and explicitly
models inter-frame relationships by aligning embeddings from adjacent frames
while pushing apart those from distant frames; and a decoder that fuses the
encoded image features with memory readouts for segmentation. We also collect a diverse, multi-source
medical video dataset spanning various modalities and anatomies to benchmark
this task. Extensive experiments demonstrate state-of-the-art performance in
segmenting both seen and unseen structures from a single exemplar, showing
that the method generalizes from scarce labels. These results highlight its
potential to reduce the annotation burden of medical video analysis. Code is available at
https://github.com/MedAITech/TCMN.
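
The temporal contrastive objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an InfoNCE-style loss over one embedding vector per frame, in which frames within a small temporal window of the anchor count as positives and all other frames count as negatives. The function names (`cosine`, `temporal_contrastive_loss`) and the parameters `window` and `temperature` are hypothetical and chosen for illustration only; the actual formulation may differ and should be taken from the paper or the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def temporal_contrastive_loss(frames, window=1, temperature=0.1):
    """InfoNCE-style loss over per-frame embeddings (illustrative assumption).

    For each anchor frame t, frames s with 0 < |t - s| <= window are treated
    as positives and every other frame as a negative. Anchors that lack
    either a positive or a negative are skipped.
    """
    n = len(frames)
    losses = []
    for t in range(n):
        pos_mass = 0.0    # summed similarity weight of temporally adjacent frames
        total_mass = 0.0  # summed similarity weight of all other frames
        has_pos = False
        has_neg = False
        for s in range(n):
            if s == t:
                continue
            weight = math.exp(cosine(frames[t], frames[s]) / temperature)
            total_mass += weight
            if abs(t - s) <= window:
                pos_mass += weight
                has_pos = True
            else:
                has_neg = True
        if has_pos and has_neg:
            losses.append(-math.log(pos_mass / total_mass))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy embeddings that rotate slowly over time, so adjacent frames are similar.
    angles = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    smooth = [[math.cos(a), math.sin(a)] for a in angles]
    print(temporal_contrastive_loss(smooth))
```

Under these assumptions, the loss is low when embeddings of adjacent frames are more similar to each other than to embeddings of distant frames, and it grows when the temporal order of the embeddings is scrambled. In a trainable setting the same quantity would be computed on encoder outputs with an automatic differentiation framework rather than on plain Python lists.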