PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction
Journal: arXiv
Published Date: May 26, 2025
Abstract
Long-term Action Quality Assessment (AQA) aims to quantitatively evaluate how
well actions are performed in long videos. However, existing methods suffer
from domain shift between backbones pre-trained for large-scale action
recognition and the specific AQA task, which limits their performance. The
shift persists because fine-tuning these resource-intensive backbones on small
AQA datasets is impractical. We address this by distinguishing two levels of
domain shift: task-level shift, arising from differences in task objectives,
and feature-level shift, arising from differences in which features are
important. To address feature-level shift, the more detrimental of the two, we
propose Progressive Hierarchical Instruction (PHI), which comprises two
strategies. First, Gap Minimization Flow (GMF) uses flow matching to
progressively learn a fast flow path that reduces the domain gap between
initial and desired features from shallow to deep layers. In addition, a
temporally-enhanced attention module captures the long-range dependencies
essential for AQA. Second, List-wise Contrastive Regularization (LCR) promotes
coarse-to-fine alignment by comparing all sample pairs within a batch, which
helps the model learn fine-grained cues while mitigating domain shift. By
integrating these modules, PHI provides an effective remedy for domain shift.
Experiments show that PHI achieves state-of-the-art performance on three
representative long-term AQA datasets, demonstrating its effectiveness in
addressing domain shift for long-term AQA.
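The abstract describes Gap Minimization Flow (GMF) only at a high level. As a rough illustration of the flow-matching idea it builds on, the sketch below uses the standard straight-line (rectified-flow style) conditional flow matching objective: interpolate x_t = (1 - t) x_0 + t x_1 between an initial feature x_0 and a desired feature x_1, and regress a velocity field onto x_1 - x_0. Several parts are assumptions made for illustration and do not come from the paper: reading each Euler integration step as one layer (shallow to deep), the made-up feature values, and the constant (bias-only) velocity model standing in for a real network. The helper names are hypothetical.

```python
import random

def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)

def integrate(velocity_fn, x0, num_layers):
    """Euler steps from t=0 to t=1; each step stands in for one layer (assumption)."""
    x = list(x0)
    dt = 1.0 / num_layers
    path = [list(x)]
    for k in range(num_layers):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        path.append(list(x))
    return path

# Toy data (made up): a backbone feature x0 and a desired AQA feature x1.
x0 = [0.2, -1.0, 0.5]
x1 = [1.0, 0.0, -0.5]

# Toy velocity model (assumption): a learnable constant vector c, so v(x, t) = c.
c = [0.0, 0.0, 0.0]

def velocity_fn(x, t):
    # Ignores x and t; reads the current global parameters c.
    return list(c)

random.seed(0)
lr = 0.5
for step in range(200):
    t = random.random()                      # sample a time on the path
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    tgt = target_velocity(x0, x1)
    # Gradient of the mean squared error with respect to each c_i.
    grads = [2.0 * (p - q) / len(c) for p, q in zip(pred, tgt)]
    c = [ci - lr * g for ci, g in zip(c, grads)]

path = integrate(velocity_fn, x0, num_layers=4)
print("final loss:", flow_matching_loss(velocity_fn, x0, x1, 0.5))
print("endpoint:", path[-1])
```

Once the velocity field matches x_1 - x_0, four Euler steps carry x_0 to (numerically) x_1. A real velocity model would condition on x_t and t and be trained over many feature pairs; the straight-line path is one common choice because its target velocity is constant, which keeps the integrated flow short.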
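The abstract does not give the formula for List-wise Contrastive Regularization (LCR). One plausible way to compare batch pairs list-wise, assumed here purely for illustration, is a ListNet-style objective: for each anchor in the batch, turn ground-truth score gaps into a target distribution over the other samples, turn feature similarities into a predicted distribution, and penalize the KL divergence between them. The function name, the temperatures `tau_f` and `tau_s`, and the use of cosine similarity are hypothetical choices, and the coarse-to-fine aspect of LCR is not modeled.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def listwise_contrastive_loss(features, scores, tau_f=0.1, tau_s=1.0):
    """Hypothetical ListNet-style loss: mean over anchors of KL(target || predicted)."""
    n = len(features)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Target: samples with closer ground-truth scores get higher probability.
        p = softmax([-abs(scores[i] - scores[j]) / tau_s for j in others])
        # Prediction: samples with more similar features get higher probability.
        q = softmax([cosine(features[i], features[j]) / tau_f for j in others])
        # KL(p || q) for this anchor.
        total += sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))
    return total / n

def angle_features(angles):
    """Toy 2-D unit features placed at the given angles (radians)."""
    return [[math.cos(a), math.sin(a)] for a in angles]

scores = [0.0, 1.0, 5.0]                          # made-up quality scores
aligned = angle_features([0.0, 0.2, 1.0])         # similarity follows score order
misaligned = angle_features([1.0, 0.2, 0.0])      # same features, reversed
print("aligned:", listwise_contrastive_loss(aligned, scores))
print("misaligned:", listwise_contrastive_loss(misaligned, scores))
```

Features whose pairwise similarities follow the score ordering yield a lower loss than the same features assigned in reverse order, so minimizing this kind of objective pushes the feature space toward the structure of the quality scores across every pair in the batch rather than one pair at a time.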