Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
Journal: arXiv
Published Date: Feb 28, 2025
Abstract
Precise Event Spotting (PES) aims to identify events and their classes in
long, untrimmed videos, particularly in sports. The main objective of PES is to
detect the event at the exact moment it occurs. Existing methods mainly rely on
features from a large pre-trained network, which may not be ideal for the task.
Furthermore, these methods overlook the issue of imbalanced event class
distribution present in the data, negatively impacting performance in
challenging scenarios. This paper demonstrates that an appropriately designed
network, trained end-to-end, can outperform state-of-the-art (SOTA) methods.
Specifically, we propose a network with a convolutional spatio-temporal
feature extractor enhanced with our Adaptive Spatio-Temporal
Refinement Module (ASTRM) and a long-range temporal module. The ASTRM enhances
the features with spatio-temporal information. Meanwhile, the long-range
temporal module helps extract global context from the data by modeling
long-range dependencies. To address the class imbalance issue, we introduce the
Soft Instance Contrastive (SoftIC) loss, which promotes feature compactness and
class separation. Extensive experiments show that the proposed method is
efficient and outperforms the SOTA methods, particularly in more challenging
settings.
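
To make the idea of "feature compactness and class separation" concrete, the sketch below implements a generic supervised contrastive loss in plain Python. It is not the SoftIC loss, whose formulation is not given in this abstract; it is a commonly used stand-in that pulls same-class features together and pushes different-class features apart. The function names, the `temperature` parameter, and the toy feature vectors are assumptions made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative; NOT the paper's SoftIC).

    For each anchor, same-class samples are positives and all other samples
    form the contrast set. The loss is low when same-class features are
    close together and different-class features are far apart.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy example: two tight clusters, labeled consistently vs. inconsistently.
feats = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99]]
loss_compact = supervised_contrastive_loss(feats, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(feats, [0, 1, 0, 1])
```

In this toy setup, `loss_compact` is smaller than `loss_mixed`, because the first labeling matches the cluster structure of the features. Averaging per anchor over its positives means a rare class with few samples still contributes a full term for each of its anchors, which is one reason contrastive objectives are often considered for imbalanced data.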