Towards real-world monitoring scenarios: An improved point prediction method for crowd counting based on contrastive learning.
Journal:
PLOS ONE
Published Date:
Jul 2, 2025
Abstract
In open environments, complex and variable backgrounds and dense multi-scale targets are two key challenges for crowd counting. Because current methods rely on supervised learning with labeled data, they struggle to adapt to complex scenes when training data is limited. Moreover, detection-based methods can miss many targets when dealing with dense, small-scale crowds. This paper proposes a simple yet effective point-based contrastive learning method to alleviate these issues. First, we construct contrastive cropped samples and feed them into a convolutional neural network to predict the head points in each image patch. In addition to the classification and regression losses on these points, we incorporate a contrastive learning loss as auxiliary supervision to enhance the model's ability to distinguish foreground heads from the background. We also propose a multi-scale feature fusion module that produces high-quality feature maps for detecting targets at different scales. Comparative experiments on public crowd counting datasets demonstrate that the proposed method achieves state-of-the-art performance.
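
The auxiliary contrastive term described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formula, so the code below assumes a generic supervised contrastive (SupCon-style) form over per-point embeddings; the function name `supervised_contrastive_loss`, the `temperature` parameter, and the 1 = head / 0 = background labeling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic SupCon-style loss; an assumed stand-in, not the paper's exact loss.

    features: (N, D) array of point embeddings, N >= 2.
    labels:   (N,) array, 1 for foreground head points, 0 for background.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalize so the dot product is cosine similarity.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # Exclude each sample's similarity with itself from the softmax.
    self_mask = np.eye(len(labels), dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over all other samples.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )

    # Positives are other samples with the same label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0

    # Average log-probability of positives per anchor, then over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy embeddings: two head-like points and two background-like points.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_separated = supervised_contrastive_loss(feats, np.array([1, 1, 0, 0]))
loss_mixed = supervised_contrastive_loss(feats, np.array([1, 0, 1, 0]))
```

In this toy case the loss is lower when the labels agree with the embedding clusters (`loss_separated`) than when they cut across them (`loss_mixed`), which is the behavior an auxiliary term of this kind would reward. In a training setup such a term would typically be added, with some weight, to the classification and regression losses on the predicted points.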