Towards real-world monitoring scenarios: An improved point prediction method for crowd counting based on contrastive learning.

Journal: PLOS ONE
Published Date:

Abstract

In open environments, crowd counting faces two key challenges: complex, variable backgrounds and dense, multi-scale targets. Because current methods rely on supervised learning with labeled data, they struggle to adapt to complex scenes when training data is limited; moreover, detection-based methods tend to miss many targets in dense, small-scale crowds. This paper proposes a simple yet effective point-based contrastive learning method to alleviate these issues. First, we construct contrastive cropped samples and feed them into a convolutional neural network to predict the head points in each image patch. In addition to the classification and regression losses on these points, we incorporate a contrastive learning loss as auxiliary supervision to enhance the model's ability to distinguish foreground heads from the background. We also propose a multi-scale feature fusion module that produces high-quality feature maps for detecting targets of different scales. Comparative experiments on public crowd counting datasets demonstrate that the proposed method achieves state-of-the-art performance.
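
The abstract does not specify how the losses are formulated, so the following is only a minimal sketch of how a point classification loss, a point regression loss, and an auxiliary contrastive term might be combined. Everything in it is an assumption made for illustration: the function names, the binary cross-entropy and squared-distance forms, the InfoNCE-style contrastive term, the temperature, and the weights `w_reg` and `w_con` are hypothetical and are not taken from the paper.

```python
import math

def classification_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over point confidences (assumed form)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def regression_loss(pred_points, gt_points):
    """Mean squared distance between already-matched point pairs (assumed form)."""
    total = 0.0
    for (px, py), (gx, gy) in zip(pred_points, gt_points):
        total += (px - gx) ** 2 + (py - gy) ** 2
    return total / len(pred_points)

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style term (assumed): pull the anchor toward head features
    (positives) and push it away from background features (negatives)."""
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    return -sum(math.log(p / denom) for p in pos) / len(pos)

def total_loss(probs, labels, pred_points, gt_points,
               anchor, positives, negatives, w_reg=1.0, w_con=0.1):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (classification_loss(probs, labels)
            + w_reg * regression_loss(pred_points, gt_points)
            + w_con * contrastive_loss(anchor, positives, negatives))
```

In this sketch the contrastive term is small when the anchor feature resembles the head features more than the background features, which is the kind of foreground/background separation the abstract describes; how the paper actually builds its contrastive pairs from cropped samples is not stated here.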

Authors

  • Rundong Cao
    China Tower Corporation Limited, Beijing, China.
  • Jiazhong Yu
    China Tower Corporation Limited, Beijing, China.
  • Ziwei Liu
    College of Food Science and Engineering, Northwest University, Xi'an 710069, China.
  • QingHua Liang
    School of Mechanical Engineering, Shanghai Jiao Tong University, Room 901, Dongchuan Road 800, Minhang District, Shanghai, 200240, China. qhliang@sjtu.edu.cn.