PoseNet++: A multi-scale and optimized feature extraction network for high-precision human pose estimation.

Journal: PLOS ONE
Published Date:

Abstract

Human pose estimation (HPE) has made significant progress with deep learning; however, it still struggles with occlusion, complex poses, and complex multi-person scenarios. To address these issues, we propose PoseNet++, a novel approach built on a 3-stacked hourglass architecture that incorporates three key innovations: the multi-scale spatial pyramid attention hourglass module (MSPAHM), coordinate-channel prior convolutional attention (C-CPCA), and the PinSK Bottleneck Residual Module (PBRM). MSPAHM enhances long-range channel dependencies, enabling the model to better capture structural relationships between limb joints, particularly under occlusion. C-CPCA combines coordinate attention (CA) and channel prior convolutional attention (CPCA) to prioritize keypoint regions and reduce confusion in complex multi-person scenarios. PBRM improves pose estimation accuracy by optimizing the receptive field and convolutional kernel selection, thereby strengthening the network's feature extraction for multi-scale features and complex poses. On the MPII validation set, PoseNet++ improves the PCKh score by 3.3% over the baseline 3-stacked hourglass network while reducing the number of parameters and floating-point operations by 60.3% and 53.1%, respectively. Compared with other mainstream human pose estimation models from recent years, PoseNet++ achieves state-of-the-art performance on the MPII, LSP, COCO, and CrowdPose datasets, and its model complexity is much lower than that of methods with comparable accuracy.
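The abstract does not give implementation details, so the following PyTorch sketch only illustrates how a coordinate attention branch might be composed with a channel-gating branch in a C-CPCA-style block. The class name CoordChannelAttention, the reduction ratio, and the squeeze-and-excite stand-in for the CPCA branch are assumptions for illustration, not the authors' actual module.

```python
import torch
import torch.nn as nn


class CoordChannelAttention(nn.Module):
    """Illustrative sketch of a C-CPCA-style block: a coordinate-attention
    branch followed by a simplified channel-gating branch standing in for
    CPCA. Not the PoseNet++ implementation."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        mid = max(channels // reduction, 8)  # assumed reduction ratio

        # Coordinate attention: pool along H and W separately, then
        # produce direction-aware attention maps.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(mid, channels, kernel_size=1)

        # Simplified channel-prior branch (placeholder for CPCA):
        # squeeze-and-excite style channel reweighting.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape

        # Coordinate-attention branch: encode position along each axis.
        feat_h = self.pool_h(x)                      # (B, C, H, 1)
        feat_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        feat = self.shared(torch.cat([feat_h, feat_w], dim=2))
        feat_h, feat_w = torch.split(feat, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(feat_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(feat_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        x = x * a_h * a_w

        # Channel-prior branch: reweight channels globally.
        return x * self.channel_gate(x)


if __name__ == "__main__":
    block = CoordChannelAttention(channels=256)
    out = block(torch.randn(1, 256, 64, 64))
    print(out.shape)  # torch.Size([1, 256, 64, 64])
```

In this sketch the two branches are applied sequentially, so spatial (coordinate) emphasis on likely keypoint regions is refined by a channel reweighting; the real C-CPCA may combine CA and CPCA differently.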

Authors

  • Chao Lv
    College of Electronic Information Engineering, Changchun University of Science and Technology, Changchun, China.
  • Geyao Ma
    College of Electronic Information Engineering, Changchun University of Science and Technology, Changchun, China.