Cofopose: Conditional 2D Pose Estimation with Transformers.

Journal: Sensors (Basel, Switzerland)
Published Date:

Abstract

Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Prominent among 2D human pose estimation (HPE) methods are the regression-based approaches, which have been shown to achieve excellent results. However, ground-truth labels are often inherently ambiguous in challenging cases such as motion blur, occlusion, and truncation, leading to poor performance measurement and lower accuracy. In this paper, we propose Cofopose, a two-stage approach to 2D human pose estimation that consists of a person-detection transformer and a keypoint-detection transformer. Cofopose is composed of conditional cross-attention, a conditional DEtection TRansformer (conditional DETR), and an encoder-decoder in the transformer framework; together these allow it to perform person and keypoint detection. In a significant departure from other approaches, we use conditional cross-attention and a fine-tuned conditional DETR for person detection, and transformer encoder-decoders for keypoint detection. Cofopose was extensively evaluated on two benchmark datasets, MS COCO and MPII, and outperformed existing state-of-the-art frameworks by significant margins.
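
The two-stage flow described above (person detection first, then keypoint detection per detected person) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `two_stage_pose` and `box_to_image_coords`, the box format, and the convention that the keypoint stage returns coordinates normalized to the person box are all assumptions made for illustration, and both detectors are passed in as plain callables rather than real transformer models.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), image pixels
Point = Tuple[float, float]              # (x, y)


def box_to_image_coords(box: Box, keypoints_norm: List[Point]) -> List[Point]:
    """Map keypoints normalized to a person box (0..1) back to image pixels."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    return [(x0 + u * width, y0 + v * height) for u, v in keypoints_norm]


def two_stage_pose(
    image: Any,
    detect_persons: Callable[[Any], List[Box]],
    detect_keypoints: Callable[[Any, Box], List[Point]],
) -> List[List[Point]]:
    """Hypothetical two-stage pipeline: stage 1 finds people, stage 2 finds keypoints.

    Assumption: detect_keypoints returns coordinates normalized to the given box.
    """
    poses = []
    for box in detect_persons(image):             # stage 1: person detection
        keypoints = detect_keypoints(image, box)  # stage 2: keypoints for one person
        poses.append(box_to_image_coords(box, keypoints))
    return poses
```

In this sketch, a person box of (10, 20, 110, 220) with a normalized keypoint at (0.5, 0.5) maps to the image point (60.0, 120.0), the center of the box.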

Authors

  • Evans Aidoo
    School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China.
  • Xun Wang
    College of Computer Science and Technology, China University of Petroleum, Dongying, China.
  • Zhenguang Liu
    School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China.
  • Edwin Kwadwo Tenagyei
    School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China.
  • Kwabena Owusu-Agyemang
    Department of Computer Science, Kwame Nkrumah University of Science and Technology (KNUST), Kumasi 03220, Ghana.
  • Seth Larweh Kodjiku
    School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China.
  • Victor Nonso Ejianya
    School of Computer & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China.
  • Esther Stacy E B Aggrey
    School of Information & Software Engineering, University of Electronic Science & Technology of China, Chengdu 611731, China.