A human pose estimation network based on YOLOv8 framework with efficient multi-scale receptive field and expanded feature pyramid network.
Journal: Scientific Reports
PMID: 40312474
Abstract
Deep learning-based human pose estimation uses deep neural networks to detect, estimate, and predict human body poses in images or videos. However, traditional multi-person pose estimation methods struggle with partial occlusion and overlap between multiple human bodies and body parts. To address these issues, we propose EE-YOLOv8, a human pose estimation network based on the YOLOv8 framework that integrates an Efficient Multi-scale Receptive Field (EMRF) module and an Expanded Feature Pyramid Network (EFPN). First, the EMRF module enhances the model's feature representation capability. Second, the EFPN optimizes cross-level information exchange and improves multi-scale data integration. Finally, Wise-IoU replaces the traditional Intersection over Union (IoU) to improve detection accuracy through more precise measurement of the overlap between predicted and ground-truth bounding boxes. We evaluate EE-YOLOv8 on the MS COCO 2017 dataset. Compared to YOLOv8-Pose, EE-YOLOv8 achieves an AP of 89.0% at an IoU threshold of 0.5 (an improvement of 3.3%) and an AP of 65.6% over the IoU range of 0.5-0.95 (an improvement of 5.8%). EE-YOLOv8 also achieves the highest accuracy while maintaining the lowest parameter count and computational complexity among all analyzed algorithms. These results demonstrate that EE-YOLOv8 is highly competitive with other mainstream methods.
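To make the overlap measure mentioned in the abstract concrete, the following is a minimal Python sketch of plain IoU for two axis-aligned boxes, together with one possible form of a Wise-IoU loss. The abstract does not state which Wise-IoU variant EE-YOLOv8 uses, so the second function assumes the v1 formulation (the IoU loss scaled by a distance-based factor computed from the smallest enclosing box). The function names and the (x1, y1, x2, y2) box layout are illustrative choices, not taken from the paper.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def wise_iou_v1_loss(pred, target):
    """Assumed Wise-IoU v1 form: exp(center_dist^2 / enclosing_diag^2) * (1 - IoU).

    In a training framework the denominator is usually treated as a
    constant (detached from the gradient); plain Python has no gradients,
    so that detail does not appear here.
    """
    # Squared distance between the two box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    center_dist_sq = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Width and height of the smallest box enclosing both boxes.
    enc_w = max(pred[2], target[2]) - min(pred[0], target[0])
    enc_h = max(pred[3], target[3]) - min(pred[1], target[1])
    enc_diag_sq = enc_w ** 2 + enc_h ** 2

    focus = math.exp(center_dist_sq / enc_diag_sq) if enc_diag_sq > 0 else 1.0
    return focus * (1.0 - iou(pred, target))


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print("IoU:", iou(predicted, ground_truth))
    print("Wise-IoU v1 loss (assumed form):", wise_iou_v1_loss(predicted, ground_truth))
```

For the two example boxes, the intersection area is 1 and the union area is 7, so the IoU is 1/7. Under the assumed v1 form, a prediction that matches the ground truth exactly gives a loss of 0, and the distance-based factor increases the loss as the predicted center moves away from the ground-truth center.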