SUMediPose: A 2D-3D pose estimation dataset.
Journal:
Data in Brief
Published Date:
Apr 22, 2025
Abstract
Biomechanical movement analysis is crucial in medical and sports contexts, yet the technology remains expensive and inaccessible to many. Recent advancements in machine learning and computer vision, particularly in Pose Estimation (PE), offer promising alternatives. PE models detect key points on the human body to estimate its pose in either 2D or 3D space, enabling markerless motion capture. This approach facilitates more natural and flexible movement tracking without the need for physical markers. However, markerless systems are generally less accurate than marker-based methods and require extensive annotated training data, and the available annotations are often anatomically inaccurate. Additionally, current 3D pose estimation techniques face practical challenges, including complex hardware setups, intricate camera calibrations, and a shortage of reliable ground truth 2D-3D datasets. To address these challenges, we introduce a multimodal dataset comprising 3,444 recordings, 2,896,943 image frames, and 3,804,413 corresponding 3D and 2D marker-based motion capture keypoint coordinates. The dataset includes 28 participants performing eight strength and conditioning actions at three different speeds. Full image and keypoint data are available for 26 participants; the remaining two participants have keypoint data only, without accompanying image data. Video and image data were captured using a custom-developed multi-RGB-camera system, while the marker-based 3D data was acquired using the Vicon system and subsequently projected into each camera's internal coordinate system, represented in both 3D space and 2D image space. The multi-RGB-camera system consists of six cameras arranged in a circular formation around the subject at the same height, offering a full 360° view of the scene and a diverse set of viewing angles.
The recording setup was designed so that both capture systems recorded participants' movements simultaneously. The synchronized recordings provide ground truth 3D data, which was then back-projected to generate 2D pixel keypoint data for each corresponding image frame. This design enables the dataset to support both 2D and 3D pose estimation tasks. To ensure anatomical accuracy, a professional placed an extensive array of markers on each participant, adhering to industry standards. The dataset also includes all intrinsic and extrinsic camera parameters, as well as origin axis data, which are needed to perform any 3D or 2D projection. This allows the dataset to be adjusted and tailored to specific research or application needs.
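
The projection step described above can be illustrated with a minimal sketch of a standard pinhole camera model. This is not the authors' pipeline. The function names, the numeric values of K, R, and t, and the convention X_cam = R · X_world + t are all assumptions made for illustration. Lens distortion is ignored, and the dataset's actual file layout and parameter conventions should be checked against its documentation.

```python
import numpy as np

def world_to_camera(points_world, R, t):
    """Map (N, 3) world-frame points into the camera frame.

    Assumes the extrinsic convention X_cam = R @ X_world + t.
    """
    return points_world @ R.T + t

def camera_to_pixels(points_cam, K):
    """Perspective-project (N, 3) camera-frame points to (N, 2) pixel coordinates.

    Assumes an ideal pinhole model with intrinsic matrix K and no lens distortion.
    """
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

# Hypothetical camera parameters, not taken from the dataset.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 3.0])    # world origin lies 3 units in front of the camera

# Two hypothetical 3D keypoints in the world (motion capture) frame.
keypoints_world = np.array([[0.0, 0.0, 0.0],
                            [0.3, -0.5, 1.0]])

keypoints_cam = world_to_camera(keypoints_world, R, t)   # 3D in the camera frame
keypoints_2d = camera_to_pixels(keypoints_cam, K)        # 2D pixel coordinates
print(keypoints_2d)
```

With six cameras, the same two functions would be applied once per camera using that camera's own parameters, yielding one set of camera-frame 3D keypoints and one set of 2D pixel keypoints per view.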