RTMW: A Series of High-Performance AI Models for 2D/3D Whole-Body Pose Estimation


Whole-body pose estimation is a key component of human-centric AI systems, with applications in human-computer interaction, virtual avatar animation, and the film industry. Early research in this field was held back by the task’s complexity and by limited computational power and data, so researchers focused on estimating the pose of separate body parts. Systems like OpenPose combined these separate estimates to achieve whole-body pose estimation, but this approach was computationally expensive and had performance limitations. Lightweight tools like MediaPipe provide good real-time performance and are easy to use, but their accuracy still needs improvement.

Current research on these problems spans three directions: top-down approaches, coordinate classification, and 3D pose estimation. Top-down algorithms use off-the-shelf detectors to produce bounding boxes, then crop each person and scale the crop to a uniform size for pose estimation. These algorithms have performed well on public benchmarks, and the two-stage inference scheme allows both the human detector and the pose estimator to use smaller input resolutions. In coordinate classification, SimCC treats keypoint prediction as two classification tasks, one over horizontal coordinates and one over vertical coordinates. Lastly, 3D pose estimation is a growing field with many industry applications. It mainly involves two approaches: lifting methods that start from 2D keypoints, and regression methods that predict 3D poses directly from the image.
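To make the coordinate-classification idea concrete, here is a minimal sketch of how SimCC-style outputs might be decoded into keypoint coordinates. It assumes the model emits, for each keypoint, one score vector over horizontal bins and one over vertical bins. The function name `decode_simcc`, the `split_ratio` parameter (bins per pixel) with its default of 2.0, and the choice of the smaller peak value as the confidence score are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def decode_simcc(
    x_scores: List[List[float]],
    y_scores: List[List[float]],
    split_ratio: float = 2.0,  # assumed number of bins per pixel
) -> List[Tuple[float, float, float]]:
    """Turn per-keypoint 1D classification scores into (x, y, confidence)."""
    keypoints = []
    for x_row, y_row in zip(x_scores, y_scores):
        # Pick the most likely bin along each axis independently.
        x_bin = max(range(len(x_row)), key=x_row.__getitem__)
        y_bin = max(range(len(y_row)), key=y_row.__getitem__)
        # Assumption: report the weaker of the two peaks as the confidence.
        confidence = min(x_row[x_bin], y_row[y_bin])
        # Convert bin indices back to pixel coordinates.
        keypoints.append((x_bin / split_ratio, y_bin / split_ratio, confidence))
    return keypoints


if __name__ == "__main__":
    # One keypoint on a 4x4 pixel crop, so 8 bins per axis at split_ratio=2.
    x = [[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
    y = [[0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0]]
    print(decode_simcc(x, y))
```

Because each axis is classified separately, using more bins than pixels allows sub-pixel localization without producing a full 2D heatmap.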

Researchers from Shanghai AI Laboratory have proposed RTMW (Real-Time Multi-person Whole-body pose estimation models), a series of high-performance models for estimating 2D/3D whole-body pose. To better capture pose information from body parts of different scales, the RTMPose model architecture is extended with an FPN and a HEM (Hierarchical Encoding Module). The models are trained on a large collection of open-source human datasets whose annotations were manually aligned, and they are further improved with a two-stage distillation technique. RTMW performs strongly on various whole-body pose estimation benchmarks while retaining high inference efficiency and deployment friendliness.

RTMW follows the training techniques of RTMPose and adopts the two-stage distillation technique from DWPose. Because open-source whole-body pose estimation datasets are limited, 14 datasets were used, with their keypoint definitions manually aligned and uniformly mapped to the 133-point definition of COCO-WholeBody. Open-source 3D datasets for monocular whole-body pose estimation are also scarce, so for the 3D task the 14 existing 2D datasets are combined with three open-source 3D datasets, giving 17 datasets for joint training. These include 3 whole-body datasets, 6 human body datasets, 4 face datasets, 1 hand dataset, and 3 3D whole-body datasets.
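The mapping step described above can be pictured with a small sketch. It assumes each source dataset comes with a hand-written index map into the 133-slot COCO-WholeBody layout (17 body, 6 foot, 68 face, and 42 hand keypoints), and that slots a dataset does not annotate are masked out during training. The function `remap_to_wholebody`, the 21-point hand dataset, and its mapping `HAND21_TO_LEFT_HAND` are hypothetical illustrations, not the authors' actual conversion code.

```python
from typing import Dict, List, Tuple

NUM_WHOLEBODY_KEYPOINTS = 133  # 17 body + 6 foot + 68 face + 42 hand

Point = Tuple[float, float]


def remap_to_wholebody(
    points: List[Point], index_map: Dict[int, int]
) -> Tuple[List[Point], List[int]]:
    """Place a dataset's keypoints into the 133-slot layout.

    Returns the remapped points and a 0/1 mask marking which slots
    were actually annotated by the source dataset.
    """
    target: List[Point] = [(0.0, 0.0)] * NUM_WHOLEBODY_KEYPOINTS
    mask = [0] * NUM_WHOLEBODY_KEYPOINTS
    for src_idx, dst_idx in index_map.items():
        target[dst_idx] = points[src_idx]
        mask[dst_idx] = 1
    return target, mask


# Hypothetical 21-point hand dataset whose keypoint order is assumed to
# match the left-hand slots (indices 91..111) of the 133-point layout.
HAND21_TO_LEFT_HAND = {i: 91 + i for i in range(21)}

if __name__ == "__main__":
    hand = [(float(i), float(i)) for i in range(21)]
    remapped, mask = remap_to_wholebody(hand, HAND21_TO_LEFT_HAND)
    print(sum(mask), "of", len(mask), "slots annotated")
```

With a mask like this, a face-only or hand-only dataset can still contribute to joint training, since the loss can simply skip the slots it does not cover.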

The proposed RTMW models are evaluated on the whole-body pose estimation task using the COCO-WholeBody dataset. The results show that RTMW performs very well, balancing accuracy and complexity. RTMW3D also demonstrates good performance on COCO-WholeBody, and it was additionally evaluated on the H3WB test set, where it likewise performed well. The researchers also measured the inference speed of the RTMW models. RTMW includes an extra module compared to RTMPose, which makes it slightly slower, but the module significantly improves accuracy.

In summary, researchers from the Shanghai AI Laboratory have introduced RTMW, a series of high-performance models for 2D/3D whole-body pose estimation. The paper expands on previous work by examining the complexities and challenges of whole-body pose estimation. The new models, RTMW and RTMW3D, build on the established RTMPose model for real-time whole-body pose estimation. They show outstanding performance among open-source alternatives and add monocular 3D pose estimation capabilities. The proposed algorithm and its open-source availability are expected to meet several practical industry needs for robust pose estimation solutions.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.



Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.




