3D Pose Estimation with MediaPipe and OpenPose

chantana chantrapornchai
Dec 23, 2021

Pose estimation has been around for a while and has many applications. We have been exploring ML toolkits since our previous projects on facial recognition, and we found that Google's MediaPipe provides a good set of ML solutions for various platforms.

In this post, we look at pose estimation, whose main applications are in sports and fitness. One interesting example provided with MediaPipe is a Colab for pose classification that counts push-ups. However, I cannot find the data set for this Colab at the moment, and I think many people are as curious about it as I am. Still, it suggests the interesting idea of analyzing exercise performance, e.g. push-up counting, plank counting, etc.
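To make the push-up counting idea concrete, here is a minimal sketch. It is not the method used in the MediaPipe Colab (which classifies poses); it is a simpler swapped-in approach that measures the elbow angle from three 3D keypoints (shoulder, elbow, wrist) and counts one repetition each time the arm goes from bent to straight. The names `joint_angle` and `RepCounter` and the 90/160 degree thresholds are my own assumptions and would need tuning on real footage.

```python
import math


def joint_angle(a, b, c):
    """Return the angle at point b (in degrees) formed by a-b-c.

    Each point is an (x, y, z) tuple, e.g. shoulder, elbow, wrist.
    """
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(ba, bc))
    norm = math.sqrt(sum(p * p for p in ba)) * math.sqrt(sum(q * q for q in bc))
    if norm == 0:
        return 0.0  # degenerate case: two keypoints coincide
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


class RepCounter:
    """Count repetitions with two thresholds (hysteresis) to avoid jitter.

    The threshold values are assumptions for illustration only.
    """

    def __init__(self, down_below=90.0, up_above=160.0):
        self.down_below = down_below
        self.up_above = up_above
        self.is_down = False
        self.count = 0

    def update(self, elbow_angle):
        if not self.is_down and elbow_angle < self.down_below:
            self.is_down = True          # arm bent: bottom of the push-up
        elif self.is_down and elbow_angle > self.up_above:
            self.is_down = False         # arm straight again: one rep done
            self.count += 1
        return self.count
```

In use, you would call `joint_angle` on the shoulder, elbow and wrist keypoints of every frame and feed the result to `RepCounter.update`. Using two thresholds instead of one keeps a noisy angle near the boundary from being counted several times.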

Pose estimation from MediaPipe
Full workout demo with pose estimation

MediaPipe Pose predicts 33 keypoints (landmarks), as shown below.

Keypoints from https://google.github.io/mediapipe/solutions/pose.html

Each keypoint is estimated in 3D as (x, y, z), where z is a relative depth predicted by the model and measured from the center of the body (the midpoint of the hips). The model is based on BlazePose. I quite like the demo at https://3d.kalidoface.com/.
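As a small illustration of these coordinates, the sketch below converts normalized keypoints to pixel units and re-centers them on the hip midpoint. It assumes the keypoints are already available as a list of 33 `(x, y, z)` tuples in MediaPipe's order, with x and y normalized to the image size and z on roughly the same scale as x, as the MediaPipe documentation describes. The helper names `to_pixels`, `hip_center` and `recenter` are hypothetical, not part of MediaPipe.

```python
# Indices of the hips in MediaPipe's 33-landmark layout.
LEFT_HIP, RIGHT_HIP = 23, 24


def to_pixels(landmarks, width, height):
    """Scale normalized (x, y, z) keypoints to pixel units.

    z is scaled by the image width because it uses roughly the same
    scale as x (an approximation, not a metric depth).
    """
    return [(x * width, y * height, z * width) for x, y, z in landmarks]


def hip_center(points):
    """Midpoint between the left and right hip."""
    left, right = points[LEFT_HIP], points[RIGHT_HIP]
    return tuple((left[i] + right[i]) / 2 for i in range(3))


def recenter(points):
    """Express every keypoint relative to the hip midpoint."""
    center = hip_center(points)
    return [tuple(p[i] - center[i] for i in range(3)) for p in points]
```

Re-centering like this makes poses comparable between frames even when the person moves around in the image, which helps if you want to compare a pose against a reference exercise.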

Human center (https://google.github.io/mediapipe/solutions/pose.html)

By comparison, OpenPose from CMU gives 18 keypoints with its COCO model (its default BODY_25 model gives 25). For me, MediaPipe is versatile, lightweight, and easy to install. OpenPose is an active repository, and the current version is 1.7. Its installation takes much more effort and the model is large, although the documentation is good. In my experience, it is really heavy for hardware like the Jetson Nano and Raspberry Pi 4.

As for stereo cameras, which have been the state of the art for a while, there are demo integrations for the Intel RealSense and the ZED camera.

ZED OpenPose (https://github.com/adujardin/zed-openpose)

However, these have been out of date for at least a year or two, so you need to make some modifications to get them working in a current environment. Cubemos (the skeleton tracking SDK used with RealSense) works best on Windows from what I have tried, and the speed is not bad. ZED OpenPose is overkill for the Jetson Nano: given the camera spec and the model itself, it is too heavy for real-time tracking, in my opinion.

For quick installation, MediaPipe is my favorite. The bad thing is that it can detect only one person in the frame. To make it work with several people, we need to run object detection first to find each human, then crop each bounding box and estimate the pose inside it. This part is much slower than I thought. That is probably why the Google demo includes only one person per frame for exercise applications. Below is my demo, where I render each frame to make a video output. The code demo is below if you are interested. This assumes you have YOLOv4 installed. You can look at this Colab notebook.
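The detect, crop and estimate loop can be sketched as follows. This is only an outline under stated assumptions, not the code from my notebook: `estimate_pose` is a hypothetical callable standing in for a single-person pose estimator that returns normalized `(x, y)` keypoints for a crop (or an empty list when nothing is found), the boxes are `(x1, y1, x2, y2)` pixel tuples from some detector, and the frame is a plain list of rows so the example has no dependencies. With a NumPy image you would crop with `frame[y1:y2, x1:x2]` instead.

```python
def pad_box(box, frame_w, frame_h, margin=0.1):
    """Grow a box by a margin and clamp it to the frame.

    A little padding helps so limbs are not cut off at the box edge;
    the 10% default is an assumption.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    return (max(0, int(x1 - dx)), max(0, int(y1 - dy)),
            min(frame_w, int(x2 + dx)), min(frame_h, int(y2 + dy)))


def crop_to_frame(landmarks, box):
    """Map keypoints normalized to a crop back to full-frame pixels."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [(x1 + x * w, y1 + y * h) for x, y in landmarks]


def poses_for_frame(frame, boxes, estimate_pose, frame_w, frame_h):
    """Run a single-person pose estimator once per detected person."""
    poses = []
    for box in boxes:
        x1, y1, x2, y2 = pad_box(box, frame_w, frame_h)
        crop = [row[x1:x2] for row in frame[y1:y2]]
        landmarks = estimate_pose(crop)  # hypothetical estimator call
        if landmarks:
            poses.append(crop_to_frame(landmarks, (x1, y1, x2, y2)))
    return poses
```

The sketch also shows why this approach is slow: the pose model runs once per person on top of the detector, so the cost per frame grows with the number of people.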

YOLOv4 + MediaPipe

I have tried integrating YOLO with MediaPipe. By the way, YOLOv4 detects humans more accurately than YOLOv3, so my example uses YOLOv4 to detect humans and crop the frame. I put limits on the detection confidence and on the size of each detected human so that not too many bodies are sent for pose estimation. Even so, it is really slow, and I don't think it is suitable for real-time use (even on a desktop computer).
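The filtering step can be sketched like this. The detection format `(label, confidence, (x1, y1, x2, y2))`, the function name `filter_people` and every threshold value are assumptions for illustration; the actual values depend on your video and on how your YOLO wrapper reports its results.

```python
def filter_people(detections, frame_w, frame_h,
                  min_conf=0.6, min_area_frac=0.02, max_people=3):
    """Keep only confident, large-enough 'person' detections.

    detections: list of (label, confidence, (x1, y1, x2, y2)) tuples.
    All thresholds here are hypothetical defaults, not tuned values.
    """
    frame_area = frame_w * frame_h
    kept = []
    for label, conf, (x1, y1, x2, y2) in detections:
        area = max(0, x2 - x1) * max(0, y2 - y1)
        if (label == "person"
                and conf >= min_conf
                and area >= min_area_frac * frame_area):
            kept.append((label, conf, (x1, y1, x2, y2)))
    # Most confident first, then cap how many go on to pose estimation.
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_people]
```

Capping the number of people bounds the per-frame cost, at the price of ignoring small or uncertain figures in the background.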

OpenPose avoids this problem, since it detects the skeletons of several people without needing human detection first. However, to run it smoothly, we need hardware like the Jetson TX2 at least. In my case, the Jetson Xavier is very smooth compared to the Jetson Nano for a real-time stereo camera like the ZED.

ZED has a new model coming. If we have a chance to try more, we will come back and update this post. Any comments are welcome.
