Man standing in virtual environment with skeletal map shown as overlay

Ben Davenport 29 December 2021 4 min read

Body Pose What?

Body Pose Estimation - it’s one of the headline features in our latest 2.1 release of Pixotope, but what is it? It’s a through-the-lens (TTL) talent tracking system that uses AI to generate a 3D skeletal map of data points corresponding to the body and limbs of the on-screen talent.

Why do I need that?

Body Pose Estimation (BPE) makes Virtual Production easier and enables more creativity, without the need for additional hardware. With BPE, your real and virtual elements can start to interact with each other. This makes things easier because, in more complex virtual environments, talent can move around freely rather than being artificially limited to a spot on stage.

It also means that implementing virtual elements such as shadows or photo-realistic reflections becomes much easier, making Virtual Productions look better and enabling more flexibility in virtual studio design - for example in this video, where we’ve added a reflective floor surface in our virtual test stage.

It also enables more creativity: talent and virtual elements can interact with one another through physics, trigger boxes, hand action, or clicker control for pick-up-and-place logic. In this video example, you can see how we’ve added some plants to our virtual test stage and, as Øystein moves through the space, the virtual plants react as real plants would.
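To make the trigger box idea a little more concrete, here is a minimal Python sketch of the underlying test: is any tracked joint inside a box-shaped region of the stage? This is an illustration only, not Pixotope’s implementation. The `TriggerBox` class, the `triggered_joints` helper, the joint names and the use of metres are all assumptions made for the example.

```python
from dataclasses import dataclass

# A 3D position in stage coordinates (x, y, z); metres are assumed here.
Point3D = tuple[float, float, float]


@dataclass(frozen=True)
class TriggerBox:
    """Hypothetical axis-aligned trigger volume defined by two opposite corners."""
    min_corner: Point3D
    max_corner: Point3D

    def contains(self, point: Point3D) -> bool:
        # The point is inside if every coordinate lies between the two corners.
        return all(
            lo <= p <= hi
            for lo, p, hi in zip(self.min_corner, point, self.max_corner)
        )


def triggered_joints(box: TriggerBox, skeleton: dict[str, Point3D]) -> list[str]:
    """Return the names of all joints currently inside the box, sorted by name."""
    return sorted(name for name, pos in skeleton.items() if box.contains(pos))


# Illustrative skeletal map: joint names and positions are made up for this sketch.
skeleton = {
    "head": (0.0, 1.7, 2.0),
    "left_hand": (0.5, 1.2, 2.0),
    "right_hand": (-0.5, 1.0, 2.0),
}

# A box placed around a virtual prop near the talent's left hand.
prop_box = TriggerBox(min_corner=(0.4, 1.0, 1.8), max_corner=(0.8, 1.4, 2.2))

hits = triggered_joints(prop_box, skeleton)
```

In a real production the result would drive something in the scene, such as starting an animation when a hand enters the box, and the test would be repeated for every new frame of skeletal data.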

How does it work?

Typically, talent tracking systems use multiple cameras, or a stereoscopic setup, to create a 3D model of the talent within the space. The Body Pose Estimation here is different because it is “Through The Lens” - i.e. the camera you’re using to capture your scene is also used to track the talent, so no additional hardware or motion capture devices are required.

Not only does this reduce complexity, but with the ability in Pixotope to work with video files as well as streams, it also means that should you make changes to your virtual studio or augmented reality elements “post capture” you can have the talent interact with those changed elements too.

To estimate the body position and pose, we use the NVIDIA Maxine Augmented Reality SDK, an NVIDIA AI-powered framework that provides a real-time set of 3D data points corresponding to the skeleton and body parts of the talent, as you can see in the image below.
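If you’re wondering what “a set of 3D data points corresponding to the skeleton” might look like in practice, the following Python sketch treats a skeletal map as named joints connected by bones and measures each bone’s length. It is a simplified illustration and does not reflect the actual output format of the NVIDIA SDK or of Pixotope: the joint names, the `BONES` list, the `bone_lengths` helper and the units are all assumptions.

```python
import math

# A 3D position (x, y, z); units are assumed to be metres for this sketch.
Point3D = tuple[float, float, float]

# Hypothetical bones, each connecting two named joints.
BONES = [
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
]


def bone_lengths(
    joints: dict[str, Point3D],
    bones: list[tuple[str, str]] = BONES,
) -> dict[tuple[str, str], float]:
    """Return the Euclidean length of every bone whose two joints are present."""
    return {
        (a, b): math.dist(joints[a], joints[b])
        for a, b in bones
        if a in joints and b in joints
    }


# One frame of made-up skeletal data.
frame = {
    "left_shoulder": (0.0, 1.5, 0.0),
    "left_elbow": (0.0, 1.2, 0.0),
    "left_wrist": (0.4, 1.2, 0.0),
}

lengths = bone_lengths(frame)
```

Because a person’s bones do not change length from one frame to the next, comparing these values across frames is one simple way to sanity-check estimated skeletal data.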

Are there any limitations?

As with any new technology, there are a few limitations that may change over time. Today, the Body Pose Estimation is limited to one “humanoid” in the field of view at a time, and if only part of that humanoid body is in the field of view, the AI will try to extrapolate the positions of the parts it cannot see. Since the body position is derived from the image, changing that image dramatically (e.g. with a fast pan or zoom) may also affect the accuracy of the skeletal data.

The feature requires quite a bit of GPU capacity and is therefore only available on certain NVIDIA RTX GPUs. In some applications, it could be advantageous to run the BPE feature on a separate workstation from the one doing the compositing of the virtual environment - you can read more about how to do this in our Help Center.

What are the alternatives?

As mentioned, this approach is “Through The Lens”, using advanced AI to generate highly accurate estimations of the body position within the 3D space and also the body pose.

Alternatives to this involve additional hardware. There are systems that require the talent to wear a suit that enables systems to track their movements and there are also other estimation systems that use different camera setups to track the talent. These include stereoscopic cameras, which, while compact, are also practically limited to single-person tracking, as the stereoscopic camera cannot estimate body positions if one talent moves behind another.

Where multi-person tracking is needed, a multi-camera setup such as our TalenTrack probably makes more sense. This uses three low-cost cameras positioned around the volume to map the entire space and then also uses AI to generate 3D skeletal maps of all the talent in the space. In this case, the maximum number of talent that can be tracked is limited by the power of the GPU.

How can I try it?

Body Pose Estimation was introduced with version 2.1 of Pixotope, which was made available in mid-December 2021.

If you’re an existing user of Pixotope, you can go ahead and look up the details and instructions of how to set it up in the Pixotope Help Center. We’ve also prepared a downloadable project on Pixotope Cloud to get you started which includes a test clip of talent walking around a green screen environment. If you’re not yet a Pixotope user, get in touch with our team.
