HumanNeRF: AI just renders 3D humans from a video

Written by insideindyhomes

Image: Weng et al. | University of Washington | Google | YouTube


Neural Rendering for Humans: HumanNeRF synthesizes 3D views of humans from a simple YouTube video.

Neural rendering methods promise to extend or even replace tried-and-tested 3D rendering processes with artificial intelligence. One example is Neural Radiance Fields (NeRFs), small neural networks that learn a 3D representation of a scene from 2D photos and can then render it from new viewpoints.
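To make the "render" step more concrete, here is a minimal sketch of how a radiance field is usually turned into a pixel: samples along a camera ray are composited using the density and colour the field returns. The function `toy_field` is a hypothetical stand-in for a trained network (it simply describes a solid red sphere), and the sample count and ray bounds are arbitrary assumptions, not values from any of the systems mentioned here.

```python
import math

def toy_field(point):
    """Hypothetical stand-in for a trained NeRF: returns (density, rgb)."""
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0  # a solid red unit sphere
    return (10.0, (1.0, 0.0, 0.0)) if inside else (0.0, (0.0, 0.0, 0.0))

def render_ray(origin, direction, field, near=0.0, far=4.0, n_samples=64):
    """Composite colour along one ray from evenly spaced samples."""
    delta = (far - near) / n_samples       # spacing between samples
    transmittance = 1.0                    # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta       # midpoint of the i-th interval
        point = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, rgb = field(point)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(color), 1.0 - transmittance    # colour and accumulated opacity

# One ray through the sphere and one ray that misses it.
hit_color, hit_opacity = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), toy_field)
miss_color, miss_opacity = render_ray((2.0, 2.0, -2.0), (0.0, 0.0, 1.0), toy_field)
```

In a real NeRF, the field is a neural network whose weights are adjusted until pixels rendered this way match the training photos.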

Since their invention, NeRFs have produced ever more realistic images. Some variants can now learn and render complex 3D representations in a few seconds. At this year’s GTC, for example, Nvidia gave insights into Instant NeRF, a method that is up to 1,000 times faster than older methods.

David Luebke, Vice President of Graphics Research at Nvidia, compared NeRFs to JPEG compression for 2D photography: “If conventional 3D representations such as polygon meshes can be compared to vector images, then NeRFs are like bitmap images. They capture how light radiates from an object or within a scene,” says Luebke.

This allows for a “tremendous increase in the speed, ease and range of recording and sharing 3D.”

Google uses NeRFs for Immersive View

Google is the pioneer of NeRF development. The company developed NeRFs in cooperation with scientists from UC Berkeley and UC San Diego. Since then, Google has shown AI-rendered city blocks that enable a kind of 3D Street View, as well as photorealistic 3D renderings of real-world objects made with Mip-NeRF 360.

At the I/O 2022 developer conference, Google then showed Immersive View, a synthesized 3D perspective of large cities and of individual interiors such as restaurants, which uses these neural rendering methods.

Videos: Google

Now researchers from the University of Washington and Google show how NeRFs can render people in 3D.

NeRFs for humans: movement and clothing a challenge so far

The new method, HumanNeRF, solves two problems at once when representing people with NeRFs: until now, the networks have worked primarily with static objects and have relied on camera footage from several angles.

HumanNeRF, on the other hand, can show moving people, including the movements of their clothing, from previously unseen angles – and all this with training material from a single camera perspective. As a result, the NeRFs can also be trained with a YouTube video in which, for example, a dancing person is filmed from the front.

Video: Weng et al. | University of Washington | Google

HumanNeRF relies on several networks that learn a canonical representation of the person in a so-called T-pose, as well as a so-called motion field, which learns the rigid skeletal movement and non-rigid movements such as those of clothing. The pose of the filmed person is additionally captured with a simple pose estimation network.


The learned motion field and the pose estimate are then used to deform the learned canonical representation to match the pose shown in the video, and the result is rendered from the NeRF.
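One common way to implement such a deformation is to map each point sampled in the observed pose back into the canonical T-pose: a weighted blend of per-bone rigid transforms handles the skeleton, and a small extra offset handles non-rigid effects such as clothing. The 2D sketch below illustrates that idea only. It is not the authors' code, and the function names, the two-bone setup, the blend weights and the constant offset are all hypothetical. In HumanNeRF, these quantities are learned by networks.

```python
import math

def rigid_2d(angle, translation):
    """Return a function that rotates a 2D point, then translates it."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty = translation
    return lambda p: (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def skeletal_warp(point, bone_transforms, blend_weights):
    """Blend per-bone rigid transforms; the weights are assumed to sum to 1."""
    x = y = 0.0
    for transform, w in zip(bone_transforms, blend_weights):
        px, py = transform(point)
        x += w * px
        y += w * py
    return (x, y)

def warp_to_canonical(point, bone_transforms, blend_weights, non_rigid_offset):
    """Apply the skeletal warp first, then add a non-rigid correction."""
    sx, sy = skeletal_warp(point, bone_transforms, blend_weights)
    dx, dy = non_rigid_offset((sx, sy))
    return (sx + dx, sy + dy)

# Hypothetical setup: bone 0 stays put, bone 1 undoes a 90-degree arm raise.
bones = [rigid_2d(0.0, (0.0, 0.0)), rigid_2d(-math.pi / 2, (0.0, 0.0))]
cloth_offset = lambda p: (0.0, 0.1)   # stand-in for a learned offset network

# A point on the raised arm, fully assigned to bone 1.
canonical_point = warp_to_canonical((0.0, 1.0), bones, (0.0, 1.0), cloth_offset)
```

The canonical NeRF is then queried at `canonical_point` rather than at the observed location, which is how one static representation can serve every pose in the video.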

For Google, HumanNeRF is just the beginning

The method thus allows much more realistic 3D representations than previous methods: the rendered people are more detailed and movements in their clothing can be clearly seen.

In several examples, the researchers show that a single camera angle is sufficient for 3D rendering, so the method can be used “in the wild”, for example on YouTube videos.

HumanNeRF achieves significantly better results than other methods. | Image: Weng et al.

After training, HumanNeRF can also display the complete learned scene from the directly opposite perspective. This is particularly challenging because not a single pixel rendered from that side was ever visible during training.

The researchers cite missing details and perceptible jerking when transitioning between different poses as limitations. The jerking occurs because temporal coherence in the motion field is not taken into account.

Technological progress also has its price: the team needed 72 hours for training on four GeForce RTX 2080 Ti GPUs. However, they point to results such as Nvidia’s Instant NGP, which drastically reduces the processing power required for NeRFs and other neural rendering methods.

With some improvements and lower computing requirements, the technology could also reach end users in the long term and offer Google another building block for the AR future that the company sketched out at this year’s I/O.

If you want to learn more about neural rendering and NeRFs, you can watch our DEEP MINDS episode #8 with Nvidia researcher Thomas Müller. Among other things, Müller is a co-author of Instant NGP and Instant NeRF.

Sources: HumanNeRF (project page), Arxiv (paper)

