Realistic 3D Worlds with Neural Radiance Fields
In the ever-evolving landscape of computer graphics and artificial intelligence research, a groundbreaking technology has rapidly gained attention and is now shaping the way we perceive and interact with digital imagery.
Neural Radiance Fields (NeRFs) [mild2021nerf] are a cutting-edge approach that promises to revolutionize the fields of computer graphics and computer vision. In this blog post, we’ll delve into the fascinating domain of NeRF technologies, exploring their principles, applications, and potential implications for various industries.
The Relevance
When you search for images of a monument in the XReco repository, the images can come from different sources, taken at different times of day or with obstacles such as people or objects in the foreground. CERTH’s research and development work in the XReco project aims to solve this problem. The team is focusing on training NeRF algorithms that overcome such challenges and produce clean, occlusion-free results in the final trained NeRF. In addition, the trained NeRF can adjust its appearance to match the lighting conditions observed in the source images.
Understanding NeRFs
At its core, NeRF is a method for representing 3D scenes. Unlike traditional explicit geometric representations, such as voxel grids, triangle meshes, or point clouds, NeRFs leverage the power of neural networks to learn a continuous 3D scene representation from 2D images. The algorithm is trained to predict the radiance (i.e., the outbound color from a surface) and the opacity of the scene at any given 3D point along a ray cast from the camera. This is enabled by an inverse rendering process: inverse rendering means that we seek to estimate 3D scene parameters (e.g. a camera’s pose) given the 2D rendered image.
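To make this concrete, here is a minimal, illustrative sketch of such a network in PyTorch: an MLP that takes a 3D point and a viewing direction and predicts an RGB color and a volume density (opacity). The layer sizes, the positional-encoding helper, and the TinyNeRF name are simplifications for illustration, not the exact architecture from the paper.

```python
# Minimal sketch of a NeRF-style network: maps a 3D point and a viewing
# direction to an emitted color (RGB) and a volume density (sigma).
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    """Encode coordinates with sines/cosines of increasing frequency."""
    feats = [x]
    for i in range(num_freqs):
        feats.append(torch.sin(2.0 ** i * x))
        feats.append(torch.cos(2.0 ** i * x))
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    def __init__(self, pos_freqs=10, dir_freqs=4, hidden=256):
        super().__init__()
        self.pos_freqs, self.dir_freqs = pos_freqs, dir_freqs
        pos_dim = 3 * (1 + 2 * pos_freqs)
        dir_dim = 3 * (1 + 2 * dir_freqs)
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)          # opacity / volume density
        self.color_head = nn.Sequential(                # view-dependent radiance
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz, view_dir):
        h = self.trunk(positional_encoding(xyz, self.pos_freqs))
        sigma = torch.relu(self.sigma_head(h))          # density must be non-negative
        d = positional_encoding(view_dir, self.dir_freqs)
        rgb = self.color_head(torch.cat([h, d], dim=-1))
        return rgb, sigma
```

Because the color head also sees the viewing direction, the same surface point can emit different colors depending on where the camera looks from, which is how NeRFs capture view-dependent effects such as specular highlights.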
So, what do we need to train a NeRF?
The magic behind NeRF lies in its ability to reconstruct detailed 3D scenes from a sparse set of 2D images, capturing intricate lighting effects, surface textures, and object shapes with very high realism. Therefore, we need a collection of images capturing the subject scene, along with their viewpoint positions and orientations. How do you get that, you ask? Well, many of the datasets used for training NeRF algorithms are synthetic (e.g. rendered with Blender [blender2024]). When someone needs to learn a NeRF representation from real data (real scenes), they can use off-the-shelf Structure-from-Motion (SfM) algorithms such as COLMAP [schoen2016sfm], which is used in most recent research works.
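As an illustration of what such a real-scene dataset boils down to, the sketch below loads images together with their estimated camera poses, assuming the poses_bounds.npy layout produced by LLFF-style COLMAP exports. The file names, folder layout, and helper function are assumptions made for this example, not part of an official toolchain.

```python
# Illustrative sketch: load images plus per-image camera poses estimated by an
# SfM pipeline, assuming the LLFF-style "poses_bounds.npy" convention.
import numpy as np
from pathlib import Path
from PIL import Image

def load_llff_scene(scene_dir):
    data = np.load(Path(scene_dir) / "poses_bounds.npy")   # shape (N, 17)
    poses = data[:, :15].reshape(-1, 3, 5)                 # 3x4 pose + [H, W, focal] column
    bounds = data[:, 15:]                                   # near/far depth bounds per image
    hwf = poses[0, :, 4]                                    # image height, width, focal length
    c2w = poses[:, :, :4]                                   # camera-to-world rotation + translation
    images = [np.asarray(Image.open(p)) / 255.0
              for p in sorted(Path(scene_dir, "images").glob("*.jpg"))]
    return np.stack(images), c2w, hwf, bounds
```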
Figure 1: Example of a COLMAP dataset: the camera viewpoints and the estimated point cloud for the "trex" scene of the NeRF LLFF dataset [mild2019local].
How are NeRF algorithms trained?
As already mentioned, an inverse rendering approach is utilized to train NeRF algorithms. Rays are cast from the known camera viewpoint, and […]
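To give a feel for that first step, here is a small sketch of per-pixel ray generation from a known camera viewpoint, assuming a pinhole camera and a camera-to-world matrix c2w. The function name and conventions are illustrative, not the reference implementation.

```python
# Sketch of ray casting for training: given camera intrinsics and a
# camera-to-world pose, generate one ray (origin, direction) per pixel.
import torch

def get_rays(height, width, focal, c2w):
    """Return per-pixel ray origins and directions in world coordinates."""
    i, j = torch.meshgrid(
        torch.arange(width, dtype=torch.float32),
        torch.arange(height, dtype=torch.float32),
        indexing="xy",
    )
    # Pixel -> camera-space direction (pinhole camera looking down -z)
    dirs = torch.stack([(i - width * 0.5) / focal,
                        -(j - height * 0.5) / focal,
                        -torch.ones_like(i)], dim=-1)
    # Rotate directions into world space; the camera position is the ray origin
    rays_d = torch.sum(dirs[..., None, :] * c2w[:3, :3], dim=-1)
    rays_o = c2w[:3, 3].expand(rays_d.shape)
    return rays_o, rays_d
```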