A dedicated server receives the video and depth streams from the capture nodes and generates the synthetic view according to the viewpoint and camera orientation selected by the user. In principle, if sufficient computing power and network capacity were available, it would be desirable to use as much information about the scene as possible to compute the virtual view. However, even the most powerful computers cannot solve problems such as multiview stereo in real time, so there is little point in transmitting all of the available data. Recognizing this fact, and considering that real-time operation is a must for our system, we dynamically select the three reference cameras closest to the virtual viewpoint, warp their views toward the virtual viewpoint using DIBR techniques, and blend their contributions to produce the synthesized foreground view.
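The following is a minimal sketch of this select-warp-blend loop under a simple pinhole camera model. The `Camera` class, the inverse-distance blending weights, and the naive z-buffered point splatting are illustrative assumptions for exposition, not the system's actual implementation:

```python
import numpy as np

class Camera:
    """Pinhole camera: K (3x3 intrinsics), R (3x3 rotation), t (3x1 translation)."""
    def __init__(self, K, R, t):
        self.K, self.R, self.t = K, R, t

    @property
    def center(self):
        # Camera center in world coordinates: C = -R^T t
        return -self.R.T @ self.t

def select_nearest_references(cams, virtual_cam, n=3):
    """Return the n reference cameras closest to the virtual viewpoint."""
    d = [np.linalg.norm(c.center - virtual_cam.center) for c in cams]
    order = np.argsort(d)[:n]
    return [cams[i] for i in order], np.array([d[i] for i in order])

def warp_to_virtual(color, depth, ref, virt):
    """DIBR-style forward warp of one reference view into the virtual view."""
    h, w = depth.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every reference pixel to a 3D world point using its depth.
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(ref.K) @ pix                    # 3 x N viewing rays
    pts_cam = rays * depth.reshape(1, -1)                # scale rays by depth
    pts_world = ref.R.T @ (pts_cam - ref.t)              # to world frame
    # Re-project the world points into the virtual camera.
    p = virt.K @ (virt.R @ pts_world + virt.t)
    z = p[2]
    z_safe = np.where(z > 1e-6, z, 1.0)                  # avoid division by zero
    u2 = np.round(p[0] / z_safe).astype(int)
    v2 = np.round(p[1] / z_safe).astype(int)
    inside = (z > 1e-6) & (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    src = color.reshape(-1, color.shape[-1])
    for i in np.flatnonzero(inside):                     # z-buffered splat
        if z[i] < zbuf[v2[i], u2[i]]:
            zbuf[v2[i], u2[i]] = z[i]
            out[v2[i], u2[i]] = src[i]
    return out

def synthesize_foreground(views, virtual_cam):
    """views: list of (Camera, color HxWx3, depth HxW), one per reference camera."""
    refs, dists = select_nearest_references([v[0] for v in views], virtual_cam)
    weights = 1.0 / np.maximum(dists, 1e-6)              # closer cameras weigh more
    weights /= weights.sum()
    acc = None
    for cam, w_i in zip(refs, weights):
        color, depth = next((c, d) for (rc, c, d) in views if rc is cam)
        warped = warp_to_virtual(color, depth, cam, virtual_cam)
        acc = w_i * warped if acc is None else acc + w_i * warped
    return acc
```

The inverse-distance weighting here simply favors the nearest reference camera; the system's actual blending criterion may differ, but any scheme that attenuates contributions from distant references serves the same purpose.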
To reduce the amount of data to transmit and process, we note that the color of the background may change over time due to shadows or illumination changes, but its structure (depth) does not. Consequently, we can afford to generate a detailed model of the background depth during system calibration (offline) using techniques that are too expensive for real-time use (e.g., Structure from Motion, Multiview Stereo), so that during online operation only the foreground depth needs to be transmitted. Finally, we synthesize the virtual view by combining layers from the background and foreground information, obtaining a natural result at a reduced computational cost.
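The per-pixel layered composition can be summarized in a few lines: the background color and depth come from the precomputed model, while only the warped foreground layer is produced online. This is a minimal sketch under assumed conventions (a depth value of 0 marks holes in the warped foreground, and a per-pixel z-test resolves occlusions); it is not the system's actual code:

```python
import numpy as np

def composite_layers(fg_color, fg_depth, bg_color, bg_depth):
    """Per-pixel layered composition: foreground wins where valid and closer."""
    fg_valid = fg_depth > 0                 # assumption: 0 marks warping holes
    closer = fg_depth < bg_depth            # z-test against the background model
    use_fg = fg_valid & closer
    return np.where(use_fg[..., None], fg_color, bg_color)
```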
FVV Live features low motion-to-photon and end-to-end latencies, enabling seamless free-viewpoint navigation and immersive communication. Furthermore, the visual quality of FVV Live has been subjectively evaluated with satisfactory results, and additional comparative tests show that viewers prefer it over state-of-the-art DIBR alternatives.