Volumetric Video Compression and Distribution

In recent years, volumetric video has gained considerable attention and relevance, especially in virtual reality (VR), augmented reality (AR), virtual production and media distribution.

To achieve and enhance immersion in these applications, volumetric video is becoming essential to reach the highest quality and deliver the most meaningful experiences. For example, content that can be viewed from any position and angle with six degrees of freedom (6DoF) can be used to accurately represent users, or even parts of the real world.

Figure 1: An example of a holo-conference application that uses volumetric video to represent and stream participants in real-time.

Efficient Point Cloud Compression for Real-Time Volumetric Video

Point clouds are one of the most common volumetric representation formats. They are widely used because they provide detailed information about volumetric scenes: each point can store high-definition geometric and appearance attributes such as position, normal, transformation and color. When captured with good-quality devices, point clouds can represent volumetric content with high accuracy, making them a very good choice for VR applications.

However, point clouds have several shortcomings that need to be addressed. In raw format they require a huge amount of memory and resources to accurately describe the geometry, colors, normals and all the other attributes of the volume, and they demand a lot of bandwidth for transmission. Furthermore, higher quality requires higher resolution, which implies larger point clouds and therefore even greater memory and bandwidth demands. The problem becomes more complex still when real-time constraints are considered. For this reason, efficient mechanisms to compress, deliver and render point clouds are mandatory, such as the ones developed in the scope of XRECO, whose goal is to relax these constraints and give users and developers high-quality volumetric video content in real time. The Moving Picture Experts Group (MPEG) has been developing compression standards such as Video-based Point Cloud Compression (V-PCC), which achieve good bandwidth reduction for volumetric data, but these standards often fall short in terms of runtime performance. The only real-time application available is the V-PCC standard-compliant decoder developed by Nokia, which performs well with offline-compressed point clouds representing a single person.
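To put the raw data rates described above in perspective, here is a back-of-the-envelope sketch; the point count, attribute layout and frame rate are illustrative assumptions rather than XRECO figures:

```python
# Rough estimate of raw (uncompressed) point cloud bandwidth.
POINTS_PER_FRAME = 1_000_000   # assumed dense capture of a single person/scene
FPS = 30                       # assumed real-time frame rate

# One common per-point layout in raw form:
#   position: 3 x float32 = 12 bytes
#   normal:   3 x float32 = 12 bytes
#   color:    3 x uint8   =  3 bytes
BYTES_PER_POINT = 12 + 12 + 3

bytes_per_frame = POINTS_PER_FRAME * BYTES_PER_POINT
bits_per_second = bytes_per_frame * 8 * FPS

print(f"Raw frame size: {bytes_per_frame / 1e6:.1f} MB")      # ~27 MB per frame
print(f"Raw bandwidth:  {bits_per_second / 1e9:.2f} Gbit/s")  # ~6.5 Gbit/s at 30 fps
```

Even under these modest assumptions, the raw stream is far beyond what consumer networks can deliver, which is why compression is mandatory.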

To achieve the goal of a real-time volumetric video encoder and decoder, i2CAT has been working on an efficient volumetric compression system for interactive immersive applications. This system takes both the geometric data and the color details of a given point cloud and stores them as images that can be used as input for traditional video codecs. In other words, the volumetric data is transformed into images that can be understood by a video codec such as H.264 or H.265 and the […]
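As an illustration of the general idea (a minimal sketch, not i2CAT's actual implementation), the snippet below packs the color and depth of a point cloud into 2D images via a simple orthographic projection, so that the resulting frames could be handed to a standard video encoder; the resolution, projection and attribute layout are assumptions for illustration only:

```python
import numpy as np

def pack_point_cloud_to_images(points, colors, resolution=1024):
    """Illustrative packing of a point cloud into color + depth images.

    points : (N, 3) float array, assumed normalized to [0, 1]^3
    colors : (N, 3) uint8 array (RGB per point)
    Returns an RGB color image and a 16-bit depth image that a standard
    video codec (e.g. H.264/H.265) could encode as ordinary frames.
    """
    color_img = np.zeros((resolution, resolution, 3), dtype=np.uint8)
    depth_img = np.zeros((resolution, resolution), dtype=np.uint16)

    # Simple orthographic projection onto the XY plane; real systems use
    # smarter patch-based projections to limit occlusion losses.
    u = np.clip((points[:, 0] * (resolution - 1)).astype(int), 0, resolution - 1)
    v = np.clip((points[:, 1] * (resolution - 1)).astype(int), 0, resolution - 1)
    z = (points[:, 2] * 65535).astype(np.uint16)

    color_img[v, u] = colors
    depth_img[v, u] = z   # last point wins; a real packer resolves occlusions
    return color_img, depth_img
```

The color and depth frames would then be passed to a conventional video encoder for transmission, and the depth image used on the receiving side to reconstruct the geometry.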

October 14, 2024 | Technical Insights

Mixed Reality User Interface

The demand for efficient content retrieval has surged, thanks to the exponential rise in multimedia data. Enter (MR)², a new concept denoting Mixed Reality Multimedia Retrieval. (MR)² capitalizes on the transformative capacities of Mixed Reality (MR), featuring a live query function that empowers users to initiate queries intuitively through interaction with real-world objects. Within this framework, we seamlessly integrate cutting-edge technologies such as object detection (YOLOv8), semantic similarity search (CLIP) and data management (Cottontail DB) within vitrivr. Through autonomous generation of queries based on object recognition in the user's field of view, (MR)² enables immersive retrieval of comparable multimedia content from a connected database. This research attempts to redefine the user experience with multimedia databases, harmoniously uniting the physical and digital domains. The success of our iOS prototype application shows promising results, setting the stage for immersive and context-aware multimedia retrieval in the era of MR.
Overview

As technology evolves rapidly, it unveils novel and captivating avenues for interacting with digital data, leading to an overwhelming influx of multimedia content. However, traditional retrieval techniques struggle to manage this vast data volume. This section delves into the convergence of Artificial Intelligence (AI), Mixed Reality (MR) and multimedia retrieval, culminating in the creation of (MR)², a transformative concept seamlessly uniting the physical and digital realms.

The motivation behind our research arises from the growing demand for seamless user interactions with multimedia content. Conventional retrieval systems, reliant on text-based queries, often fail to deliver the immersive experience users desire. In MR environments, our goal is to enable effortless engagement with multimedia content by harnessing the capabilities of AI-powered object detection.

To exemplify the potential of (MR)², we present a use case featuring a user in a city centre wearing an MR headset. Besides menu navigation, our system empowers users to engage directly with historical buildings, which are recognized through object detection. When the user concentrates on a historical artefact, (MR)² can dynamically provide additional information about it and suggest similar artworks, transforming art exploration into an immersive journey.

Our investigation strives to redefine multimedia retrieval in MR environments through a robust framework integrating AI-driven object detection, XR technologies and multimedia retrieval. This section introduces (MR)² and illustrates the revolutionary impact of AI-powered live queries on user interactions within MR environments.

Foundation

This section delves into multimedia retrieval, object detection, and visual-text co-embedding. Multimedia retrieval focuses on efficient content search across diverse datasets using AI-generated ranked lists. Object detection in mixed reality relies on advanced AI techniques like YOLOv8 for real-time identification. Ultralytics’ YOLOv8 stands out in applications like autonomous driving. As exemplified by CLIP, visual-text co-embedding enhances multimedia retrieval robustness through AI-driven integration of visual and textual features. CLIP’s transformative impact extends to tasks like zero-shot image classification.
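As a rough illustration of how visual-text co-embedding supports retrieval, the sketch below uses OpenAI's CLIP package to embed an image and candidate captions into the same space and rank them by cosine similarity; the model variant, image path and captions are assumptions, and this is not the vitrivr integration itself:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # assumed model variant

# Encode a query image (e.g. a crop of a detected object) and candidate texts
# into the shared embedding space.
image = preprocess(Image.open("detected_object.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a gothic cathedral", "a modern sculpture", "a bridge"]).to(device)

with torch.no_grad():
    image_emb = model.encode_image(image)
    text_emb = model.encode_text(texts)

# Normalize and rank candidates by cosine similarity to the image.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
similarity = (image_emb @ text_emb.T).squeeze(0)

for caption, score in zip(["cathedral", "sculpture", "bridge"], similarity.tolist()):
    print(f"{caption}: {score:.3f}")
```

In a system like (MR)², such embeddings would typically be stored in a vector store (Cottontail DB in vitrivr's case) so that nearest-neighbour search over the database returns similar media items.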

Object Detection

Detecting and interacting with physical objects in real-time in mixed reality (MR) environments demands a specialized approach […]
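For context, the snippet below shows a minimal real-time detection loop with Ultralytics' YOLOv8; the model weights, camera index and confidence threshold are illustrative assumptions, and the actual (MR)² prototype runs on iOS rather than through this Python API:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # assumed: small pretrained COCO model
cap = cv2.VideoCapture(0)    # assumed: default camera as a stand-in for the headset feed

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Run detection on the current frame; each result holds bounding boxes,
    # class ids and confidence scores for the objects in view.
    results = model(frame, conf=0.5, verbose=False)
    for box in results[0].boxes:
        cls_name = model.names[int(box.cls)]
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, cls_name, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        # In (MR)^2, a detection like this would trigger a live retrieval query.

    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```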

June 14, 2024 | Technical Insights

FVV Live: Free Viewpoint Video System

Explore the revolution in immersive video technology with "FVV Live: Free Viewpoint Video System". Developed by the Grupo de Tratamiento de Imágenes (GTI) of the Universidad Politécnica de Madrid (UPM), this groundbreaking technology is set to transform entertainment and beyond. Discover how this cost-effective, real-time system is breaking barriers in how we experience video, offering limitless perspectives and possibilities for engagement. […]

February 2, 2024 | Technical Insights