Mixed Reality User Interface
Demand for efficient content retrieval has surged with the exponential growth of multimedia data. We introduce (MR)², a new concept denoting Mixed Reality Multimedia Retrieval. (MR)² capitalizes on the capabilities of Mixed Reality (MR) and features a live query function that lets users initiate queries intuitively by interacting with real-world objects. The framework integrates object detection (YOLOv8), semantic similarity search (CLIP), and data management (Cottontail DB [30]) within vitrivr. By automatically generating queries from the objects recognized in the user’s field of view, (MR)² enables immersive retrieval of comparable multimedia content from a connected database. This research attempts to redefine the user experience with multimedia databases by uniting the physical and digital domains. Our iOS prototype application shows promising results, setting the stage for immersive and context-aware multimedia retrieval in the era of MR.
Overview
As technology evolves rapidly, it opens new ways of interacting with digital data and produces an overwhelming influx of multimedia content. Traditional retrieval techniques, however, struggle to manage this volume of data. This section examines the convergence of Artificial Intelligence (AI), Mixed Reality (MR), and multimedia retrieval, culminating in (MR)², a concept that unites the physical and digital realms.
The motivation behind our research is the growing demand for seamless user interaction with multimedia content. Conventional retrieval systems, which rely on text-based queries, often fail to deliver the immersive experience users want. Our goal is to make engagement with multimedia content in MR environments effortless by harnessing AI-powered object detection.
To illustrate the potential of (MR)², we present a use case in which a user wearing an MR headset explores a city centre. Besides menu navigation, our system lets users engage directly with historical buildings, which it recognizes through object detection. When the user concentrates on a historical artefact, (MR)² can dynamically provide additional information about it and suggest similar artworks, transforming art exploration into an immersive journey.
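The step from "the user concentrates on an object" to "a query is issued" can be sketched as follows. This is a minimal illustration, not the prototype's actual logic: the `Detection` type, the confidence threshold, and the rule of picking the detection nearest the centre of the view are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """Hypothetical detector output for one object in the current frame."""
    label: str
    confidence: float
    # Normalized bounding box: (x_min, y_min, x_max, y_max), each in [0, 1].
    box: Tuple[float, float, float, float]


def box_center(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Return the centre point of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def select_focus_object(
    detections: List[Detection],
    min_confidence: float = 0.5,          # assumed threshold
    focus: Tuple[float, float] = (0.5, 0.5),  # assumed focus point: view centre
) -> Optional[Detection]:
    """Pick the sufficiently confident detection closest to the focus point."""
    candidates = [d for d in detections if d.confidence >= min_confidence]
    if not candidates:
        return None

    def squared_distance(d: Detection) -> float:
        cx, cy = box_center(d.box)
        return (cx - focus[0]) ** 2 + (cy - focus[1]) ** 2

    return min(candidates, key=squared_distance)


# Made-up detections for a single frame.
detections = [
    Detection("building", 0.90, (0.3, 0.2, 0.7, 0.8)),
    Detection("car", 0.95, (0.0, 0.6, 0.2, 0.8)),
    Detection("statue", 0.30, (0.45, 0.45, 0.55, 0.55)),  # centred but low confidence
]

focused = select_focus_object(detections)
# The label of the focused object serves as the live query term.
query = focused.label if focused else None
```

Here the low-confidence detection is discarded even though it sits at the centre of the view, so the query falls back to the next-closest confident object.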
Our investigation strives to redefine multimedia retrieval in MR environments through a robust framework integrating AI-driven object detection, XR technologies, and multimedia retrieval. This section introduces (MR)² and illustrates the revolutionary impact of AI-powered live queries on user interactions within MR environments.
Foundation
This section delves into multimedia retrieval, object detection, and visual-text co-embedding. Multimedia retrieval focuses on efficient content search across diverse datasets using AI-generated ranked lists. Object detection in mixed reality relies on advanced AI techniques like YOLOv8 for real-time identification. Ultralytics’ YOLOv8 stands out in applications like autonomous driving. As exemplified by CLIP, visual-text co-embedding enhances multimedia retrieval robustness through AI-driven integration of visual and textual features. CLIP’s transformative impact extends to tasks like zero-shot image classification.
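To make the idea of a ranked list over a shared embedding space concrete, the following minimal sketch scores a query vector against a small collection by cosine similarity and returns the best matches first. The three-dimensional vectors and file names are made-up stand-ins; in a real system the vectors would come from a co-embedding model such as CLIP, and the search would be delegated to the database rather than done in a Python loop.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    collection: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (item_id, score) pairs, most similar first."""
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in collection.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real image embeddings (hypothetical data).
collection = {
    "cathedral.jpg": [0.9, 0.1, 0.0],
    "town_hall.jpg": [0.7, 0.3, 0.1],
    "bicycle.jpg": [0.0, 0.2, 0.9],
}

# Toy vector standing in for the embedding of a text or image query.
query_vec = [1.0, 0.0, 0.0]

ranked = rank_by_similarity(query_vec, collection, top_k=2)
for item_id, score in ranked:
    print(f"{item_id}: {score:.3f}")
```

Because text and images share one embedding space in a co-embedding model, the same ranking function would serve both a text query and a query derived from a detected object.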
Object Detection
Detecting and interacting with physical objects in real-time in mixed reality (MR) environments demands a specialized approach […]