Exploring Seamless Data Integration with XReco's Metasearch Service

Eviden introduces a pioneering metasearch service for the XReco project, designed to significantly enhance how users access and interact with information from multiple sources. This advanced service consolidates search results from various internal and public sources into a single, streamlined interface, facilitating a more efficient and comprehensive search experience. Utilizing artificial intelligence, the service further refines these results, re-ranking them based on relevance to deliver superior search outcomes that align closely with user queries.

Why a Metasearch?

In the realm of Extended Reality (XR), where users frequently seek 3D models, animations, textures, music and sound effects, data is often scattered across multiple repositories. Unlike image search, which is comparatively consolidated, there is no single service that effectively searches and filters across the most commonly used 3D object repositories. The metasearch concept was developed to address this fragmentation. By querying multiple repositories in parallel, the metasearch service eliminates the need for users to search each source individually. Moreover, by analyzing query trends and identifying frequently searched assets, the service uses caching to optimize how results are ranked and served, giving users quicker access to the results they want.
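
The parallel querying described above can be illustrated with a minimal Python sketch. It is not the XReco implementation: the three connector functions and the `metasearch` helper are hypothetical stand-ins, and a thread pool is just one reasonable way to fan a query out to several repositories at once.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for real connectors: each takes a query string
# and returns a list of result dicts. They are not part of XReco.
def search_repo_a(query):
    return [{"source": "repo_a", "title": f"{query} model"}]

def search_repo_b(query):
    return [{"source": "repo_b", "title": f"{query} texture"}]

def search_broken(query):
    raise TimeoutError("repository did not answer")

def metasearch(query, connectors, timeout=5.0):
    """Send the query to every connector in parallel and merge the results.

    Returns (merged_results, names_of_failed_connectors). A failing
    connector is recorded but does not block the other sources.
    """
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in connectors.items()}
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                merged.extend(future.result())
            except Exception:
                failed.append(name)
    return merged, failed

if __name__ == "__main__":
    results, failed = metasearch(
        "chair",
        {"a": search_repo_a, "b": search_repo_b, "x": search_broken},
    )
    print(results, failed)
```

The point of the design is that the user waits roughly as long as the slowest repository rather than the sum of all of them, and one unavailable source degrades the result set instead of failing the whole search.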

What is a Metasearch Service?

A metasearch service amalgamates data from multiple search engines or databases and displays these results in a unified format. The XReco Project leverages this technology to integrate data from project partners with public repositories. This extensive integration offers users access to a broader and richer data array than any single source could provide, enhanced by AI-driven algorithms that re-rank the search results to improve accuracy and relevance.

The Role of Connectors in Data Integration

At the heart of the metasearch service are the connectors, which integrate various types of media: images, videos, text, 3D models, and audio. The project currently employs five main connectors, each designed to interact with a distinct external API: RAI, DW, UNIBAS, Wikimedia and SketchFab. Each connector is configured through its own YAML file, which encapsulates all necessary operational logic, simplifying maintenance and facilitating future scalability.

The use of YAML files for configuring connectors not only streamlines their setup but also significantly enhances the adaptability of the system. This approach allows for the straightforward incorporation of additional connectors, making it easier for other companies and entities to join the XReco ecosystem.
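
To make the idea of a configuration-driven connector more concrete, here is a small Python sketch. The actual XReco YAML schema is not described in this article, so every key below (`base_url`, `query_param`, `static_params`, `field_map`) and the example URL are assumptions, shown as the dictionary a YAML loader would produce.

```python
from urllib.parse import urlencode

# Hypothetical connector configuration, written as the dict a YAML loader
# would return. None of these keys are taken from the XReco project.
EXAMPLE_CONNECTOR = {
    "name": "example_repo",
    "base_url": "https://api.example.org/search",
    "query_param": "q",
    "static_params": {"type": "models"},
    # Maps the common field name (left) to the repository's field name (right).
    "field_map": {"title": "name", "url": "viewerUrl"},
}

def build_request_url(config, query):
    """Build the repository-specific search URL from the configuration."""
    params = dict(config.get("static_params", {}))
    params[config["query_param"]] = query
    return f"{config['base_url']}?{urlencode(params)}"

def map_fields(config, raw_item):
    """Rename repository-specific fields to the common field names."""
    return {
        target: raw_item.get(source)
        for target, source in config["field_map"].items()
    }

if __name__ == "__main__":
    print(build_request_url(EXAMPLE_CONNECTOR, "red chair"))
    print(map_fields(EXAMPLE_CONNECTOR, {"name": "Chair", "viewerUrl": "https://example.org/1"}))
```

Under this assumption, adding a repository means writing a new configuration rather than new code, which is the kind of adaptability the YAML approach is meant to provide.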

Data Processing: After retrieving data, the connectors validate and standardize it according to a specially designed schema model. This model harmonizes metadata from different sources, resolving discrepancies in content access and structure and ensuring data accuracy and reliability.

This design also features robust error handling mechanisms, such as discarding URLs with typographical errors, ensuring that only accurate and reliable data is integrated into the metasearch service.
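
The validation and standardization step, including the discarding of malformed URLs, might look roughly like the following Python sketch. The required fields and the output shape are hypothetical, since the project's schema model is not detailed here; only the general technique (check required fields, reject URLs that do not parse as HTTP or HTTPS, normalize the rest) is illustrated.

```python
from urllib.parse import urlparse

# Hypothetical minimal schema; the real XReco schema model is not shown here.
REQUIRED_FIELDS = ("title", "url", "media_type")

def is_valid_url(value):
    """Accept only strings that parse as http(s) URLs with a host part."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def standardize(raw_items, source):
    """Keep only complete items with a usable URL, in one common shape."""
    clean = []
    for item in raw_items:
        if any(not item.get(field) for field in REQUIRED_FIELDS):
            continue  # incomplete record
        if not is_valid_url(item["url"]):
            continue  # e.g. "htp://..." or a URL with no scheme
        clean.append({
            "title": str(item["title"]).strip(),
            "url": item["url"].strip(),
            "media_type": str(item["media_type"]).lower(),
            "source": source,
        })
    return clean

if __name__ == "__main__":
    raw = [
        {"title": " Chair ", "url": "https://example.org/chair", "media_type": "3D"},
        {"title": "Lamp", "url": "htp://example.org/lamp", "media_type": "3D"},
        {"title": "Desk", "media_type": "image"},
    ]
    print(standardize(raw, "example_repo"))
```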

Metasearch Infrastructure

The technical backbone of the metasearch service is its robust microservices architecture, which includes several key components:

Data management system: Comprising an Elasticsearch cluster, Kibana, and a custom Metrics Collector built on Filebeat, the Elastic Stack supports rapid data indexing and retrieval, facilitates data visualization, and collects and displays metrics from various sources.
Search API Container: Serves as the operational core of the metasearch service. It manages all communications with database connectors and handles interactions with Elasticsearch, overseeing the API and its public documentation.
UI: While the framework is designed to function seamlessly with the Orchestrator user interface, we have also developed a dedicated UI for the metasearch service, enabling it to operate effectively as a standalone solution.
Re-ranking agent: We integrated a re-ranking system built on OpenCLIP, an open-source implementation of OpenAI’s CLIP (Contrastive Language-Image Pre-training), alongside a Celery and Flower infrastructure. This setup re-ranks results dynamically using artificial intelligence, improving their relevance to the query. OpenCLIP models, trained on large and diverse datasets such as LAION-400M and LAION-2B, let the system analyze and prioritize content according to how well its visual content matches the query text. Celery manages the asynchronous task queues needed to process these AI-driven tasks, and Flower provides monitoring for them, so that re-ranking operations scale and respond quickly to user queries.
Auth: The authentication system utilizes OAuth and OpenID Connect (OIDC), standards that manage access delegation and user identity verification. OAuth facilitates secure resource access by issuing tokens after user authentication, while OIDC adds an identity layer, enhancing security and ensuring user identities are verifiable across services. This framework ensures protected and efficient user authentication.
Connectors: Connector containers, along with their YAML configurations, can be flexibly deployed either on the core premises of the metasearch service or proximate to the target repository premises. This adaptable setup enhances the system’s efficiency by reducing latency and improving data synchronization between the metasearch engine and the external data sources.
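
The re-ranking idea behind the re-ranking agent above can be sketched in a few lines of Python. CLIP-style models map text and images into a shared embedding space, so results can be ordered by how close their embeddings are to the query embedding. The sketch assumes the embeddings have already been computed (in the real service that would be OpenCLIP's job); the tiny two-dimensional vectors and the `rerank` helper are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rerank(query_embedding, items):
    """Return items sorted by similarity to the query, most similar first.

    Each item is a dict with an "embedding" key holding a precomputed
    vector (hypothetical structure, not the XReco data model).
    """
    return sorted(
        items,
        key=lambda item: cosine_similarity(query_embedding, item["embedding"]),
        reverse=True,
    )

if __name__ == "__main__":
    query = [1.0, 0.0]  # stand-in for the embedded query text
    items = [
        {"title": "a", "embedding": [0.0, 1.0]},
        {"title": "b", "embedding": [1.0, 0.0]},
        {"title": "c", "embedding": [1.0, 1.0]},
    ]
    print([item["title"] for item in rerank(query, items)])
```

Because computing embeddings is the expensive part, it makes sense to run it as an asynchronous background task, which is the role the task queue plays in the architecture described above.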

Advanced Data Handling and Search Logic

Within the metasearch engine, Elasticsearch is employed to create four specific indices that store various types of data: user queries, raw responses from connectors, ranked responses, and content baskets with aggregated results. The service employs an advanced caching system to boost response speed and reduce hardware load, which is particularly beneficial given the high search volumes for popular terms.
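
How such a cache cuts down repeated work for popular terms can be shown with a small Python sketch. The article does not describe the cache's implementation, so this in-memory, time-limited cache is a simplified stand-in rather than the service's actual mechanism (which works with Elasticsearch indices); the class name, the normalization rule and the expiry time are all assumptions.

```python
import time

class QueryCache:
    """Hypothetical in-memory cache keyed by the normalized query text."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    @staticmethod
    def _key(query):
        # Treat "Red  Chair" and "red chair" as the same query.
        return " ".join(query.lower().split())

    def get(self, query):
        """Return cached results, or None if missing or expired."""
        key = self._key(query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]
            return None
        return results

    def put(self, query, results):
        """Store results together with the time they were cached."""
        self._entries[self._key(query)] = (self._clock(), results)

if __name__ == "__main__":
    cache = QueryCache(ttl_seconds=60)
    cache.put("Red  Chair", ["result-1", "result-2"])
    print(cache.get("red chair"))
```

A search handler would call `get` first and only query the connectors on a miss, then `put` the ranked results; the expiry keeps cached answers from drifting too far from what the repositories currently hold.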

Potential Challenges and Solutions

To date, the Eviden metasearch service has successfully integrated all planned repositories and is now focusing on refining the search logic and connectors. Future plans include expanding the range of 3D repositories and employing AI to refine the ranking mechanism, aiming to improve result prioritization based on the relevance of non-text metadata.

As the metasearch service evolves, two potential challenges will need to be addressed: scaling to accommodate an increasing number of users and data sources, and ensuring the continuous reliability and accuracy of the connectors. Solutions may include further architectural enhancements, increased use of cloud technologies for scalability, and ongoing improvements to the AI-based ranking system.

Conclusion

The Eviden metasearch service represents a significant advancement in making complex datasets more accessible and usable. By integrating technologies like OpenCLIP for AI-driven re-ranking and employing modern authentication standards such as OAuth and OpenID Connect, the service not only enhances security but also improves the relevance and accuracy of search results. With connectors configured via YAML files, the system remains flexible, easily integrating new data sources and allowing for scalability. As this project continues to evolve, it aims to remain at the forefront of search technology, providing valuable tools for users across various disciplines and pioneering new ways to manage and retrieve digital information.
