Table of Contents

The RAIDO Data Lake: Enabling reliable and sustainable Artificial Intelligence

Modern Artificial Intelligence relies on data. As AI solutions become more sophisticated and are applied across different domains, managing datasets efficiently, securely, and sustainably becomes significantly important. This is exactly the challenge addressed by the RAIDO Data Lake, which serves as the central information management layer of the platform.

The RAIDO Data Lake brings together datasets, AI models, monitoring information, workflow artefacts, and analytical outputs generated across all project pilots into a single, unified environment. Rather than maintaining separate storage systems and repositories for different applications, the Data Lake provides a common infrastructure that supports the entire AI lifecycle, from data ingestion and model training to evaluation, optimization, and deployment.

A key requirement of RAIDO is the ability to support highly diverse pilot environments. The project covers smart energy grids, smart farming and fungal food production, healthcare and pharmacogenomics, as well as plant fibre characterization and robotics. Each pilot produces different types of information, ranging from structured measurements and time-series data to medical reports, high-resolution images, simulation outputs, and AI models. The Data Lake has been designed to handle all these data sources while ensuring that information remains accessible, reusable, and securely managed throughout the project.

To achieve this, the Data Lake follows a multi-database approach, using the most appropriate (open source) technology for each type of data. MySQL is used for structured and tabular datasets, MongoDB manages images, documents, and semi-structured information, MinIO stores AI models and large binary artefacts, and InfluxDB supports monitoring and time-series data, including infrastructure (hardware) and energy-consumption metrics. Together, these technologies provide a flexible and scalable storage ecosystem capable of supporting the wide variety of data generated across the RAIDO pilots.

Beyond data storage, the Data Lake plays a crucial role in supporting RAIDO’s vision for trustworthy and green AI. The platform continuously monitors computational resources and energy consumption during AI workflows, enabling researchers to evaluate not only model performance but also their environmental impact. This capability supports evidence-based optimization and contributes to the development of more sustainable AI solutions.

The architecture has also been designed with scalability and interoperability in mind. As datasets grow and AI workflows become more complex, the infrastructure can evolve accordingly, ensuring long-term usability beyond the project’s initial deployment. At the same time, the Data Lake enables seamless collaboration between different RAIDO components, allowing AI services, monitoring tools, visualization modules, and optimization mechanisms to exchange information through a shared environment.

The implementation of the complete Data Lake infrastructure, including the backend services, APIs, deployment environment, and hosting infrastructure, has been led by UBITECH. Through the development of a cloud-native architecture and a comprehensive set of integration services, UBITECH has delivered the foundation that allows all RAIDO components to communicate, exchange data, and operate as a unified platform.

As RAIDO continues to advance reliable, explainable, and resource-efficient AI technologies, the Data Lake remains a cornerstone of the ecosystem. By providing a secure, scalable, and sustainable data foundation, it enables innovation across all pilots while helping transform ambitious AI research into practical, real-world solutions.

Find us on Social Media

More Insights

News

RAIDO’s 14th Newsletter Released

Explore the latest edition of the RAIDO Newsletter here and discover the latest project updates, technical developments, and partner activities. This issue highlights RAIDO’s work on TinyML and Green AI orchestration, the RAIDO Data Lake

Read More »
Newsletter

14th RAIDO Newsletter – June 2026

RAIDO JUNE 2026 NEWSLETTER [PART 1] Welcome to the first installment of our June update! This edition highlights some of the latest technological developments and partner contributions driving RAIDO’s vision for Trustworthy and Green AI.

Read More »
News

Dive into RAIDO with AXON LOGIC

The latest video on the RAIDO Project YouTube channel is now live, showcasing the technical contributions of AXON LOGIC to the development of trustworthy and transparent AI solutions. In this partner spotlight, AXON LOGIC presents

Read More »
News

Dive into RAIDO with Kingston University

The RAIDO Project YouTube channel continues its partner spotlight series with a new video featuring Kingston University (KU), the project’s Scientific Coordinator. In this short video, Kingston University presents its role in RAIDO and shares

Read More »
News

RAIDO’s 13th Newsletter Released

Explore the latest edition of the RAIDO Newsletter here and discover the latest project updates, technical developments, and dissemination activities. This issue highlights RAIDO’s participation in European Data Week 2026, recent progress on trustworthy and

Read More »