With the recent worldwide data deluge, the ability to analyse data and to integrate data management into computational simulations, in order to derive scientific and innovative insights, is becoming a critical requirement. The SAGE project aims to do exactly that by developing a storage system capable of efficiently storing and retrieving immense volumes of data at Extreme scales, with the added functionality of “Percipience”: the ability to accept and perform user-defined computations within the storage system.
From Mero to SAGE - R&D
The SAGE system is built around the Mero object storage software platform and its supporting ecosystem of tools and techniques, which will work together to provide the functionalities and scaling required for Extreme scale workflows. The SAGE system will seamlessly integrate a new generation of storage device technologies, including non-volatile memories. The SAGE system also offers a very flexible API and a powerful software framework supporting easy expansion by third parties, meaning that adding new functionality via software will remain a simple process.
To drive the technological development, SAGE members used a co-design process based on use cases.
Figure: Needed workflow at extreme scales. © SAGE
SAGE architecture
The SAGE system is built on multi-tier storage device hardware technology. Although SAGE is storage technology-agnostic, the system would typically include at least one NVRAM tier (Intel 3D XPoint technology [3DxPoint] is currently a strong contender), one or more flash tiers and at least one disk tier. Together, these tiers are housed in standard form-factor enclosures and provide their own compute capability. Moving up the system stack, compute capability increases for faster, lower-latency devices.
Mero, the object storage software, is layered on top of this hardware stack, providing fundamental management of object I/O and storage across tiers. In essence, Mero forms the core of the SAGE system.
Mero is presented to users via the Clovis API. All components above Clovis form the SAGE ecosystem.
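To make the layering more concrete, the following is a minimal sketch of a multi-tier object store that places each object in the fastest tier with free space and can run a user-defined function where the object is stored. It is a toy model only: `Tier`, `TieredObjectStore` and all their methods are hypothetical names invented for this illustration and do not reflect the actual Mero or Clovis interfaces, and the placement policy shown is an assumption, not the one SAGE uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Tier:
    """One storage tier; a lower `level` means faster, lower-latency media."""
    name: str
    level: int
    capacity_bytes: int
    objects: Dict[str, bytes] = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(len(data) for data in self.objects.values())

    def has_room_for(self, data: bytes) -> bool:
        return self.used_bytes() + len(data) <= self.capacity_bytes


class TieredObjectStore:
    """Hypothetical object store spanning several tiers (not the Mero API)."""

    def __init__(self, tiers: List[Tier]) -> None:
        # Keep tiers ordered from fastest to slowest.
        self.tiers = sorted(tiers, key=lambda tier: tier.level)

    def locate(self, object_id: str) -> Optional[Tier]:
        """Return the tier currently holding the object, or None."""
        for tier in self.tiers:
            if object_id in tier.objects:
                return tier
        return None

    def delete(self, object_id: str) -> None:
        tier = self.locate(object_id)
        if tier is not None:
            del tier.objects[object_id]

    def put(self, object_id: str, data: bytes) -> str:
        """Store data in the fastest tier with room; return that tier's name."""
        self.delete(object_id)  # overwrite any previous version
        for tier in self.tiers:
            if tier.has_room_for(data):
                tier.objects[object_id] = data
                return tier.name
        raise RuntimeError("no tier has enough free space")

    def get(self, object_id: str) -> bytes:
        tier = self.locate(object_id)
        if tier is None:
            raise KeyError(object_id)
        return tier.objects[object_id]

    def compute(self, object_id: str, fn: Callable[[bytes], object]) -> object:
        """Apply a user-defined function to the stored object and return only
        the result, standing in for computation inside the storage system."""
        return fn(self.get(object_id))


if __name__ == "__main__":
    # Assumed, deliberately tiny capacities so that spill-over is visible.
    store = TieredObjectStore([
        Tier("disk", level=2, capacity_bytes=1000),
        Tier("nvram", level=0, capacity_bytes=8),
        Tier("flash", level=1, capacity_bytes=64),
    ])
    print(store.put("small", b"12345678"))    # fits in the fastest tier
    print(store.put("medium", b"x" * 10))     # spills to the next tier
    print(store.compute("medium", len))       # only the result is returned
```

In this sketch the caller never names a tier: placement is decided by the store, and `compute` returns a result rather than the object's bytes, which is the general idea behind performing computations close to the data.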
The SAGE prototype system, available since the end of 2017, is one of the very first storage systems whose features were designed through ground-up requirement gathering and co-design specifically addressing the overlap of Extreme scale computing and Big Data. SAGE has the potential to become the gold standard for effective Exascale storage platforms, even for applications that perform most of their processing in the compute nodes, an important consideration as I/O rate demands will continue to increase. Stakeholders will need to rethink the approach based on existing parallel file systems, which were designed for sub-Petascale environments, and on current state-of-the-art storage solutions, which will not be able to exploit the continuing evolution of storage device technologies.
Example of an application for SAGE. The ITER tokamak is shown below: the outer surface of the plasma is rendered in pink, and a human figure is shown standing within the cryostat at the bottom left, for scale. © SAGE & ITER
Storing SAGE results in SAGE2
The SAGE2 project, selected in January 2018 by the EC, aims to validate a next-generation storage system, building on the existing SAGE platform. It will address new use case requirements in the areas of Extreme scale scientific computing workflows and AI/deep learning, leveraging the latest developments in storage infrastructure software and storage technology ecosystems. SAGE2 aims to provide significantly enhanced scientific throughput, improved scalability, and reduced time and energy to solution for the use cases at scale.
About SAGE
SAGE is a Horizon 2020 research project led by Seagate, bringing together 10 state-of-the-art partners from 4 European countries. Running from September 2015 to September 2018, the project received €7.9 million in funding from the European Union.