


Clefs CEA n°64 - Voices of research - Journey to the heart of Big Data

CEA takes up the Big Data challenge

CEA has a full role to play in the national and European initiatives launched to stimulate research and innovation in the field of Big Data.

Complete version of the article published in Clefs CEA n°64 - Journey to the heart of Big Data. 

Published on 13 June 2017

The growth in the quantity of data produced by companies, private individuals and scientists is exponential, as clearly illustrated by two figures:

  • 90% of all existing data was created less than two years ago;
  • each week, the quantity of data generated is greater than that produced during the previous millennium [1].
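To make the first figure concrete, here is a back-of-the-envelope reading of it. It rests on simplifying assumptions that are ours, not the source's: the total stock of data D(t), with t in years, grows at a steady exponential rate k, and no data is ever deleted. If 90% of today's stock is less than two years old, the stock grows roughly tenfold every two years:

```latex
\begin{align*}
D(t) - D(t-2) &= 0.9\,D(t) \;\Rightarrow\; D(t-2) = 0.1\,D(t)\\
D(t) = D_0\,e^{kt} \;\Rightarrow\; e^{2k} &= 10 \;\Rightarrow\; k = \tfrac{\ln 10}{2} \approx 1.15\ \text{yr}^{-1}\\
\text{annual growth factor: } e^{k} &= \sqrt{10} \approx 3.16\\
\text{doubling time: } \tfrac{\ln 2}{k} &= \tfrac{2\ln 2}{\ln 10} \approx 0.60\ \text{yr}
\end{align*}
```

Under these assumptions the stock of data would double roughly every seven months; since the 90% figure is itself an order of magnitude, so is this rate.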
How these data are managed and used has profound implications and raises major challenges for society, for the economy and for science.
Science is today undergoing an epistemological revolution: in barely a decade, a “fourth paradigm” of scientific discovery [2] has emerged, based on the intensive analysis and exploitation of data and, in principle, requiring no prior model describing the real world. This revolution affects all scientific sectors, in particular biology and health and the human and social sciences.
The economy is also deeply affected, as the pervasiveness of data radically reshapes most value chains. The rise of GAFA [3] is the most direct manifestation of this, but major industrial firms have now also identified data management as a key factor in their competitiveness and are incorporating the digital transformation into their business processes.
Beyond its effects on the economy and on scientific progress, data management is changing society more broadly in various ways: the development of user services, and changes in teaching methods, in jobs and job-seeking practices, and in energy systems, all against a backdrop of private data protection and public data access.

A general realisation

The spread of Internet access in only fifteen years, the migration of economic and social activities to the Internet and the development of the Internet of Things (IoT) have coincided with falling costs of data production, processing and storage and with an explosion in available computing power. This led to a general realisation that data analysis is an increasingly decisive factor in innovation and growth.
For example, in 2012 the United States set up a Big Data Research and Development Initiative and then, in 2016, a Federal Big Data Research and Development Strategic Plan [4]. The American government also provides widespread access to its data. Finally, it is worth noting that the 2015 Presidential executive order creating the National Strategic Computing Initiative placed high performance data analysis (HPDA) needs on a par with high performance computing (HPC) requirements.
Europe also has a dynamic policy. The creation of a European Digital Single Market is priority no. 2 of President Juncker [5]. The related announcements made over the past two years aim to provide Europe with the policies, infrastructures and regulatory frameworks it needs to be competitive. From a scientific viewpoint, particularly noteworthy is the announcement of a European Cloud Initiative, which brings together a European Data Infrastructure, including exascale computing [6], and a policy ensuring the openness and interoperability of scientific data (European Open Science Cloud) [7].

France’s tangible advantages

France has a number of tangible advantages in global competition: 
  • a top-tier school of mathematics;
  • a policy supporting extreme computing technologies since 2002;
  • a national research network under the Allistene alliance, whose scientific output in 2012 ranked fifth in the world;
  • high-level infrastructures with Renater and Genci;
  • a high-tech industrial and services fabric, with major groups such as Alcatel-Lucent, Atos-Bull, Cap Gemini, Dassault Systèmes, Orange, OVH and numerous dynamic SMEs.
However, when the National Research Strategy (SNR) was drawn up, the main weakness identified was the lack of experts in data and knowledge extraction [8].

In addition to its participation in European actions, France has set up several initiatives aimed, directly or indirectly, at stimulating research and innovation in massive data:
  • the definition of a priority research programme in the SNR and of a Digital infrastructures and services modernisation plan by CODORNUM [9], coordinated by the Ministry for Higher Education and Research;
  • the implementation of an “Economics of data” industrial solution within the Nouvelle France Industrielle programme coordinated by the Ministry of the Economy and Finance, which specifically identifies expertise in exascale technologies;
  • the “Digital Republic” Act, adopted in 2016, which in particular authorises text and data mining for public research purposes and gives the national research community a framework that places it in a highly favourable position internationally.

The two pillars of CEA’s strategy

CEA plays a full part in this dynamic scientific process and supports national industry with a strategy built on two pillars: 
  • on the one hand, an integrated extreme computing policy applied to numerical simulation and massive data processing;
  • on the other, a range of digital transformation services for industry including data intelligence, sensors and IoT, advanced manufacturing and cybersecurity.
Extreme computing and modelling/simulation have traditionally driven digital technologies and their uses, while giving data an increasingly important place, as shown by the architecture of large computers and their entire infrastructure and by increasingly data-centric developments. These tools, which CEA had to fully adopt and master after the cessation of nuclear testing in 1996, are today essential to the performance of all its missions. CEA’s integrated policy in this field ranges from hardware and software technologies to computing and data infrastructures and applications, within a “co-design” loop that enables computer architectures and application codes to be optimised jointly. The State entrusted CEA with the national role of developing extreme computing technologies [10], and to this end CEA set up an R&D partnership with ATOS/Bull. Because of the growing volume of data to be processed, data management is at the heart of this strategy in several respects:
  • the actual architecture of the computers (high memory bandwidth / flops ratio, hybrid processors, general-purpose vs. GPU or other specialised acceleration units);
  • the organisation of computing infrastructures (disk storage of petabytes or tens of petabytes; tape archival for even greater volumes; local area network with several hundred Gbit/s throughput, etc.); 
  • software solutions developed (Lustre…) for administration of computers and infrastructure;
  • but also the architecture of application codes.
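The “memory bandwidth / flops ratio” in the first item above is often called machine balance. The Python sketch below illustrates why it matters for data-intensive work, using the classic roofline model, a standard HPC rule of thumb that the article itself does not discuss. The function names, the peak rate, the bandwidth and the kernel intensities are all hypothetical placeholders, not characteristics of any CEA machine.

```python
# Minimal sketch of "machine balance" (memory bandwidth / peak flops) and the
# roofline bound it implies. All hardware figures are hypothetical placeholders.


def machine_balance(bandwidth_bytes_per_s: float, peak_flops: float) -> float:
    """Bytes the memory system can deliver per floating-point operation at peak."""
    return bandwidth_bytes_per_s / peak_flops


def attainable_flops(peak_flops: float, bandwidth_bytes_per_s: float,
                     arithmetic_intensity: float) -> float:
    """Roofline estimate of sustained flop/s for a kernel.

    arithmetic_intensity: flops performed per byte moved to or from memory.
    The kernel is capped either by compute (peak_flops) or by memory traffic
    (bandwidth * intensity), whichever is lower.
    """
    return min(peak_flops, bandwidth_bytes_per_s * arithmetic_intensity)


if __name__ == "__main__":
    peak = 1.0e12       # hypothetical node peak: 1 Tflop/s
    bandwidth = 2.0e11  # hypothetical memory bandwidth: 200 GB/s
    print(f"machine balance: {machine_balance(bandwidth, peak):.2f} bytes/flop")

    # Data-heavy kernel: y[i] = a[i] + b[i] in double precision reads 16 bytes,
    # writes 8 bytes and performs 1 flop, i.e. 1/24 flop per byte.
    streaming = 1 / 24
    # Compute-heavy kernel (e.g. a large dense matrix product) reuses data from
    # cache; 50 flop/byte is an illustrative value, not a measurement.
    dense = 50.0
    for name, ai in [("streaming", streaming), ("dense", dense)]:
        perf = attainable_flops(peak, bandwidth, ai)
        print(f"{name}: {perf:.3e} flop/s ({perf / peak:.1%} of peak)")
```

With these placeholder figures, the streaming kernel is limited by memory traffic to under 1% of peak, while the dense kernel reaches the compute ceiling. This is why a higher bandwidth/flops ratio matters as workloads become more data-centric.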
None of these developments would be possible without a dense network of collaborations, whether national (Ter@tec, UVSQ, Inria, Genci, etc.), European (ETP4HPC, PRACE, large computing and research centres such as Forschungszentrum Jülich in Germany, Barcelona Supercomputing Center in Spain, etc.) or international (Riken in Japan, DOE in the United States).

Three priority challenges for the digital transformation of industry

It should be remembered that computers and large scientific instruments are not the only producers of large masses of data. Other sources have emerged, are becoming more widespread or are growing significantly in volume: social networks, public administration in the broadest sense, mobile and connected objects, etc. Miniaturised instrumentation and widespread communication technologies (wired or wireless) have opened up unprecedented prospects for producing and exploiting data, which lie at the heart of the digital transformation of industry. As part of its industrial support role, CEA, primarily through CEA-List, is developing solutions for this transformation, drawing on a strong tradition of signal processing and hardware/software coupling. In this field, three priority challenges have been identified:
  • the integration of heterogeneous and multivariate data; 
  • the availability of relevant processing, largely geared towards decision-making and often subject to real-time constraints, which requires the constant development of new processing algorithms (data mining, machine learning, distributed intelligence, artificial intelligence);
  • the cost of the solution.

To address these challenges, CEA has organised its activities around seven main topics: 
  • raw data processing;
  • scene analysis;
  • distributed self-adapting systems;
  • data modelling and visualisation;
  • architectures similar to vision sensors;
  • neuromorphic architectures and solutions;
  • critical real-time design.
Here again, this work is carried out through a large number of national, European and international academic and industrial partnerships. Digitec, the digital systems research centre created on the Paris-Saclay campus, which brings together CEA-List, Inria, Telecom-Paristech, Systematic, IRT SystemX and Paris-Saclay University, is emblematic of this collaborative dynamic.

The training challenge

Finally, CEA pays particularly close attention to training for the new professions required by these rapidly and constantly changing fields. First and foremost comes training through research, but CEA also contributes to Master's degrees at Paris-Saclay University, to training in cutting-edge programming techniques (CUDA Centre labels, PRACE Advanced Training Centre at the Maison de la Simulation) and to the creation of chairs, such as the ATOS-ENS Paris-Saclay-CEA Chair, intended more specifically for training “data scientists”.
In conclusion, given the challenges it poses and the opportunities it offers, both for understanding physical phenomena and as an engine of economic growth, Big Data is a fundamental area for CEA, and one to which it is fully committed.


References

[1] OECD, “Data-driven Innovation for Growth and Well-being”, STI policy note, October 2015.
[2] T. Hey, S. Tansley and K. Tolle (eds.), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009. The first paradigm is the observation of nature, the second the development of theoretical models and the third numerical simulation.
[3] Google, Amazon, Facebook, Apple.
[6] 1 exaflop/s = 10¹⁸ computing operations per second.
[8] Report on the National Research Strategy, April 2014.
[9] Comité d’Orientation du NUMérique de l’Enseignement Supérieur et de la Recherche (digital steering committee for higher education and research).
[10] JORF n° 0219 of 21 September 2014 and n° 0150 of 29 June 2016.


Contributors

Jean-Philippe Bourgoin

Jean-Philippe Bourgoin is the former Director of Strategic Analyses (DAS) at CEA.


Jean-Philippe Nominé

Jean-Philippe Nominé is Head of the Digital Project in the Strategic Analyses Division (DAS) at CEA.


