While DNA viruses are abundant and diverse and play a major role in ecosystems, RNA viruses have seldom been studied outside of pathological contexts.
To learn more about them, researchers from the Genoscope and their partners analyzed genomic sequences from 35,000 water samples taken from the world's oceans by the Tara Ocean consortium.
They extracted the sequences of genes expressed in floating organisms and systematically analyzed the RNA sequences containing the gene for RNA-dependent RNA polymerase (RdRp), which is present exclusively in RNA viruses. Because RdRp dates back to the known origins of life on Earth, its sequence has had time to evolve, and its position to diverge, many times over.
To account for this multitude of modifications, the team trained a machine learning model on known phylogenetic trees. They then validated its analytical capability by having it classify previously identified RNA virus sequences.
This tool allowed the researchers to deduce, from the organization of 44,000 new sequences, some 5,500 previously unknown virus species. Only some of these species belong to the five phylogenetic branches (or phyla) of the kingdom Orthornavirae, which mainly includes pathogenic RNA viruses. To classify the remaining new species, the biologists had to propose at least five additional phyla and 11 new classes, which they will submit to the International Committee on Taxonomy of Viruses for formalization.
Their work shows that most of the newly identified species are concentrated in two of the new phyla, one of which is Taraviricota, named in homage to the Tara Ocean consortium. Furthermore, these species are present in all the oceans, especially in the Arctic Ocean, where global warming is most pronounced.
Today, the oceans absorb half of all atmospheric CO2, but what does the future hold? Studies suggest that marine viruses may play a role in priming the ocean's biological carbon pump.
More broadly, the fundamental knowledge acquired on marine RNA viruses is essential to advancing ecological, climatic and epidemiological models.
This work has been supported by the Tara Ocean Foundation, the CNRS, the European Molecular Biology Laboratory (EMBL) and the CEA-Genoscope.