
Scientific result | Genomics

Big pangenomics to study microbial genomic diversity


In 2020, researchers from LABGeM (Genoscope/CEA-Jacob), LaMME and the Pasteur Institute joined forces to develop PPanGGOLiN, a novel method for analyzing prokaryote pangenomes. Later that same year, the team developed a complementary tool, panRGP, which brought unprecedented performance and speed to massive comparative genomics analyses.

Published on 7 February 2021

Comparative microbial genomics is used in functional, evolutionary and environmental studies to analyze the genetic content of one or more species of interest. The concept of pangenomics was introduced in the early 2000s to capture the full range of genomic diversity within a species. A pangenome consists of a "core" genome, made up of the genes present in all strains of the species, and a "variable" or "accessory" genome, made up of the genes absent from at least one strain. The variable genome, which represents 5% to 40% of the gene content, is of great importance because it provides the species with essential capacities, such as environmental adaptation or disease resistance.
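
To make the core/accessory distinction concrete, here is a minimal sketch assuming each strain has already been reduced to the set of gene families it carries. The strain and family names are hypothetical toy data, not taken from any real species.

```python
# Hypothetical toy data: each strain mapped to the set of gene families it carries.
genomes = {
    "strain_1": {"famA", "famB", "famC", "famD"},
    "strain_2": {"famA", "famB", "famC", "famE"},
    "strain_3": {"famA", "famB", "famC", "famD", "famF"},
}


def split_pangenome(genomes):
    """Return (pangenome, core, accessory) as sets of gene families."""
    family_sets = list(genomes.values())
    pangenome = set().union(*family_sets)    # every family seen in at least one strain
    core = set.intersection(*family_sets)    # families present in all strains
    accessory = pangenome - core             # families missing from at least one strain
    return pangenome, core, accessory


pangenome, core, accessory = split_pangenome(genomes)
print(sorted(core))
print(sorted(accessory))
print(f"accessory fraction: {len(accessory) / len(pangenome):.0%}")
```

In this toy example half of the six families are accessory; real species fall somewhere in the 5% to 40% range quoted above.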

With the current deluge of new genomic sequences, novel bioinformatics tools are needed to meet the challenge of analyzing the resulting "Big Data". In this context, researchers from LABGeM (Genoscope) teamed up with colleagues from LaMME (a joint research unit of the CNRS and the University of Évry-Val-d'Essonne) and the Pasteur Institute to develop PPanGGOLiN (Gautreau et al. 2020), a new method for analyzing the several thousand prokaryote pangenomes currently available. One original feature of PPanGGOLiN is its graph-based approach, which represents not only all of the gene families observed in a species but also all of their genomic colocalization information, i.e., which gene families lie next to each other on the genomes. PPanGGOLiN also includes an automated learning method, based on the graph structure, that partitions the pangenome into three classes: "persistent" (gene families present in the majority of genomes), "shell" (present in an intermediate number of genomes) and "cloud" (present in a low number of genomes). This model proved more effective than methods based on gene family frequency thresholds, and it also makes it possible to work with lower-quality data, such as metagenome¹-assembled genomes (MAGs).
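
The two ideas in this paragraph can be illustrated with a short sketch, assuming genes have already been clustered into families and each genome is given as an ordered list of family IDs along a single linear contig (all names below are hypothetical). The first function builds a graph whose nodes are gene families and whose weighted edges record which families are neighbors on how many genomes. The second function is deliberately the simpler frequency-threshold approach that the article says PPanGGOLiN outperforms; it is included only to make the persistent/shell/cloud labels concrete, and its cut-offs are arbitrary assumptions. PPanGGOLiN's own statistical model, which also uses the graph neighborhood, is not reproduced here.

```python
from collections import Counter

# Hypothetical toy input: ordered gene-family IDs along one linear contig per genome.
genomes = {
    "g1": ["famA", "famB", "famC", "famD"],
    "g2": ["famA", "famB", "famC", "famE"],
    "g3": ["famA", "famB", "famX", "famC"],
    "g4": ["famA", "famB", "famC", "famD"],
}


def build_pangenome_graph(genomes):
    """Nodes are gene families; an undirected edge joins two families whenever
    genes from them are neighbors on a genome. Each edge weight counts how
    many genomes support that adjacency."""
    edges = Counter()
    for order in genomes.values():
        # Use a set so an adjacency repeated within one genome is counted once.
        adjacencies = {tuple(sorted(pair)) for pair in zip(order, order[1:])}
        edges.update(adjacencies)
    return edges


def threshold_partition(genomes, persistent_min=0.95, cloud_max=0.25):
    """Naive frequency-threshold baseline (NOT PPanGGOLiN's model): label each
    family by the fraction of genomes carrying it. Thresholds are arbitrary."""
    n = len(genomes)
    counts = Counter(fam for order in genomes.values() for fam in set(order))
    partition = {}
    for fam, c in counts.items():
        freq = c / n
        if freq >= persistent_min:
            partition[fam] = "persistent"
        elif freq <= cloud_max:
            partition[fam] = "cloud"
        else:
            partition[fam] = "shell"
    return partition


edges = build_pangenome_graph(genomes)
partition = threshold_partition(genomes)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}  (seen in {weight} genomes)")
print(partition)
```

The weakness of the threshold version shows even here: with only four genomes, a family seen once lands exactly on the cloud cut-off, so small changes to the thresholds or to data quality (for example, genes missing from fragmented MAGs) flip its label. Using the neighborhood information stored in the graph is one way to make such decisions more robust.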

Building on their pangenome graphs, the LABGeM team then developed a second method, panRGP, to predict regions of genome plasticity (RGPs) and their insertion sites, or "spots" (Bazin et al. 2020). RGPs usually arise from horizontal gene transfer² and correspond to genomic islands (GIs). Compared with other available GI detection tools, panRGP appeared to be the best-performing and fastest method for massive comparative genomics studies.
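
The intuition behind RGP prediction can be sketched as follows: on a given genome, a region of plasticity shows up as a stretch of consecutive genes from variable (shell or cloud) families, bordered by persistent families. This is a simplified illustration, not the published panRGP algorithm, which is more elaborate; the strict "no persistent gene inside" rule, the minimum run length and all names below are assumptions made for the example.

```python
# Hypothetical input: one genome's ordered gene families and a partition label
# per family, e.g. produced by a persistent/shell/cloud partitioning step.
order = ["p1", "p2", "c1", "c2", "s1", "p3", "c3", "p4"]
partition = {"p1": "persistent", "p2": "persistent", "p3": "persistent", "p4": "persistent",
             "c1": "cloud", "c2": "cloud", "c3": "cloud", "s1": "shell"}


def find_candidate_rgps(order, partition, min_length=3):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    `min_length` consecutive genes whose families are not persistent."""
    regions = []
    start = None
    for i, fam in enumerate(order):
        if partition[fam] != "persistent":
            if start is None:
                start = i  # a run of variable genes begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None  # a persistent gene closes the run
    if start is not None and len(order) - start >= min_length:
        regions.append((start, len(order)))  # run reaching the contig end
    return regions


def flanking_families(order, region):
    """Families immediately bordering a region (None at a contig end). In this
    simplification, regions sharing the same persistent flanks across genomes
    would point to the same insertion 'spot'."""
    start, end = region
    left = order[start - 1] if start > 0 else None
    right = order[end] if end < len(order) else None
    return left, right


for region in find_candidate_rgps(order, partition):
    print(region, order[region[0]:region[1]], flanking_families(order, region))
```

Here the isolated cloud gene `c3` is too short to count as a region under the assumed `min_length`, while `c1`, `c2` and `s1` form one candidate RGP flanked by the persistent families `p2` and `p3`.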

Example of a pangenome graph constructed from 3,117 Acinetobacter baumannii genomes. The orange, green and blue nodes correspond to persistent, shell and cloud gene families, respectively. The upper-left inset gives a close-up of the region involved in the biosynthesis of the bacterium's major polysaccharide antigen. (Source: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007732)


PPanGGOLiN and panRGP are both freely available in the PPanGGOLiN software suite, which researchers can use to build and manipulate prokaryote pangenomes from their own genomic sequences or annotations. These tools are also integrated into the MicroScope platform, which provides dedicated web pages for analyzing and exploring the results.

1: Metagenomics is the study of the genetic content of samples collected directly from complex environments (e.g., the gut, the ocean, soils), as opposed to organisms cultured in the laboratory. By directly sequencing the DNA present in a sample, this approach provides not only a description of the sample's genomic content but also a glimpse of the functional potential of the environment.

2: A process by which organisms exchange genetic material independently of lineal descent. Horizontal gene transfer is thus distinguished from vertical gene transfer, which occurs from parent to offspring.
