Restricting false discoveries in proteomics and omics biology with rigorous and flexible frameworks

Researchers at IRIG are adapting high-dimensional statistical theory to improve the selection of biomarker candidates in proteomics and omics biology.

Published on 14 May 2024

Technological improvements in the large-scale molecular characterization of biological samples are a double-edged sword. On the one hand, reliable and rapid access to thousands of genes, transcripts, proteins or metabolites enables the verification of a considerable number of hypotheses about living organisms. On the other hand, the multitude of hypotheses studied simultaneously increases the risk that one of them is incorrectly validated by chance (a so-called “false discovery”). This increase is rooted in combinatorics: the probability is low that a given molecule displays random measurement fluctuations matching the expectations induced by the hypothesis under study. However, if several thousand molecules are considered simultaneously, the probability that at least one of them behaves accordingly becomes substantial.
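The combinatorial argument above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every tested molecule is unrelated to the phenotype and that the tests are independent, each with the same chance `alpha` of passing by accident. The function name and the values of `alpha` and `n_tests` are hypothetical and do not come from the work described here.

```python
def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n_tests independent null tests passes at level alpha."""
    # Each test fails to pass with probability (1 - alpha); all n_tests fail
    # together with probability (1 - alpha) ** n_tests under independence.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Illustrative values only: a 1% per-molecule error rate.
    for n_tests in (1, 10, 100, 5000):
        p = prob_at_least_one_false_positive(0.01, n_tests)
        print(f"{n_tests:>5} molecules tested: P(at least one false discovery) = {p:.4f}")
```

Under these assumptions, a risk that is negligible for a single molecule becomes close to a certainty once several thousand molecules are screened, which is why the error rate has to be controlled over the whole selection rather than test by test.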

To control the risk of false discoveries, increasingly advanced statistical methods are needed as experimental designs become more elaborate. This is particularly true in proteomics, where the complexity of the instrumental set-up (liquid chromatography coupled to mass spectrometry) compounds the small number of samples that can generally be analyzed. For years, IRIG researchers have therefore been working to reconcile the experimental constraints with the theoretical assumptions necessary to control false discoveries, in order to propose data analysis workflows with rigorous quality control properties (e.g. www.prostar-proteomics.org). Their recent work has focused on the theory of knockoff filters, which has revolutionized the field of selective inference by leveraging random draws to better characterize the properties of false discoveries. In particular, they established the link between these filters and the empirical methods for controlling false discoveries that proteomics researchers have historically used, which makes it possible to propose innovative methods [1, 2].
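To give an idea of how a knockoff filter turns scores into a selection, here is a minimal sketch of the generic "knockoff+" thresholding rule from the statistical literature. It is not the specific workflow of references [1, 2]. It assumes that each molecule already has a signed score `w[j]`, where a large positive value means the real measurement beat its randomly drawn counterpart, and that the scores of irrelevant molecules are equally likely to be positive or negative. How the scores are built is outside this sketch, and the function names and example values are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Smallest threshold t whose estimated false discovery proportion is at most q."""
    # Candidate thresholds are the distinct non-zero magnitudes of the scores.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        n_neg = sum(1 for x in w if x <= -t)  # proxy count for false discoveries
        n_pos = sum(1 for x in w if x >= t)   # molecules that would be selected
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return float("inf")  # no threshold meets the target: select nothing


def knockoff_select(w, q):
    """Indices of the molecules whose score reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical scores: six strong positives and four values near zero.
    scores = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 0.4, -0.3, 0.2]
    print(knockoff_select(scores, q=0.2))
```

The number of negative scores beyond the threshold serves as an estimate of how many positive scores beyond it are false discoveries, so no parametric model of the null distribution is required.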

Figure: A typical “volcano plot”. Each orange dot represents an analyzed protein that may explain a difference in phenotype (for example, healthy versus diseased), positioned according to its significance (Y-axis) and the size of the measured effect (X-axis). The most relevant candidate biomarkers are usually located near the top two corners, but some may lie lower, toward the middle, thereby complicating selection. Knockoff filters make it possible to control the false discovery rate associated with a selection of proteins (in green) that follows a more flexible decision boundary, notably a hyperbolic one (shown here in blue), which takes both the effect and the significance into account.
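The geometry of such a hyperbolic boundary can be sketched in a few lines. The parameterization below, with a minimum effect `x0` and a curvature constant `c`, is one common way of drawing a hyperbola on a volcano plot; it is an assumption made for illustration, not the boundary used in the figure or in reference [1]. The sketch only tests on which side of the curve a protein falls and does not control the false discovery rate by itself.

```python
import math


def above_hyperbolic_boundary(log2_fc, p_value, x0=0.5, c=1.0):
    """True if the point (log2 fold change, -log10 p-value) lies above the hyperbola.

    x0 and c are hypothetical tuning constants; p_value must be strictly positive.
    """
    effect = abs(log2_fc)  # the boundary is symmetric around a zero effect
    if effect <= x0:
        return False  # below the minimum effect, no significance is sufficient
    # The required significance decreases as the effect moves away from x0.
    return -math.log10(p_value) > c / (effect - x0)


if __name__ == "__main__":
    # Hypothetical proteins given as (log2 fold change, p-value).
    for log2_fc, p_value in [(2.0, 1e-3), (0.6, 1e-3), (0.6, 1e-12), (0.3, 1e-10)]:
        print(log2_fc, p_value, above_hyperbolic_boundary(log2_fc, p_value))
```

Unlike a rectangular cut-off on effect and significance taken separately, this rule lets a modest effect be selected when its significance is very high, and conversely.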

ANR funding

Multidisciplinary Institute in Artificial Intelligence MIAI @ Grenoble Alpes
Programme GRAL via Chemistry Biology Health Graduate School at University Grenoble Alpes
ProFI Proteomics French Infrastructure

Proteomics: large-scale characterization (identification and quantification) of proteins present in a biological sample.
Selective inference: a field of high-dimensional statistics, which deals with the generalization of knowledge drawn from experimental data where the data have been previously selected for their specific characteristics.

[1] Burger T.
Fudging the volcano-plot without dredging the data
Nature Communications 2024

[2] Etourneau L. and Burger T.
Challenging targets or describing mismatches? A comment on Common Decoy Distribution by Madej et al.
Journal of Proteome Research 2022

Keywords: proteomics

Alternative and Atomic Energies Agency

CEA is a French government-funded technological research organisation in four main areas: low-carbon energies, defense and security, information technologies and health technologies. A prominent player in the European Research Area, it is involved in setting up collaborative projects with many partners around the world.
