Question

In Gene Ontology and KEGG analysis, what is the biological definition of the p-value and the -log10(FDR) value?

2

4.9 years ago

WUSCHEL ▴ 860

In Gene Ontology and KEGG analysis, why are there two cutoff values: the p-value and the -log10(FDR) value?

What is the role of these two cut-offs?

Could a bioinformatician give an explanation to a wet-lab person?

RNA-Seq genome next-gen gene • 4.4k views

updated 4.9 years ago by jared.andrews07 ★ 19k • written 4.9 years ago by WUSCHEL ▴ 860

score 4 · Accepted Answer · 2020-07-01

I'm going to link to another of my previous answers that explains the logic behind enrichment analyses. Both p-values and false discovery rates (FDR) are reported in order to account for the statistical issues that arise from multiple testing. In short, each p-value is computed for a single test (one GO term or KEGG pathway) considered in isolation, which can be misleading given the large number of tests actually performed.
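To make the multiple-testing problem concrete, here is a rough sketch of how quickly chance hits pile up. It assumes every test is independent and that nothing is truly enriched, which is a simplification (GO terms overlap heavily), and the function names are just made up for illustration.

```python
def expected_false_positives(alpha, n_tests):
    """Average number of tests passing the cutoff by chance alone."""
    return alpha * n_tests


def prob_at_least_one_false_positive(alpha, n_tests):
    """Chance of one or more false hits, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05  # the usual p-value cutoff
for n_tests in (1, 20, 1000):  # hypothetical numbers of terms tested
    print(
        f"{n_tests:>5} tests: "
        f"expect {expected_false_positives(alpha, n_tests):.1f} false hits, "
        f"P(at least one) = {prob_at_least_one_false_positive(alpha, n_tests):.3f}"
    )
```

With a single test the raw cutoff behaves as advertised, but with a thousand terms you would expect around fifty to pass p < 0.05 even if nothing real were going on.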

Multiple testing correction methods, FDR among them, adjust for this issue to give values that can be compared against whatever threshold (or alpha) you choose, typically 0.05 or 0.01. The chosen threshold is arbitrary, but those values are generally regarded as stringent enough to be relatively believable. A p-value of 0.05 means that a result at least this extreme would occur in 5% of cases by random chance alone if there were no real enrichment. An FDR (or q-value) of 0.05 means that roughly 5% of the terms called significant at that threshold are expected to be false positives. Using lower alpha values should reduce the false positive rate, with the drawback of increasing the false negative rate. This short paper has a simple example that is fairly easy to follow along with additional context.
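As for the -log10(FDR) value in the question: it is only a display transform of the FDR, so that smaller (more significant) values plot as taller bars. An FDR of 0.05 corresponds to about 1.3, and 0.01 corresponds to 2. Below is a minimal sketch that assumes the tool uses the Benjamini-Hochberg procedure, which is a common choice but not the only one, so check what your software actually reports. The p-values are made up for illustration.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five tested terms (not real data).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(raw_p)
neg_log10_fdr = [-math.log10(q) for q in fdr]

for p, q, s in zip(raw_p, fdr, neg_log10_fdr):
    print(f"p = {p:<6} FDR = {q:.4f}  -log10(FDR) = {s:.2f}")
```

In this toy case four terms pass p < 0.05 but only two pass FDR < 0.05, which is the practical reason to filter on the corrected value rather than the raw one.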