Question

Should I use the p-value or q-value for pathway analysis in DAVID?

9.8 years ago

cpm186 ▴ 10

I am analyzing some RNAseq data using DAVID (http://david.abcc.ncifcrf.gov/home.jsp). I want to look and see if any signaling pathways are over-represented in my dataset. After inputting the genes, I am given both a p and a q value (Benjamini). Is it generally acceptable to use a p value of 0.05 as a cutoff, or should the q value be used?

RNA-Seq DAVID Statistics • 15k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 9.8 years ago by cpm186 ▴ 10

score 2 · Answer 1 · 2015-07-08

9.8 years ago

5utr ▴ 370

Since you are doing multiple testing, you should use a multiple-testing-corrected value such as the q-value. Because you ran a large number of tests, some p-values will pass your threshold purely by chance, and the correction guards against treating those as real hits.

ADD COMMENT • link 9.8 years ago by 5utr ▴ 370

Suppose you accept a given case as "significant." You then also accept everything with a lower p-value. The q-value of the case you accepted is the false discovery rate you should expect among everything you accepted, i.e. E[false positives / all positives].
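To make that concrete, here is a minimal sketch of how Benjamini-Hochberg adjusted p-values can be computed. It assumes that DAVID's "Benjamini" column corresponds to this procedure; the function name `benjamini_hochberg` and the five pathway p-values are made up for illustration and are not DAVID output.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment p-values for five pathways (illustration only).
pathway_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20]
qvalues = benjamini_hochberg(pathway_pvalues)

n_raw = sum(p < 0.05 for p in pathway_pvalues)
n_adjusted = sum(q < 0.05 for q in qvalues)
print(f"raw p < 0.05: {n_raw} pathways; adjusted < 0.05: {n_adjusted} pathways")
```

With these made-up numbers, four pathways pass a raw p < 0.05 cutoff but only two remain after adjustment, which is the practical difference between the two columns.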

ADD REPLY • link 9.7 years ago by eric.kern13 ▴ 240

Ok, thanks. By multiple testing, do you mean that each gene has the possibility of being in more than one category?

ADD REPLY • link 9.8 years ago by cpm186 ▴ 10

No, multiple testing means that you are taking your list of genes and doing an enrichment test for every single pathway that is annotated in DAVID. The higher the number of tests (here, the number of pathways considered), the higher the chance that you'll find significant p-values purely by chance. This is why multiple testing corrections need to be calculated when performing this type of testing. Benjamini is just one of them.
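A rough way to see how fast this grows is the sketch below. It makes two simplifying assumptions that do not hold exactly for real pathways: the tests are independent, and no pathway is truly enriched. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of at least one p-value below alpha when every null is true.

    Assumes the n_tests tests are independent (an idealisation).
    """
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 10, 100, 500):
    chance = prob_at_least_one_false_positive(n_tests)
    expected_hits = n_tests * 0.05  # expected number of false positives
    print(f"{n_tests:4d} tests: P(>=1 false positive) = {chance:.3f}, "
          f"expected false positives = {expected_hits:.1f}")
```

Under those assumptions, a handful of "significant" pathways at p < 0.05 is what you would expect from noise alone once a few hundred pathways are tested.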

ADD REPLY • link 9.8 years ago by 5utr ▴ 370

Ram · Answer 2 · 2015-07-08

I think Gian is trying to be correct, but his answer is a little misguided. I agree that you should use the q-value. However, that is because of the distribution of the gene measurements and NOT because of the number of pathways. That said, after any kind of enrichment analysis you should also correct for multiple comparisons, using a correction that controls either the FDR or the FWER (e.g. Bonferroni). If you are analyzing an ontology with hierarchical relationships, these corrections (FDR/FWER) will be skewed, because genes annotated to a child ontology term are also annotated to its parent, so the tests are not truly independent measurements. A method like Elim or Weight (Alexa et al. 2006) takes these dependencies into account when performing the correction.

Also, please note that DAVID has not updated its knowledge base since September 2009, and pathways and gene annotations have changed considerably since then.

You might consider using an application like iPathwayGuide (www.iPathwayGuide.com) as an alternative.