I am analyzing some RNA-seq data using DAVID (http://david.abcc.ncifcrf.gov/home.jsp) to see whether any signaling pathways are over-represented in my dataset. After inputting the genes, I am given both a p-value and a q-value (Benjamini). Is it generally acceptable to use a p-value of 0.05 as a cutoff, or should the q-value be used?
Since you are performing multiple tests, you should use a value corrected for multiple testing, such as the q-value. This guards against accepting a p-value that passes your threshold purely by chance, which becomes likely once you run a large number of tests.
I think Gian is trying to be correct, but his answer is a little misguided. I agree that you should use the q-value. However, it is because of the distribution of the measurements of the genes and NOT because of the number of pathways. That said, after you do any kind of enrichment analysis you should also correct for multiple comparisons, controlling either the FDR or the FWER (e.g. Bonferroni). If you are analyzing an ontology with hierarchical relationships, these corrections (FDR/FWER) will be skewed, because genes in a child ontology term are also annotated to its parent and thus are not truly independent measurements. Methods like Elim or Weight (Alexa et al 2006) take these dependencies into account.
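To make the FWER option concrete, here is a minimal sketch of a Bonferroni adjustment in plain Python. The p-values are hypothetical and only there for illustration; this is not DAVID's own code, and it does nothing about the parent/child dependence between ontology terms mentioned above.

```python
def bonferroni(pvalues):
    """Return Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    # Multiply each p-value by the number of tests, capping the result at 1.
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical p-values for five pathways (illustration only).
pathway_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = bonferroni(pathway_pvalues)

for p, adj in zip(pathway_pvalues, adjusted):
    print(f"p = {p:.3f}  Bonferroni-adjusted = {adj:.3f}")
```

Bonferroni is the more conservative choice: with these made-up numbers only the first two pathways stay below 0.05 after adjustment.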
Also, please note that DAVID has not updated its knowledge base since September 2009. Pathways and gene annotations have changed considerably since then.
You might consider using an application like iPathwayGuide (www.iPathwayGuide.com) as an alternative.
Written 9.4 years ago by andrew; updated 24 months ago by Ram.
Suppose you accept a given case as "significant." You also accept everything with a lower p-value. The q-value of the case you accepted is the false discovery rate you should expect, i.e. E[ false positives / all positives].
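As a rough illustration of that definition, here is a small sketch of how Benjamini-Hochberg adjusted p-values can be computed by hand. I am assuming that the "Benjamini" column in DAVID is this kind of adjusted value; the p-values below are hypothetical and the function name is my own.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for five pathways (illustration only).
pathway_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
qvalues = benjamini_hochberg(pathway_pvalues)

for p, q in zip(pathway_pvalues, qvalues):
    print(f"p = {p:.3f}  q = {q:.5f}")
```

With these made-up numbers, four pathways pass p < 0.05 but only two have an adjusted value below 0.05, which is the practical difference between the two cutoffs.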
Ok, thanks. By multiple testing, do you mean that each gene has the possibility of being in more than one category?
No, multiple testing means that you are taking your list of genes and doing an enrichment test for every single pathway that is annotated in DAVID. The higher the number of tests (here, the number of pathways considered), the higher the chance that you'll find significant p-values purely by chance. This is why multiple testing corrections need to be calculated when performing this type of analysis. Benjamini is just one of them.
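A quick back-of-the-envelope sketch of why the number of tests matters. It assumes the tests are independent and that no pathway is truly enriched, which is a simplification (pathways share genes), so treat the numbers as illustrative rather than exact.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance of one or more p-values below alpha when every null is true.

    Assumes the tests are independent, which real pathway tests are not.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


for num_pathways in (1, 10, 100, 1000):
    prob = prob_at_least_one_false_positive(num_pathways)
    expected = 0.05 * num_pathways  # expected count of false positives
    print(f"{num_pathways:5d} tests: P(at least one false positive) = {prob:.3f}, "
          f"expected false positives = {expected:.1f}")
```

Under these assumptions, testing 100 pathways at an uncorrected p < 0.05 gives around five significant hits on average even when nothing is really enriched.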