3.2 years ago
foxiw
Hi, I'm trying to use the clusterProfiler package to explore which pathways are overrepresented in my list of significantly differentially expressed (DE) genes. I have 173 DE genes, but when I run them through enrichGO, the output contains no pathways. I've run the same parameters on DE genes from other comparisons (one with only 43 DE genes), and both produced outputs with pathways, so I don't understand why a list of 173 genes doesn't produce anything. My enrichGO code is below:
ego3 <- enrichGO(gene          = sig_genes3,
                 universe      = gene_list3,
                 keyType       = "ENSEMBL",
                 OrgDb         = org.Mm.eg.db,
                 ont           = "BP",
                 pAdjustMethod = "BH",
                 readable      = TRUE)
Output:
[1] ID Description GeneRatio BgRatio pvalue p.adjust qvalue geneID Count
<0 rows> (or 0-length row.names)
Any help would be much appreciated.
It might be that no terms are enriched according to the hypergeometric test. By default, enrichGO only reports terms that pass pvalueCutoff = 0.05 and qvalueCutoff = 0.2, so an empty result can simply mean that nothing passed those thresholds. You can check by returning the p-values for all tested terms: add the
pvalueCutoff = 1
and qvalueCutoff = 1
arguments to the function.
Hi, I've tried doing as you said, and that does produce an output with enriched pathways. I'm not sure what these cutoff values mean, though. I'm guessing that pvalueCutoff is the cutoff for the probability of a specific pathway being enriched, whereas I don't know what qvalueCutoff means. Is it bad practice to report enriched pathways that have a p-value above 0.05?
These types of analyses are for hypothesis generation. I personally often don't care too much about p-values in this context, because if I find something interesting, it has to be validated in the lab anyway, and the finding is then either true or not, regardless of the p-values these tools produce. If a term has p > 0.05 and you have additional evidence that this pathway might indeed be doing something in your setup, then go ahead and try to validate it. Pathway enrichment is just statistics; it does not prove anything, it builds a hypothesis for validation.
Thank you for the answer. I feel more confident going forward with my results now.
Just to clarify: the pvalueCutoff is the cutoff value for the probability that a pathway is enriched in the dataset, correct?
The p-value comes from a one-sided hypergeometric test, which asks whether the overlap between your DEGs and the genes annotated to a GO term is larger than you would expect if you had randomly sampled the same number of genes from your universe. Because you are testing that overlap for many GO terms at once, you also have a multiple-comparison problem (the false-positive rate grows as you increase the number of tests), so you need to look at the adjusted p-value or q-value rather than the raw p-value. Roughly, a q-value threshold controls the expected proportion of false positives among the terms you call significant: the higher a term's q-value, the harder it is to distinguish the null hypothesis (random sampling) from the alternative hypothesis (genuine enrichment of that term), and the more false positives you accept by reporting it. If all of your terms have high q-values, I would conclude that GO enrichment analysis isn't a productive analysis for this dataset.
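A minimal base-R sketch of the test described above, using made-up counts (N, K, n, k and the term IDs are all hypothetical, not taken from the question's data). clusterProfiler performs a comparable one-sided hypergeometric calculation internally; its qvalue column comes from Storey's q-value method, which is not reproduced here, so Benjamini-Hochberg adjusted p-values (the pAdjustMethod used in the question) stand in for the multiple-testing step.

```r
# Minimal sketch of a one-sided hypergeometric (over-representation) test.
# All counts below are HYPOTHETICAL, chosen only for illustration.
N <- 15000  # universe: background genes that carry a GO annotation
K <- 200    # genes in the universe annotated to one GO term
n <- 173    # DE genes (the size of the query list)
k <- 8      # DE genes annotated to that term (the observed overlap)

# Overlap expected if the n DE genes were a random sample of the universe
expected <- n * K / N

# P(X >= k) under the hypergeometric null; phyper(q, ...) with
# lower.tail = FALSE gives P(X > q), hence q = k - 1
p_term <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
cat("expected overlap:", round(expected, 2), " observed:", k,
    " p-value:", signif(p_term, 3), "\n")

# Many terms are tested at once, so the p-values must be corrected for
# multiple testing. Term IDs and counts are again HYPOTHETICAL.
terms <- data.frame(
  ID = c("term_A", "term_B", "term_C", "term_D"),
  K  = c(200, 50, 1200, 30),  # term sizes in the universe
  k  = c(8, 4, 20, 1),        # DE genes falling in each term
  stringsAsFactors = FALSE
)
terms$expected <- n * terms$K / N
terms$pvalue   <- phyper(terms$k - 1, terms$K, N - terms$K, n,
                         lower.tail = FALSE)
# Benjamini-Hochberg adjustment, as with pAdjustMethod = "BH"
terms$p.adjust <- p.adjust(terms$pvalue, method = "BH")
print(terms)
```

Comparing the expected and observed overlap column by column shows what the test is measuring: a term is "enriched" when its observed count is well above n * K / N, and the adjustment can only push each p-value up, never down.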
I only meant that p-values (or rather the FDR-corrected ones) are not the center of the world. If you have very large p-values and still accept them, it is unlikely that the results are meaningful. All I wanted to say is that if something promising sits at FDR = 0.15, it might still be worth looking at. If everything is at FDR = 1, this analysis is unlikely to be fruitful.
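To make the threshold discussion concrete, here is a small base-R sketch that filters a made-up result table at different cutoffs. The column names (pvalue, p.adjust, qvalue) match the enrichGO output shown in the question, but the values and the report_terms() helper are hypothetical. The filter follows our reading of the enrichGO documentation, in which a term is reported only if its raw and adjusted p-values are below pvalueCutoff and its q-value is below qvalueCutoff; treat it as an approximation, not clusterProfiler's actual code.

```r
# HYPOTHETICAL result table using the column names enrichGO reports
res <- data.frame(
  ID       = c("term_A", "term_B", "term_C", "term_D"),
  pvalue   = c(0.0004, 0.003, 0.02, 0.40),
  p.adjust = c(0.012, 0.045, 0.15, 0.90),
  qvalue   = c(0.010, 0.040, 0.13, 0.85),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep terms whose raw and adjusted p-values pass
# p_cut and whose q-value passes q_cut (an approximation of the
# reporting rule described in the enrichGO documentation)
report_terms <- function(df, p_cut, q_cut) {
  keep <- df$pvalue < p_cut & df$p.adjust < p_cut & df$qvalue < q_cut
  df[keep, , drop = FALSE]
}

strict     <- report_terms(res, p_cut = 0.05, q_cut = 0.2)  # enrichGO defaults
relaxed    <- report_terms(res, p_cut = 0.20, q_cut = 0.2)  # exploratory
everything <- report_terms(res, p_cut = 1,    q_cut = 1)    # "show me all"

print(strict$ID)
print(relaxed$ID)
print(everything$ID)
```

Relaxing the cutoffs from the defaults to 0.2 adds term_C, whose adjusted p-value is 0.15: the kind of borderline term that may still be worth following up if other evidence supports it. Setting both cutoffs to 1 simply returns the whole table, which is useful for inspection but says nothing about significance on its own.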