I tried both clusterProfiler and goseq for GO enrichment analysis. However, goseq reported 4-5 times more enriched GO terms than clusterProfiler, because the p-values calculated by goseq are generally smaller. I think this is because I used all genes in the genome as the background, so the background ratio is small and the p-values are smaller as a result. Which genes should I use as the background? Here are the options I am considering:
1. All genes in the genome
2. All genes with GO terms in the genome
3. Genes detected in my samples
4. Genes detected in my samples with GO terms
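To see why the background matters, here is a minimal sketch of a one-sided hypergeometric enrichment test (the test clusterProfiler uses; goseq by default uses the Wallenius approximation to correct for gene-length bias, so this will not reproduce goseq's numbers exactly). The gene counts are hypothetical. Only the background size `N` changes between the two calls: adding genes that could never appear in the DE list (for example, unexpressed genes) inflates `N`, lowers the expected overlap, and makes the same observed overlap look more significant.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: number of background (universe) genes
    K: background genes annotated to the GO term
    n: DE genes, all assumed to be drawn from the background
    k: DE genes annotated to the GO term
    """
    # Sum the hypergeometric probability mass for every overlap >= k.
    # Exact integer arithmetic; math.comb returns 0 for impossible draws.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


# Hypothetical numbers: 200 DE genes, 10 of them in a GO term with 100 members.
# Only the background size N differs between the two calls.
p_detected = hypergeom_upper_tail(k=10, n=200, K=100, N=12_000)  # detected genes
p_genome = hypergeom_upper_tail(k=10, n=200, K=100, N=30_000)    # whole genome

print(f"detected-gene background: p = {p_detected:.2e}")
print(f"whole-genome background:  p = {p_genome:.2e}")
```

The expected overlap is n·K/N, about 1.7 genes with the detected-gene background versus about 0.7 with the whole-genome background, so observing 10 looks far more extreme against the larger background.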
I personally use the genes that were actually used in the DE analysis, so in my case all genes that survive the `filterByExpr` step of edgeR.

I would use option 3 or 4 depending on your input list: if it contains only genes with GO terms, then 4; if not (which is the correct way, IMHO), then 3.
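A minimal sketch of building the background for options 3 and 4, assuming you already have the list of genes that were tested for DE and a gene-to-GO mapping. The function name `build_universe`, the gene IDs and the annotation dictionary are hypothetical. In clusterProfiler the resulting set would be passed as the `universe` argument of `enrichGO`; in goseq the background is the set of names of the 0/1 vector passed to `nullp`.

```python
def build_universe(tested_genes, go_annotation, annotated_only=False):
    """Return the background (universe) gene set.

    tested_genes: genes actually tested for DE, e.g. those kept after
        expression filtering (option 3)
    go_annotation: hypothetical mapping gene ID -> set of GO term IDs
    annotated_only: if True, also drop genes without any GO term (option 4)
    """
    universe = set(tested_genes)
    if annotated_only:
        # Keep only genes with at least one GO term.
        universe = {g for g in universe if go_annotation.get(g)}
    return universe


# Hypothetical gene IDs and annotations, for illustration only.
tested = ["geneA", "geneB", "geneC", "geneD"]
go = {"geneA": {"GO:0008150"}, "geneB": set(), "geneC": {"GO:0003674"}}
de_genes = {"geneA", "geneB"}  # includes an unannotated gene (geneB)

option3 = build_universe(tested, go)                       # all tested genes
option4 = build_universe(tested, go, annotated_only=True)  # tested and annotated

# The DE list must be a subset of the universe. Here it contains an
# unannotated gene, so option 3 is the consistent choice.
print(sorted(option3), sorted(option4), de_genes <= option4)
```

The design point is consistency: whatever filter was applied to the DE list (expression filtering, and optionally "has a GO term") should be applied to the background too, so every background gene could in principle have appeared in the DE list.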