Question

What to choose as background genes in GO enrichment analysis

5.2 years ago

tianshenbio ▴ 190

I tried both clusterProfiler and goseq for GO enrichment analysis. However, goseq reported 4-5 times more enriched GO terms than clusterProfiler, because the p-values calculated by goseq are generally smaller. This is because I used all genes in the genome as background genes, so the background ratio is small and the p-values are smaller too. So which genes should I use as background genes? Here are some of my options:

1. All genes in the genome
2. All genes with GO terms in the genome
3. Genes detected in my samples
4. Genes detected in my samples with GO terms
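To illustrate why a larger background gives smaller p-values, here is a minimal Python sketch of the one-sided hypergeometric test used by over-representation tools such as clusterProfiler's enrichGO. goseq instead defaults to the Wallenius approximation to correct for gene-length bias, so this is not goseq's exact computation. All gene counts are hypothetical, and the sketch assumes the extra genome-only genes carry no annotation for the term, so only the universe size changes.

```python
from math import comb


def hypergeom_pvalue(k: int, n_de: int, n_term: int, n_universe: int) -> float:
    """P(X >= k): probability of seeing at least k DE genes annotated to a term
    when n_de genes are drawn from a universe of n_universe genes, n_term of
    which carry the term (one-sided over-representation test)."""
    total = comb(n_universe, n_de)
    upper = min(n_de, n_term)
    # Sum the hypergeometric probabilities from k up to the largest possible overlap.
    tail = sum(comb(n_term, i) * comb(n_universe - n_term, n_de - i)
               for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 200 DE genes, 8 of them annotated to a term
# that has 100 annotated genes in the universe.
p_expressed = hypergeom_pvalue(8, 200, 100, 12_000)  # detected genes only
p_genome = hypergeom_pvalue(8, 200, 100, 20_000)     # whole genome
print(f"detected-gene universe: p = {p_expressed:.2e}")
print(f"whole-genome universe:  p = {p_genome:.2e}")
```

The larger universe lowers the expected overlap (200 × 100 / 20,000 = 1 gene versus about 1.7 genes), so the same 8 hits look more extreme and the p-value shrinks.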

RNA-Seq go enrichment goseq clusterprofiler

I personally use the genes that were actually used in the DE analysis, so in my case all genes that survive the filterByExpr step of edgeR.

5.2 years ago by ATpoint 89k
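As a rough illustration of using the post-filtering gene set as the universe, here is a minimal Python sketch. The `expressed_universe` helper and its CPM threshold are hypothetical: this is a simplified stand-in for edgeR's filterByExpr, which uses a more elaborate, design-aware rule, not a reimplementation of it. The counts and library sizes are toy values.

```python
def expressed_universe(counts: dict[str, list[int]], lib_sizes: list[int],
                       min_cpm: float = 1.0, min_samples: int = 2) -> set[str]:
    """Hypothetical expression filter: keep genes with CPM >= min_cpm in at
    least min_samples samples. NOT edgeR's actual filterByExpr rule."""
    keep = set()
    for gene, row in counts.items():
        # Count the samples in which this gene reaches the CPM threshold.
        n_ok = sum(1 for c, lib in zip(row, lib_sizes) if c / lib * 1e6 >= min_cpm)
        if n_ok >= min_samples:
            keep.add(gene)
    return keep


# Toy data: three samples with different library sizes.
lib_sizes = [1_000_000, 2_000_000, 1_500_000]
counts = {
    "geneA": [50, 120, 80],  # clearly expressed
    "geneB": [0, 1, 0],      # filtered out: never tested, so it could never be DE
    "geneC": [3, 1, 2],      # passes in 2 of 3 samples
}
universe = expressed_universe(counts, lib_sizes)
de_genes = {"geneA"}  # hypothetical DE result; always a subset of the tested genes
print(sorted(universe))
```

The point of the design is that a gene removed before testing had no chance of being called DE, so counting it in the background would only dilute the background ratio.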

I would use option 3 or 4, depending on your input list: if it contains only genes with GO terms, use 4; if not (which is the correct way, IMHO), use 3.

5.2 years ago by Asaf 10k

I see the exact opposite: more significant terms with clusterProfiler than with goseq.

9 days ago by caroline.zanchi

score 3 · Accepted Answer · 2020-05-21

5.2 years ago

Papyrus ★ 3.1k

I would use all of the genes that were analyzed in your experiment, i.e. those on which you performed the differential expression analysis (you may have previously filtered them to remove low-expression genes, etc.), because (under the assumption of independence) those are the ones that had a chance of appearing as DEGs.

As for filtering out genes that do not map to any GO term, you can handle this at the GO enrichment step: the goseq function controls it with its use_genes_without_cat argument, and by default (in the most recent version) these genes are ignored in the enrichment test.
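To make the effect of ignoring unannotated genes concrete, here is a minimal Python sketch that drops genes without any GO term from both the universe and the DE list before counting. The `drop_unannotated` helper and the toy annotation are hypothetical; the sketch only roughly mirrors what goseq does with use_genes_without_cat = FALSE and is not goseq's code.

```python
def drop_unannotated(universe: set[str], de_genes: set[str],
                     gene2go: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Keep only genes with at least one GO term, in both the universe and
    the DE list (roughly what goseq's use_genes_without_cat = FALSE achieves)."""
    # A gene missing from gene2go, or mapped to an empty set, counts as unannotated.
    annotated = {g for g in universe if gene2go.get(g)}
    return annotated, de_genes & annotated


# Toy annotation: g3 has an empty term set and g4 has no entry at all.
universe = {"g1", "g2", "g3", "g4"}
de_genes = {"g1", "g4"}
gene2go = {"g1": {"GO:0006915"}, "g2": {"GO:0006915"}, "g3": set()}

annotated, de_kept = drop_unannotated(universe, de_genes, gene2go)
print(sorted(annotated), sorted(de_kept))
```

Filtering both sets the same way keeps the DE list a subset of the universe, which is what lets you pass a detected-genes list (option 3) without the unannotated genes skewing the test.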