Question

Why does GO enrichment give different results when the gene list cutoff changes?

0

6.8 years ago

hellocita ▴ 40

I am new to GO annotation. I use DAVID for GO enrichment, which tests for gene overrepresentation with Fisher's exact test. My gene list is defined by an FDR cutoff. As I see it, the GO terms enriched among the genes with FDR <= 10% should overlap with those enriched among the genes with FDR <= 5%, because the two lists share many genes and the second is the higher-confidence one. However, the results are totally different, which makes me doubt whether the GO enrichment result for the FDR <= 10% genes is true.

How can the annotation be so sensitive to changes in the input gene set? Are there any methods, papers, or packages that address this? Thanks!

RNA-Seq gene • 2.7k views

updated 6.8 years ago by theobroma22 ★ 1.2k • written 6.8 years ago by hellocita ▴ 40

1

There should ideally be a good overlap between the two, but it is definitely not guaranteed. For example, how many genes have an FDR below 0.05, and how many below 0.1? It's possible that the latter set is a lot larger, in which case the genes the two lists share make up only a small part of it.

A common gene set enrichment tool that doesn't depend on a threshold is GSEA, but there are many other algorithms available. You can pick the one that best suits your needs.
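To illustrate what "doesn't depend on a threshold" means, here is a minimal sketch of a running-sum score over a ranked gene list. It is a simplified, unweighted Kolmogorov-Smirnov-style statistic, not GSEA's actual implementation (GSEA weights the steps by the ranking metric and assesses significance by permutation). The gene names and gene sets are hypothetical.

```python
def running_sum_score(ranked_genes, gene_set):
    """Largest deviation from zero of an unweighted running sum.

    Walk down the ranked list, stepping up at genes in the set and
    down at genes outside it. No FDR cutoff is applied to the list.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    step_up = 1.0 / n_hits      # the up-steps sum to +1
    step_down = 1.0 / n_miss    # the down-steps sum to -1
    running = 0.0
    best = 0.0
    for hit in in_set:
        running += step_up if hit else -step_down
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most significant gene first.
ranked = [f"g{i}" for i in range(1, 11)]
top_set = {"g1", "g2", "g3"}        # concentrated at the top of the ranking
spread_set = {"g2", "g6", "g10"}    # scattered through the ranking

top_score = running_sum_score(ranked, top_set)
spread_score = running_sum_score(ranked, spread_set)
print(top_score, spread_score)
```

A set concentrated at the top of the ranking gets a larger score than a scattered one, and every gene contributes, so moving a cutoff from 5% to 10% is not a decision you have to make.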

6.8 years ago by Martombo ★ 3.1k

1

Thank you @Martombo, the change in gene number should be the reason. At FDR 10%, I have 390 genes in the list and 20 GO terms enriched (BH-corrected Fisher test p-value < 0.05). However, at FDR 5%, I have only 76 genes in the list and no GO terms called significant. Even if I relax the threshold (Fisher test p-value < 0.1) to get some enriched GO terms, there is still no overlap with the first list, and the terms look totally different. I should figure out other ways to interpret the gene list. Thanks for your suggestion!

6.8 years ago by hellocita ▴ 40

score 2 · Answer 1 · 2018-01-30

2

6.8 years ago

theobroma22 ★ 1.2k

They are both true. You have to consider the math behind the enrichment that gives you the result. If you change the number of genes, you change the result, because the significance of GO category X depends on how many of your genes fall in X, the size of your list, and the total number of genes annotated to X relative to the background. Ninety-nine percent of the time, if you change the input, the output changes.
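A minimal sketch of that math, assuming a plain one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) rather than DAVID's exact scoring. The list sizes 76 and 390 come from this thread; the background size, category size, and hit counts are hypothetical and chosen so that both lists show roughly the same fold enrichment.

```python
from math import comb


def enrichment_p(hits, list_size, category_size, background_size):
    """One-sided Fisher's exact p-value: P(X >= hits) under the
    hypergeometric distribution."""
    max_hits = min(list_size, category_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(category_size, k)
        * comb(background_size - category_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


BACKGROUND = 20000   # hypothetical number of annotated genes
CATEGORY = 200       # hypothetical size of GO category X (1% of background)

# About 5% of each list falls in category X, i.e. about 5-fold enrichment.
p_small = enrichment_p(4, 76, CATEGORY, BACKGROUND)    # FDR <= 5% list
p_large = enrichment_p(20, 390, CATEGORY, BACKGROUND)  # FDR <= 10% list

print(f"76 genes, 4 in X:   p = {p_small:.3g}")
print(f"390 genes, 20 in X: p = {p_large:.3g}")
```

With the same proportion of genes in the category, the larger list gives a much smaller p-value, so a term can pass multiple-testing correction with 390 genes and fail with 76 even though neither result is "wrong".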

6.8 years ago by theobroma22 ★ 1.2k

0

Thank you @theobroma22, but how can I trust the enrichment results if they change whenever the FDR cutoff for the gene list changes?

6.8 years ago by hellocita ▴ 40