Why does GO enrichment give different results when the gene list cutoff changes?
6.8 years ago
hellocita ▴ 40

I am new to GO annotation. I use DAVID for GO enrichment analysis, which calculates gene over-representation with Fisher's exact test. My gene list is filtered by an FDR cutoff. I expected the GO terms enriched among genes with FDR <= 10% to overlap with those enriched among genes with FDR <= 5%, because the two lists share many genes and the second one has higher confidence. However, the results are completely different, and I doubt whether the GO results for the FDR <= 10% list are real.
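For reference, a one-sided Fisher's exact test for a single GO term reduces to an upper-tail hypergeometric probability computed from four counts. The sketch below uses only the Python standard library; the function name and the example counts are hypothetical, and DAVID's default output is a modified Fisher exact p-value (its EASE score), so DAVID's numbers will not match this sketch exactly.

```python
from math import comb


def fisher_overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation of one GO term.

    k: genes in the list annotated to the term
    n: genes in the list
    K: background genes annotated to the term
    N: background genes
    Returns P(H >= k) with H ~ Hypergeometric(N, K, n): the chance of seeing at
    least k annotated genes if n genes were drawn at random from the background.
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    # Sum the hypergeometric probabilities of k, k+1, ... annotated genes.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


# Hypothetical counts: 12 of 390 list genes vs 200 of 20,000 background genes.
print(fisher_overrepresentation_p(12, 390, 200, 20_000))
```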

How can the enrichment be so sensitive to changes in the input gene set? Are there methods, papers, or packages that address this? Thanks!

RNA-Seq gene

Ideally there should be a good overlap between the two, but it is not guaranteed. For example, how many genes have an FDR below 0.05 and how many below 0.1? If the second set is much larger, the genes the two lists share make up only a small part of it, so the overlap is small in relative terms.
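Because both lists come from thresholding the same FDR column, the FDR <= 5% list is always contained in the FDR <= 10% list; what matters is how much of the larger list the smaller one covers. A minimal sketch with made-up FDR values (the gene names, numbers, and helper functions are hypothetical):

```python
def threshold_lists(fdr_by_gene, strict=0.05, loose=0.10):
    """Apply two FDR cutoffs to the same differential-expression results."""
    strict_set = {g for g, q in fdr_by_gene.items() if q <= strict}
    loose_set = {g for g, q in fdr_by_gene.items() if q <= loose}
    return strict_set, loose_set


def shared_fraction(strict_set, loose_set):
    """Fraction of the looser list that also passes the stricter cutoff."""
    if not loose_set:
        return 0.0
    return len(strict_set & loose_set) / len(loose_set)


# Hypothetical FDR values for five genes.
fdr = {"g1": 0.01, "g2": 0.04, "g3": 0.07, "g4": 0.09, "g5": 0.20}
strict, loose = threshold_lists(fdr)
assert strict <= loose  # the stricter list is a subset whenever strict < loose
print(sorted(strict), sorted(loose), shared_fraction(strict, loose))
```

With the counts reported later in this thread (76 and 390 genes), the stricter list would cover only 76/390, about 19%, of the looser one.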

A common gene set enrichment tool that does not depend on a threshold is GSEA, but many other algorithms are available; pick the one that best suits your needs.
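To illustrate what "threshold-free" means, here is a sketch of the running-sum enrichment score that GSEA is built on: walk down the full ranked gene list, step up at genes in the set (weighted by their ranking score) and down at genes outside it, and report the largest deviation from zero. It assumes the gene set has already been restricted to genes present in the ranking, and it omits the permutation test GSEA uses to assess significance, so it is an illustration rather than a replacement for the actual tool. The function name and toy data are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score in the style of GSEA (no permutation test).

    ranked_genes: gene IDs sorted by decreasing ranking metric (e.g. a signed statistic)
    scores: the ranking metric for each gene, in the same order as ranked_genes
    gene_set: IDs of the genes annotated to one GO term
    p: weight exponent; p=0 gives an unweighted, Kolmogorov-Smirnov-like walk
    """
    if len(ranked_genes) != len(scores):
        raise ValueError("ranked_genes and scores must have the same length")
    members = set(gene_set)
    in_set = [g in members for g in ranked_genes]
    n_hit = sum(in_set)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must be a non-empty proper subset of the ranking")
    # Total weight of the hits, so the upward steps sum to 1.
    hit_total = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    if hit_total == 0:
        raise ValueError("all genes in the set have a zero ranking score")
    running, es = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        # Step up at a gene in the set, down at a gene outside it.
        running += abs(s) ** p / hit_total if hit else -1.0 / n_miss
        # Keep the largest deviation from zero, preserving its sign.
        if abs(running) > abs(es):
            es = running
    return es


genes = ["a", "b", "c", "d"]
stats = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, stats, {"a", "b"}))  # positive: set sits at the top
print(enrichment_score(genes, stats, {"c", "d"}))  # negative: set sits at the bottom
```

No cutoff on the gene list is involved: every gene contributes through its position in the ranking.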

Thank you @Martombo, the difference in gene numbers is probably the reason. At FDR 10% I have 390 genes in the list and 20 enriched GO terms (BH-corrected Fisher test p-value < 0.05). At FDR 5% I have only 76 genes and no GO term is significant. Even if I relax the cutoff (Fisher test p-value < 0.1) to get some enriched terms, they do not overlap with the first set and look completely different. I should find other ways to interpret the gene list. Thanks for your suggestion!
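The list-size effect can be made concrete. Assuming a hypothetical background of 20,000 genes, 200 of which carry a given GO term, the sketch below gives a 390-gene list and a 76-gene list roughly the same fraction of annotated genes (about 4%). The smaller list yields a much weaker p-value simply because fewer genes means less evidence, before any multiple-testing correction. All counts are made up for illustration.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(H >= k) for H ~ Hypergeometric: n genes drawn from N, K of which carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


# Hypothetical background: 20,000 genes, 200 annotated to one GO term (1%).
N, K = 20_000, 200

# Roughly the same in-list proportion at the two list sizes from the thread.
p_390 = hypergeom_sf(15, 390, K, N)  # 15/390, about 3.8% annotated
p_76 = hypergeom_sf(3, 76, K, N)     # 3/76, about 3.9% annotated
print(f"390-gene list: p = {p_390:.2e}")
print(f"76-gene list:  p = {p_76:.2e}")
```

When many terms are tested, a p-value in the range the 76-gene list produces is unlikely to survive BH correction, which fits seeing no significant terms at FDR 5%.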

6.8 years ago
theobroma22 ★ 1.2k

They are both true. You have to consider the math behind the enrichment. The p-value for GO category X depends on how many genes are in your list, how many of them are annotated to X, and how many genes in the background are annotated to X, so if you change the number of genes in the list, you change the result. In practice, changing the input almost always changes the output.
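Written out, the one-sided Fisher (hypergeometric) p-value for category X is shown below, with N background genes, K of them annotated to X, a list of n genes, and k list genes annotated to X. Changing the FDR cutoff changes n and k, while N and K stay fixed as long as the background is unchanged.

```latex
p_X = \Pr(H \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad H \sim \mathrm{Hypergeometric}(N, K, n)
```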

Thank you @theobroma22, but how can I trust the enrichment results if they change whenever the FDR cutoff for the gene list changes?
