DAVID Functional Annotation Clustering Analysis
9.3 years ago
parksuhong ▴ 10

I am using DAVID's Functional Annotation Clustering tool, and I wonder how DAVID's algorithm tests the null hypothesis that the enrichment of an annotation is purely due to chance. Could anyone explain it to me in a simple way?

I am a little confused. For example:

One of the annotation clusters has only 7 genes, yet its enrichment score is 1.23, with p-values of 2.8E-2:

The clustered terms are DNA-binding region:ETS (7 genes), Ets (7 genes), Domain:PNT (4 genes), ETS (7 genes), SAM PNT (4 genes)

But another annotation cluster has 86 genes, and its enrichment score is only 0.05, with p-values of 1.0E0:

The clustered terms are mitochondrial lumen (19 genes), mitochondrial matrix (19 genes), mitochondrion (86 genes), mitochondrial part (43 genes), mitochondrion (59 genes)

So a higher number of overlapping genes between GO terms doesn't necessarily mean a higher enrichment score and a lower p-value? I am still confused about how the first annotation cluster above, with only 7 genes overlapping among its terms, gets a lower p-value than the second cluster, where at least 19 genes overlap among the GO terms.

Thank you!

RNA-Seq ChIP-Seq sequencing • 5.7k views

These values make sense to me: the higher the enrichment score the better, and consequently higher enrichment scores come with lower p-values, because the p-value specifies the likelihood of obtaining the corresponding enrichment score by chance.

The enrichment score depends on the fold-change (or intensity values) of your genes and not on the overlap. Thus it makes sense that you are able to get a higher enrichment score with few genes. But it is hard to tell without knowing your data...
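One point worth checking against DAVID's own documentation: as far as I understand it, the enrichment score of an annotation cluster is the geometric mean of the member terms' p-values, expressed on a -log10 scale. Under that assumption, the link between score and p-value can be sketched as below. The function names and the two-term example p-values are made up for illustration; only the scores 1.23 and 0.05 come from the question.

```python
import math

def cluster_enrichment_score(p_values):
    """Mean of -log10(p) over the terms in one cluster,
    i.e. -log10 of the geometric mean of the p-values."""
    return sum(-math.log10(p) for p in p_values) / len(p_values)

def geometric_mean_p(score):
    """Convert an enrichment score back to a geometric-mean p-value."""
    return 10 ** (-score)

# Hypothetical p-values for two terms in one cluster (not from the post).
example_score = cluster_enrichment_score([0.01, 0.1])

# Scores quoted in the question, converted back to geometric-mean p-values.
p_ets = geometric_mean_p(1.23)   # roughly 0.06
p_mito = geometric_mean_p(0.05)  # roughly 0.89

print(example_score, p_ets, p_mito)
```

Read this way, a score of 1.23 means the terms in the ETS cluster have p-values around 0.06 on average, while a score of 0.05 means the mitochondrial terms sit close to 1, regardless of how many genes each cluster contains.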

9.3 years ago
Alternative ▴ 290

This most likely has to do with your sample size (effect size). One should be very careful when a small absolute number of genes is used in such an analysis. Roughly speaking, going from 1 to 2 is doubling by adding only 1; going from 10 to 20 is also doubling, but by adding 10. These are not the same, although you double in both cases. Maybe this will help: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/
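To make the doubling point concrete, here is a minimal sketch using a plain one-sided hypergeometric test. DAVID reports an EASE score, which is a more conservative variant of Fisher's exact test, so this is not DAVID's exact calculation. The background size, list size, and term sizes are hypothetical numbers chosen so that both terms show the same 2-fold enrichment.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value: probability of seeing at least k
    annotated genes in a list of n genes drawn from a background of N genes,
    K of which carry the annotation."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return tail / total

# Hypothetical setup: 20,000 background genes, 1,000 genes in the list.
N, n = 20000, 1000

# Term with 20 background genes: 1 expected in the list, 2 observed.
p_small = enrichment_p_value(2, n, 20, N)

# Term with 200 background genes: 10 expected in the list, 20 observed.
p_large = enrichment_p_value(20, n, 200, N)

print(f"2 observed vs 1 expected:   p = {p_small:.3g}")
print(f"20 observed vs 10 expected: p = {p_large:.3g}")
```

Both terms are enriched 2-fold, but only the one with the larger absolute counts comes out significant at the 0.05 level. The same arithmetic works in the other direction: a term with many genes in the list can still get a p-value near 1 if the background contains proportionally just as many.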
