Question about TF enrichment analysis
6.0 years ago
tujuchuanli

Hi, I want to perform a transcription factor (TF) binding analysis. My goal is to find the TFs that are over-represented or under-represented in one gene set relative to another gene set.

I have two gene sets (setA and setB): one contains 1000 genes and the other contains 5000 genes.

  1. I extract the promoter region of each gene in both gene sets.

  2. For each gene set, I record the number of genes whose promoter region is bound by TF A (for example, 800 for setA and 300 for setB) and the number of genes whose promoter region is not bound by TF A (for example, 200 for setA and 4700 for setB). I downloaded all TF binding profiles from the JASPAR database. There are over 500 profiles; I am just using TF A as an example here.

  3. Now I have four numbers, and I perform a chi-square test with the chisq.test function in R to get a P value.
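
A minimal sketch of steps 2 and 3 for TF A, using the example counts from the question. Note that a two-sided chi-square p value alone does not say whether TF A is over- or under-represented; comparing the observed counts with the expected counts returned by chisq.test (or looking at an odds ratio) gives the direction. The names tab, res and ft are just illustrative.

```r
# 2 x 2 contingency table for TF A, using the example counts from the question.
tab <- matrix(c(800, 200,
                300, 4700),
              nrow = 2, byrow = TRUE,
              dimnames = list(set  = c("setA", "setB"),
                              tf_a = c("bound", "unbound")))

# chisq.test applies Yates' continuity correction to 2 x 2 tables by default.
res <- chisq.test(tab)
res$p.value

# Counts expected if TF A binding were independent of gene set.
res$expected

# Observed / expected > 1 in the setA "bound" cell means TF A is
# over-represented in setA; < 1 would mean under-represented.
tab / res$expected

# Fisher's exact test gives an exact p value and a conditional
# maximum-likelihood estimate of the odds ratio.
ft <- fisher.test(tab)
ft$estimate
```

With all expected counts well above 5, the chi-square approximation is reasonable here; Fisher's exact test is the usual fallback when some expected counts are small.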

The first question: is the approach above OK or not?
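
Because the same test is repeated for each of the 500+ JASPAR profiles, the per-TF p values would normally need a multiple-testing correction. A minimal sketch, assuming three hypothetical TFs with made-up counts (TF_B and TF_C are invented for illustration), using Benjamini-Hochberg adjustment via p.adjust and labelling each TF as over- or under-represented in setA:

```r
# Hypothetical counts for three TFs; a real run would have one row per
# JASPAR profile (500+ rows). setA has 1000 genes and setB has 5000 genes.
counts <- data.frame(
  tf        = c("TF_A", "TF_B", "TF_C"),
  a_bound   = c(800,  40,  95),
  a_unbound = c(200, 960, 905),
  b_bound   = c(300, 600, 500),
  b_unbound = c(4700, 4400, 4500)
)

# Chi-square p value for one TF's 2 x 2 table (rows = sets, cols = bound/unbound).
tf_pvalue <- function(a_bound, a_unbound, b_bound, b_unbound) {
  m <- matrix(c(a_bound, a_unbound, b_bound, b_unbound), nrow = 2, byrow = TRUE)
  chisq.test(m)$p.value
}

counts$p <- mapply(tf_pvalue, counts$a_bound, counts$a_unbound,
                   counts$b_bound, counts$b_unbound)

# Benjamini-Hochberg adjustment across all TFs tested.
counts$p_adj <- p.adjust(counts$p, method = "BH")

# Direction: compare the bound fraction in setA with that in setB.
frac_a <- counts$a_bound / (counts$a_bound + counts$a_unbound)
frac_b <- counts$b_bound / (counts$b_bound + counts$b_unbound)
counts$direction <- ifelse(frac_a > frac_b, "over", "under")

counts
```

Only the direction of TFs with a small adjusted p value is meaningful; TF_C here is labelled "under" but is not significant.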

For various reasons, the promoter regions of the genes in setA and setB are not guaranteed to have the same length, although the average lengths in the two gene sets are quite close. I think I should adjust for this, because a longer promoter region should have a higher chance of being bound. The second question is: how should I adjust for it?
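
The intuition that longer promoters are more likely to be bound can be made concrete with a simple background model. Assuming a motif of width w can start at each of the L - w + 1 positions of a promoter of length L, and that each position is a hit independently with probability p, the chance of at least one hit is 1 - (1 - p)^(L - w + 1). A minimal sketch; the default values of p and w are hypothetical and chosen only for illustration:

```r
# Probability that a promoter of length L (bp) contains at least one motif hit,
# assuming independent hits at each of the L - w + 1 start positions with
# per-position probability p. Defaults for p and w are hypothetical.
p_any_hit <- function(L, p = 1e-3, w = 10) {
  n_pos <- pmax(L - w + 1, 0)  # promoters shorter than the motif have no positions
  1 - (1 - p)^n_pos
}

# Longer promoters have a higher chance of containing at least one hit.
p_any_hit(c(500, 1000, 2000))
```

Because this probability grows non-linearly with L, two gene sets with similar average promoter lengths can still differ in expected binding if their length distributions differ, which is one reason to model length per gene rather than only comparing averages.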

The chi-square test seems a reasonable choice, given the data that you have accumulated. If you want to adjust for promoter length, why not build a regression model (for example, a logistic regression on each gene's bound/unbound status) and include length as a covariate? With the model, you can then extract the ANOVA chi-square p value from this:

anova(model,test="Chisq")
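
A minimal sketch of that suggestion, assuming one row per gene with its set membership, promoter length and a 0/1 "bound by TF A" flag. The data below are simulated with made-up effect sizes, purely to make the example runnable. Length is entered before set, so the sequential chi-square test that anova() reports for set is adjusted for promoter length.

```r
set.seed(1)

# Simulated per-gene data (hypothetical): gene set membership, promoter length,
# and whether the promoter is bound by TF A.
n_a <- 1000
n_b <- 5000
genes <- data.frame(
  set = factor(rep(c("setB", "setA"), times = c(n_b, n_a)),
               levels = c("setB", "setA")),
  len = round(runif(n_a + n_b, min = 500, max = 2000))
)
# Binding probability rises with length and is higher in setA (made-up effects).
logit_p <- -3 + 0.001 * genes$len + 2 * (genes$set == "setA")
genes$bound <- rbinom(nrow(genes), size = 1, prob = plogis(logit_p))

# Logistic regression; 'len' enters first, so the sequential (type I)
# chi-square test for 'set' is adjusted for promoter length.
model <- glm(bound ~ len + set, family = binomial, data = genes)
aov_tab <- anova(model, test = "Chisq")
aov_tab

p_set <- aov_tab["set", "Pr(>Chi)"]

# Length-adjusted odds ratio for setA vs setB (> 1 means over-represented in setA).
exp(coef(model)[["setsetA"]])
```

Whether length should enter linearly, as log(len), or as a spline is a modelling choice that is worth checking on the real data.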
5.8 years ago
liux.bio

For TF enrichment analysis, you can try HOMER.
