I have a list of genes that are up- or downregulated after treatment, and another list of genes that are bound by a transcription factor. I want to know whether the fraction of up/downregulated genes that also appear in the bound gene list is significant. I've been told that Fisher's exact test is appropriate here, but I'm not sure how to apply it.
So, I have 132 upregulated genes and 1557 bound genes. Of the 132 upregulated genes, 24 appear on both lists. Another way of asking this is: how much of that overlap would be expected by chance?
I guess this is really a basic statistics question, so I'd love an answer and explanation that doesn't include a lot of code. I want to know what to do and why it's appropriate.
How many genes are downregulated? Do you want to test whether TF-bound genes are enriched in the upregulated gene set versus the downregulated gene set?
I'd like to split it up: what are the chances that 24 of the upregulated genes happen to appear on the bound list, and that 11 of the downregulated genes (out of 109 downregulated genes in total) happen to appear on the bound list?
If you want to run the test separately for upregulated and downregulated genes, this is how you can do it for the upregulated genes:
```r
Upreg.Genes = 132
Upreg.TF.Bound.Genes = 24
Upreg.TF.UnBound.Genes = Upreg.Genes - Upreg.TF.Bound.Genes  # 108
Total.DiffExpr.Genes = 132 + 109                             # 241 (up + down)
Total.DiffExpr.TF.Bound.Genes = 24 + 11                      # 35
Total.DiffExpr.TF.UnBound.Genes = Total.DiffExpr.Genes - Total.DiffExpr.TF.Bound.Genes  # 206

# Every gene must sit in exactly one row, so compare the upregulated genes with
# the *other* DE genes (241 - 132 = 109), not with all 241 DE genes
Other.TF.Bound.Genes = Total.DiffExpr.TF.Bound.Genes - Upreg.TF.Bound.Genes          # 11
Other.TF.UnBound.Genes = Total.DiffExpr.TF.UnBound.Genes - Upreg.TF.UnBound.Genes    # 98

mat = matrix(c(Upreg.TF.Bound.Genes, Other.TF.Bound.Genes,
               Upreg.TF.UnBound.Genes, Other.TF.UnBound.Genes),
             nrow = 2,
             dimnames = list(c("Upreg.Genes", "Other.DiffExpr.Genes"),
                             c("TF.Bound", "TF.Unbound")))
mat  # this is how your matrix will look:
#                      TF.Bound TF.Unbound
# Upreg.Genes                24        108
# Other.DiffExpr.Genes       11         98

# one-sided test: are TF-bound genes enriched in the upregulated set,
# with all differentially expressed genes as the background?
ftest = fisher.test(mat, alternative = "greater")
pvalue = ftest$p.value      # below 0.05: significant enrichment; otherwise not significant
estimate = ftest$estimate   # conditional maximum-likelihood estimate of the odds ratio
```
Similarly, you can do the test for Downregulated genes.
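A minimal sketch of the downregulated version, assuming the same background as the upregulated test (all 241 differentially expressed genes). The 109 downregulated genes are therefore compared with the 132 upregulated genes, so no gene is counted in both rows. The final `phyper` line is only a cross-check: it computes the same one-sided p-value directly from the hypergeometric distribution.

```r
# Downregulated set: 109 genes, 11 of them TF-bound
Downreg.Genes = 109
Downreg.TF.Bound.Genes = 11
Downreg.TF.UnBound.Genes = Downreg.Genes - Downreg.TF.Bound.Genes  # 98

# The other DE genes (the upregulated ones): 132 genes, 24 of them TF-bound
Upreg.Genes = 132
Upreg.TF.Bound.Genes = 24
Upreg.TF.UnBound.Genes = Upreg.Genes - Upreg.TF.Bound.Genes        # 108

mat.down = matrix(c(Downreg.TF.Bound.Genes, Upreg.TF.Bound.Genes,
                    Downreg.TF.UnBound.Genes, Upreg.TF.UnBound.Genes),
                  nrow = 2,
                  dimnames = list(c("Downreg.Genes", "Other.DiffExpr.Genes"),
                                  c("TF.Bound", "TF.Unbound")))

# One-sided test: are TF-bound genes enriched among the downregulated genes?
ftest.down = fisher.test(mat.down, alternative = "greater")
pvalue.down = ftest.down$p.value
estimate.down = ftest.down$estimate

# Cross-check: P(X >= 11) when 109 genes are drawn from the 241 DE genes,
# 35 of which are TF-bound
Total.Bound = Downreg.TF.Bound.Genes + Upreg.TF.Bound.Genes  # 35
Total.DE = Downreg.Genes + Upreg.Genes                      # 241
pvalue.down.hyper = phyper(Downreg.TF.Bound.Genes - 1, Total.Bound,
                           Total.DE - Total.Bound, Downreg.Genes,
                           lower.tail = FALSE)
```

Here 11/109 (about 10%) of downregulated genes are bound versus 24/132 (about 18%) of upregulated genes, so the estimated odds ratio is below 1 and this one-sided test is not expected to show enrichment.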
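You might be looking for a hypergeometric test. It tells you whether the overlap between the upregulated genes and the TF-bound genes is significant, or whether it is about what you would expect at random. Besides the three numbers you already have (132 upregulated, 1557 bound, 24 in both), it needs the size of the background: the total number of genes that could have appeared on either list, typically all genes tested in the expression experiment.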
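A sketch of that hypergeometric test in R, using the counts from the question (132 upregulated genes, 1557 bound genes, 24 in both). The background size is not given in the thread, so `N.background = 20000` below is a placeholder assumption; replace it with the number of genes you actually tested. The sketch also reports the overlap expected by chance and checks that a one-sided Fisher's exact test on the corresponding 2x2 table gives the same p-value.

```r
# Counts from the question
Upreg.Genes = 132        # genes drawn (the upregulated list)
TF.Bound.Genes = 1557    # genes in the background that are TF-bound
Overlap = 24             # upregulated genes that are also bound

# ASSUMPTION: size of the gene universe (not stated in the thread);
# replace with the number of genes actually tested
N.background = 20000

# Overlap expected if the upregulated genes were a random draw from the background
expected.overlap = Upreg.Genes * TF.Bound.Genes / N.background
fold.enrichment = Overlap / expected.overlap

# P(overlap >= 24) under the hypergeometric null
p.hyper = phyper(Overlap - 1, TF.Bound.Genes, N.background - TF.Bound.Genes,
                 Upreg.Genes, lower.tail = FALSE)

# Equivalent 2x2 table over the whole background (each gene in exactly one cell)
mat.bg = matrix(c(Overlap,
                  TF.Bound.Genes - Overlap,
                  Upreg.Genes - Overlap,
                  N.background - TF.Bound.Genes - (Upreg.Genes - Overlap)),
                nrow = 2,
                dimnames = list(c("Upreg.Genes", "Not.Upreg.Genes"),
                                c("TF.Bound", "TF.Unbound")))
p.fisher = fisher.test(mat.bg, alternative = "greater")$p.value

cat("expected overlap:", round(expected.overlap, 2),
    " fold enrichment:", round(fold.enrichment, 2),
    " p (hypergeometric):", signif(p.hyper, 3),
    " p (Fisher, one-sided):", signif(p.fisher, 3), "\n")
```

The two p-values agree because the one-sided Fisher's exact test is a hypergeometric tail probability computed from the table's margins. Note that the result depends on the background size you choose, which is why it is worth stating explicitly.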