Hello,
I have compared 1370 genes obtained from a ChIP-seq analysis with 652 differentially regulated genes obtained from an Affymetrix Mouse Genome 430 2.0 array. When I intersect the two lists, I get 37 genes in common.
Now I would like to calculate the significance of this overlap. I was thinking of using phyper in R, but it requires the total number of genes, and I am confused about which number to give. Should I use the total number of probes on the Affymetrix chip, or the number of genes in the whole mouse genome from the ChIP-seq data?
One more question: can anybody suggest exactly how to perform this significance test in R?
Thanks,
Sai
I think that whatever the results of a hypergeometric test might be in this case, a lot of caution should be advised in interpreting that result.
It is well known that ChIP-seq, RNA-seq and other NGS-based assays of the epigenome and transcriptome measure many correlated outcomes (e.g. shared epigenetic programs affecting many genes).
In an example like the one you describe, it is quite conceivable that a much smaller number of underlying factors leads you to detect 37 genes in the overlap. The hypergeometric test assumes that genes land on each list independently of one another, so if that is the case the test would be anti-conservative (its p-value would be too small), and many readers and reviewers might distrust the result for that reason.
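With that caveat in mind, here is a minimal sketch of how the test itself could be run in R. The counts 1370, 652 and 37 come from the question. The universe size of 20000 is only a placeholder assumption, not a real gene count. It should be replaced with the number of genes that could in principle have appeared on both lists (for example, genes represented on the array that can also be assigned a ChIP-seq peak). The variable names are illustrative.

```r
# Observed numbers from the question.
chip_genes  <- 1370   # genes from the ChIP-seq analysis
array_genes <- 652    # differentially regulated genes from the array
overlap     <- 37     # genes on both lists

# ASSUMPTION: placeholder universe size, not a real count. Replace it with
# the number of genes that could have appeared on both lists.
universe <- 20000

# Overlap expected by chance if the two lists were independent.
expected <- array_genes * chip_genes / universe

# P(X >= overlap): phyper(q, ..., lower.tail = FALSE) gives P(X > q),
# so pass overlap - 1.
p_hyper <- phyper(overlap - 1, chip_genes, universe - chip_genes,
                  array_genes, lower.tail = FALSE)

# The same test written as a one-sided Fisher exact test on the 2x2 table.
tab <- matrix(c(overlap,               chip_genes - overlap,
                array_genes - overlap,
                universe - chip_genes - array_genes + overlap),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(expected = expected, p_hyper = p_hyper, p_fisher = p_fisher))
```

The result depends strongly on the universe. With the placeholder of 20000, the overlap expected by chance is about 44.7 genes, which is more than the 37 observed. It is therefore worth settling on a defensible background before reading anything into the p-value.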