Hi, I have a scenario with 200 samples, and each sample is tested for 10,000 genes. I have two events, call them A and B. For each sample there can be x genes, each called as gene.up or gene.down.

I want to compare whether events A and B are similar, i.e. whether their intersection is significant. To visualize this I draw a Venn diagram, and normally I would test the overlap with a Fisher exact test or a hypergeometric test. What makes it strange here is that I have to account for sample, direction and gene together. Each item is coded like Sample1.gene.up, and only an exact match on all three parts is considered a match.

My question is: what is the total population then? For example, if the total number of genes were 1000, would the total population be 1000 * 2 * n samples? The 2 is there because a gene can be up or down. In the end it would look something like this (I'm using R):
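# n1, n2: the coded items (e.g. "Sample1.gene.up") for events A and B
total.sample <- 200
q <- length(intersect(n1, n2))    # size of the overlap
m <- length(n1)                   # items in A
k <- length(n2)                   # items in B
n <- 1000 * 2 * total.sample - m  # items in the population that are not in A
# q - 1 so that the p-value is P(overlap >= q) rather than P(overlap > q)
phyper(q - 1, m, n, k, lower.tail = FALSE)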
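To make the coding concrete, here is a minimal sketch on made-up data. The data frames `a` and `b`, the helper `make.keys` and the toy sizes are all hypothetical and only for illustration. The population size `N` assumes that every sample/gene/direction combination counts as a possible item, which is exactly the assumption I am unsure about. The tail is taken at `q - 1` so that the observed overlap itself is included.

```r
# Toy data, invented purely for illustration (not real results).
# Each row is one call: a gene in a sample with a direction.
a <- data.frame(sample = c("Sample1", "Sample1", "Sample2", "Sample3"),
                gene   = c("geneA", "geneB", "geneA", "geneC"),
                dir    = c("up", "down", "up", "up"))
b <- data.frame(sample = c("Sample1", "Sample2", "Sample2", "Sample3"),
                gene   = c("geneA", "geneA", "geneB", "geneC"),
                dir    = c("up", "up", "down", "down"))

# Hypothetical helper: one key per sample/gene/direction triple.
make.keys <- function(d) unique(paste(d$sample, d$gene, d$dir, sep = "."))

n1 <- make.keys(a)
n2 <- make.keys(b)
overlap <- intersect(n1, n2)  # Sample3.geneC differs in direction, so no match

total.sample <- 3
total.gene <- 3
# Assumption: every sample x gene x direction combination is a possible item.
N <- total.gene * 2 * total.sample

q <- length(overlap)
m <- length(n1)
k <- length(n2)
# P(overlap >= q) when k items are drawn from N, of which m belong to A
p <- phyper(q - 1, m, N - m, k, lower.tail = FALSE)
```

Here `Sample3.geneC.up` and `Sample3.geneC.down` do not match, so only the two items that agree on sample, gene and direction end up in `overlap`.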
For a Fisher test it would look something like this:
total.sample <- 200
m <- matrix(c(1000 * 2 * total.sample, 400, 500, 700), nrow = 2)
fisher.test(m, alternative = "greater")
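For comparison, here is a minimal sketch of one common way to lay out a 2x2 table for an overlap test, where the four cells are "in both", "in A only", "in B only" and "in neither". The counts `q`, `m` and `k` are invented placeholders, not my real numbers, and the cell layout and the population size `N` are assumptions. With this layout the one-sided Fisher p-value should agree with the hypergeometric tail taken at `q - 1`.

```r
# Placeholder counts for illustration only (assumed, not real results).
total.sample <- 200
N <- 1000 * 2 * total.sample  # assumed population size
m <- 500                      # items in A
k <- 700                      # items in B
q <- 5                        # items in both A and B

# Rows: in A / not in A. Columns: in B / not in B.
tab <- matrix(c(q,     m - q,
                k - q, N - m - k + q),
              nrow = 2, byrow = TRUE,
              dimnames = list(A = c("yes", "no"), B = c("yes", "no")))

# One-sided tests for a larger-than-expected overlap.
p.fisher <- fisher.test(tab, alternative = "greater")$p.value
p.hyper  <- phyper(q - 1, m, N - m, k, lower.tail = FALSE)
```

The four cells sum to `N`, so the population size enters only through the "in neither" cell.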
I need advice on whether I'm doing this correctly, especially whether the total population is calculated correctly. Thanks!