Question

Statistical test for overlap

4.9 years ago

pixie@bioinfo ★ 1.5k

Hello, I have a Venn diagram with lists of up- and down-regulated genes from experiment 1. I compared another gene list (from experiment 2) against it and got the overlap: 108 genes are Up, 309 genes are Down, and 41 genes are not differentially expressed.

My null hypothesis is that the overlap splits 50-50 between up- and down-regulated genes (that is, random). I want to show that the proportion of down-regulated genes (as shown in the pie chart) is significantly larger than that. What kind of statistical test should I use here? Thanks.

4.9 years ago

Asaf 10k

I think you can ask better questions, such as how the distribution of LFC (log fold change) for genes in the group compares with that for genes outside the group. Try plotting the LFC distribution of the two groups (in Exp2 and not in Exp2) as a violin plot, for instance, or draw an MA plot colored by whether each gene is in Exp2. Either will show much more than what you chose to present and test. (By the way, neither an Euler diagram nor a pie chart is a good choice for presenting data; there are better alternatives.)
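A rough sketch of the numerical side of that comparison, in base R. The data here are simulated purely for illustration, and the column names `lfc` and `in_exp2` are hypothetical; the Wilcoxon rank-sum test is one possible way to compare the two distributions, not something prescribed in this thread.

```r
# Simulated data for illustration only; replace with the real Exp1 results.
set.seed(1)
n_genes <- 2000
genes <- data.frame(
  lfc     = rnorm(n_genes, mean = 0, sd = 1),   # hypothetical Exp1 log fold changes
  in_exp2 = sample(c(TRUE, FALSE), n_genes, replace = TRUE, prob = c(0.1, 0.9))
)
# Shift the simulated Exp2 genes downward so the two groups differ
genes$lfc[genes$in_exp2] <- genes$lfc[genes$in_exp2] - 0.5

# Median LFC per group (names are "FALSE" and "TRUE")
group_medians <- tapply(genes$lfc, genes$in_exp2, median)

# Two-sided Wilcoxon rank-sum test: do the LFC distributions differ?
lfc_test <- wilcox.test(lfc ~ in_exp2, data = genes)
lfc_test$p.value
```

For a quick look at the two distributions, `boxplot(lfc ~ in_exp2, data = genes)` works in base R; a violin plot needs an add-on plotting package.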

score 1 · Accepted Answer · 2020-06-15

4.9 years ago

e.rempel ★ 1.1k

Hi,

if I understood your question correctly, I would use the binomial test. Let me explain.

There are 417 genes in the overlap between Exp2 and Exp1 (Up and Down combined: 108 + 309). Your null hypothesis says that these genes are distributed between Up and Down with probability 0.5 for each subset (as you said, 50-50). In the lingo of the binomial test, that means you have 417 trials, 309 successes, and a probability of success of 0.5. Thus, the binomial test in R would be

binom.test(x = 309, n = 417, p = 0.5)

I obtained a p-value less than 2.2e-16.

HTH

It should be p = 5224/(5646+5224) instead of 0.5.

Reply • 4.9 years ago by Asaf 10k

In this case shouldn't it be p = (5224 + 309)/(5224 + 309 + 5646 + 108) ? :)
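A minimal sketch of the binomial test with this background proportion as the null, assuming 5224 and 5646 are the Down and Up genes seen only in Exp1 (read off the Venn diagram, which is not reproduced here). The variable names are hypothetical.

```r
# Counts quoted in this thread; their labels are an assumption about the Venn diagram
down_exp1_only <- 5224
up_exp1_only   <- 5646
down_overlap   <- 309
up_overlap     <- 108

# Null proportion: share of Down genes among all Up/Down genes in Exp1
p0 <- (down_exp1_only + down_overlap) /
  (down_exp1_only + down_overlap + up_exp1_only + up_overlap)

# Exact binomial test of the overlap against that background proportion
res_bg <- binom.test(x = down_overlap, n = down_overlap + up_overlap, p = p0)
res_bg$p.value
```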

Reply • 4.9 years ago by e.rempel ★ 1.1k

Oh yeah, right. Another reason to not use Venn diagrams :)

Reply • 4.9 years ago by Asaf 10k

You can get past R's printed 2.2e-16 limit with binom.test(...)$p.value. With your numbers, the p-value is 1.6e-23.
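The `< 2.2e-16` in the printed output is only a display cutoff; the stored result keeps the full value. A small sketch with the numbers from the answer above (the variable name `res` is just for illustration):

```r
# Exact binomial test: 309 Down genes out of 417 in the overlap, null p = 0.5
res <- binom.test(x = 309, n = 417, p = 0.5)

res$p.value    # full-precision two-sided p-value, not cut off at 2.2e-16
res$estimate   # observed proportion of Down genes, 309/417
res$conf.int   # 95% confidence interval for that proportion
```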

Reply • 4.9 years ago by jomo018 ▴ 730

score 1 · Accepted Answer · 2020-06-15

I would use a proportion test. As the name says, the null hypothesis is that the proportion in each set is the same. It helps you answer questions like: is the proportion of males in group A higher than the proportion of females in group B (test for two proportions), or is the proportion of males in a group similar to, higher than, or lower than the proportion in the entire population (test for one proportion)?
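As a sketch, a one-proportion test on the overlap counts from the question could look like this in R, using `prop.test` from base R's stats package. The null of 0.5 is taken from the question, and the variable name is hypothetical.

```r
# One-proportion test: is the share of Down genes in the overlap different from 0.5?
# 309 Down genes out of 417 Up + Down genes in the overlap
prop_res <- prop.test(x = 309, n = 417, p = 0.5)

prop_res$estimate   # observed proportion, 309/417
prop_res$p.value    # p-value from the chi-squared approximation
```

Unlike `binom.test`, `prop.test` relies on a chi-squared approximation (with a continuity correction by default), so the two p-values will not match exactly.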