Question

Testing Gene Set Overlap with Binomial Distribution or Hypergeometric Distribution

15 months ago

Apex92 ▴ 300

Dear all,

I have two gene sets and I want to test whether the overlap between them is significant using binomial statistics.

I came up with the approach below in R, but it does not give a significant p-value. However, with the hypergeometric test (assuming a background set of 10,000 genes in total), I do get a significant p-value.

# Parameters
n_A <- 90  # Number of genes in Set A
n_B <- 2588  # Number of genes in Set B
k <- 37  # Number of overlapping genes

#probability of overlap
p <- n_A / n_B

p_value <- 1 - pbinom(k - 1, n_B, p)

print(paste("Calculated p-value:", p_value))

How to resolve this?

Another question: is it important that n_B is always the larger set in the binomial test?

Thank you in advance.

statistics Enrichment • 640 views

score 0 · Answer 1 · 2023-08-17

15 months ago

Michael 55k

Have a look at the hypergeometric distribution as discussed here: Probability of gene list overlap
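As a rough sketch of how that applies to the numbers in the question, assuming the 10,000-gene background mentioned there (the name `n_bg` is introduced here only for illustration): the overlap can be modelled as drawing the n_A genes of Set A from the background and counting how many fall in Set B. A binomial version is then an approximation with n_A trials and success probability n_B / n_bg. The code in the question instead uses n_B trials with p = n_A / n_B, which puts the expected overlap at n_B * p = 90, so an observed overlap of 37 sits far below the mean and the upper-tail p-value comes out close to 1.

```r
n_A  <- 90     # genes in Set A
n_B  <- 2588   # genes in Set B
k    <- 37     # overlapping genes
n_bg <- 10000  # assumed background size, taken from the question

# Hypergeometric: P(X >= k) when n_A genes are drawn without replacement
# from a background containing n_B "success" genes
p_hyper <- phyper(k - 1, n_B, n_bg - n_B, n_A, lower.tail = FALSE)

# Binomial approximation: n_A independent draws, each landing in Set B
# with probability n_B / n_bg
p_binom <- pbinom(k - 1, n_A, n_B / n_bg, lower.tail = FALSE)

# Expected overlap under the null is the same in both models
expected <- n_A * n_B / n_bg

print(c(hypergeometric = p_hyper, binomial = p_binom, expected = expected))
```

With these inputs the expected overlap is about 23 genes, so both tails should come out small. The binomial result is only an approximation because it treats the draws as independent, whereas genes are sampled without replacement.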

Thank you for your comment. So based on the thread you shared, I assume I can calculate the p-value as:

n_A=90
n_B=2588
n_C=10000
n_A_B=37

p_val_1 <- 1 - phyper(n_A_B, n_B, n_C - n_B, n_A)  # P(X > n_A_B), excludes the observed overlap
p_val_2 <- phyper(n_A_B - 1, n_A, n_C - n_A, n_B, lower.tail = FALSE)  # P(X >= n_A_B), includes it

Is that correct? And it does not matter whether n_A is bigger or smaller than n_B, right?
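One way to check both points numerically is the small sketch below. It reuses the values above; the names `p_draw_A`, `p_draw_B` and `p_strict` are made up for this example. It compares the upper tail with the roles of the two sets swapped, and shows how the strict form P(X > n_A_B) differs from P(X >= n_A_B).

```r
n_A   <- 90
n_B   <- 2588
n_C   <- 10000  # background size
n_A_B <- 37     # observed overlap

# P(X >= n_A_B): draw Set A from the background, count hits in Set B
p_draw_A <- phyper(n_A_B - 1, n_B, n_C - n_B, n_A, lower.tail = FALSE)

# Same tail with the roles of the two sets swapped
p_draw_B <- phyper(n_A_B - 1, n_A, n_C - n_A, n_B, lower.tail = FALSE)

# P(X > n_A_B): strictly more than the observed overlap
p_strict <- 1 - phyper(n_A_B, n_B, n_C - n_B, n_A)

# The hypergeometric distribution is symmetric in the two set sizes
print(all.equal(p_draw_A, p_draw_B))

# The two forms differ by the probability of exactly n_A_B overlaps
print(all.equal(p_draw_A - p_strict, dhyper(n_A_B, n_B, n_C - n_B, n_A)))
```

If both comparisons hold, the order of the two sets does not matter, and the only difference between the two expressions is whether the observed overlap itself is counted. For an enrichment p-value the form with `n_A_B - 1` and `lower.tail = FALSE` is the one that includes it.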

ADD REPLY • link 15 months ago by Apex92 ▴ 300