Question

Hypergeometric Test On Gene Set


11.6 years ago

ChIP ▴ 600

Hi!

This question has been asked before, but the previous post left me confused.

Say I have two lists: List1 has 598 genes and List2 has 5500 genes, and the pool from which both lists are drawn contains 23000 genes in total.

The two lists share 89 genes. How do I compute whether this overlap is significant?
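A quick sanity check before choosing a formula is to compare the observed overlap with the overlap expected by chance. This is a minimal R sketch using the numbers from the question (the variable names are illustrative). Under the null hypothesis of no association, the overlap follows a hypergeometric distribution whose mean is list1 * list2 / PopSize.

```r
# Numbers from the question
pop_size <- 23000  # genes in the pool
list1    <- 598    # genes in List1
list2    <- 5500   # genes in List2
overlap  <- 89     # genes in both lists

# Mean of the hypergeometric overlap under the null of no association
expected <- list1 * list2 / pop_size
print(expected)            # 143

# Observed / expected ratio; values below 1 mean fewer shared genes than chance
print(overlap / expected)  # about 0.62
```

With these numbers the observed overlap (89) is below the chance expectation (143), which matters when deciding which tail of the distribution to test.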

I have two formulas:

method 1

phyper(overlap - 1, list1, PopSize - list1, list2, lower.tail = FALSE, log.p = FALSE)
phyper(88, 598, 23000 - 598, 5500, lower.tail = FALSE, log.p = FALSE)

method 2

phyper(overlap, list1, PopSize, list2, lower.tail = FALSE, log.p = FALSE)
phyper(89, 598, 23000, 5500, lower.tail = FALSE, log.p = FALSE)
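One way to see what each formula computes is to rewrite both calls as explicit sums of dhyper() point probabilities. In the urn picture that phyper() uses, m is the number of white balls, n the number of black balls and k the number of balls drawn. The sketch below suggests that method 1 is P(overlap >= 89) in a pool of 23000 genes, while method 2 is P(overlap >= 90) in a pool of 23000 + 598 genes, because it sets n to the full pool size and does not subtract one from q.

```r
pop_size <- 23000; list1 <- 598; list2 <- 5500; overlap <- 89
max_overlap <- min(list1, list2)  # largest possible overlap

# Method 1: m = list1, n = pop_size - list1 (so m + n = pop_size), q = overlap - 1
method1     <- phyper(overlap - 1, list1, pop_size - list1, list2, lower.tail = FALSE)
method1_sum <- sum(dhyper(overlap:max_overlap, list1, pop_size - list1, list2))  # P(X >= 89)

# Method 2: n = pop_size implies a pool of pop_size + list1 genes, and q = overlap
method2     <- phyper(overlap, list1, pop_size, list2, lower.tail = FALSE)
method2_sum <- sum(dhyper((overlap + 1):max_overlap, list1, pop_size, list2))  # P(X >= 90)

print(all.equal(method1, method1_sum))  # TRUE
print(all.equal(method2, method2_sum))  # TRUE
```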

Which method should I use, and why?

I am really confused.

Thank you

Tags: r, statistics • 27k views

updated 2.5 years ago by Ram 45k • written 11.6 years ago by ChIP ▴ 600


This thread should help you: http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c

And remember that the p-value is the probability, under the null hypothesis, of obtaining a result at least as extreme as the one actually observed.
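"At least as extreme" depends on the direction being tested. A sketch with the question's numbers: testing for enrichment (more shared genes than chance) uses the upper tail P(X >= 89), while testing for depletion (fewer shared genes than chance) uses the lower tail P(X <= 89). Since the expected overlap here is 143, the upper-tail p-value should come out close to 1 and only the lower tail should be small.

```r
pop_size <- 23000; list1 <- 598; list2 <- 5500; overlap <- 89

# Enrichment: P(X >= overlap), hence q = overlap - 1 with lower.tail = FALSE
p_enrich  <- phyper(overlap - 1, list1, pop_size - list1, list2, lower.tail = FALSE)

# Depletion: P(X <= overlap), the default lower tail with q = overlap
p_deplete <- phyper(overlap, list1, pop_size - list1, list2)

print(c(enrichment = p_enrich, depletion = p_deplete))
```

The two tails overlap at exactly one point, so p_enrich + p_deplete equals 1 + dhyper(overlap, ...).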

updated 2.5 years ago by Ram 45k • written 11.6 years ago by arno.guille ▴ 420

score 2 · Answer 1 · 2014-01-14


11.6 years ago

Sudeep ★ 1.7k

Your method 1 looks like the correct one. AFAIK, in phyper(q, m, n, k), n should be PopSize - list1, i.e. the number of genes in the pool that are not in List1.

You can check this Stack Overflow thread as well.
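Here is a sketch of method 1 wrapped in a small helper, called overlap_pvalue() here (a hypothetical name, not part of base R). It names phyper()'s arguments so the urn mapping is explicit: m = genes in List1, n = pool genes not in List1, k = size of List2. Because the hypergeometric distribution is symmetric in m and k, swapping the roles of the two lists should give the same p-value.

```r
# Hypothetical helper: upper-tail hypergeometric p-value for a list overlap
overlap_pvalue <- function(overlap, list1, list2, pop_size) {
  phyper(q = overlap - 1,           # P(X > overlap - 1) = P(X >= overlap)
         m = list1,                 # "white balls": genes in List1
         n = pop_size - list1,      # "black balls": pool genes not in List1
         k = list2,                 # number drawn: genes in List2
         lower.tail = FALSE)
}

p12 <- overlap_pvalue(89, 598, 5500, 23000)
p21 <- overlap_pvalue(89, 5500, 598, 23000)  # lists swapped
print(all.equal(p12, p21))  # TRUE
```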



Can anyone explain whether q - 1 should be used, and why?

written 11.6 years ago by Madelaine Gogol 5.3k


The answer is here: http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c

phyper(x, m, n, k) gives the probability of getting x or fewer, so phyper(x, m, n, k) is the same as sum(dhyper(0:x, m, n, k)).

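The lower.tail = FALSE option is a bit confusing. phyper(x, m, n, k, lower.tail = FALSE) is the same as 1 - phyper(x, m, n, k), and so is the probability of x + 1 or more. To get the probability of an overlap of x or more, you therefore pass x - 1, which is why method 1 uses overlap - 1.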
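A toy example makes the off-by-one concrete. The numbers are made up for illustration: a pool of 10 genes, 4 of them in List1, and a List2 of 5 genes. The point probabilities of an overlap of 0 through 4 are 6, 60, 120, 60 and 6 out of choose(10, 5) = 252, so the tail probabilities can be checked by hand.

```r
m <- 4; n <- 6; k <- 5   # 4 "List1" genes among 10, List2 has 5 genes

dhyper(0:4, m, n, k) * choose(10, 5)   # 6 60 120 60 6

# lower.tail = FALSE gives P(X > q), i.e. P(X >= q + 1)
phyper(3, m, n, k, lower.tail = FALSE)  # P(X >= 4) = 6/252
phyper(2, m, n, k, lower.tail = FALSE)  # P(X >= 3) = 66/252, what an overlap of 3 needs
```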

written 11.6 years ago by arno.guille ▴ 420

score 0 · Answer 2 · 2016-04-30


9.3 years ago

Alejandro Jimenez Sanchez ▴ 180

I think method 1 is the correct one, because it gives the same result as this site: https://www.geneprof.org/GeneProf/tools/hypergeometric.jsp

Method 2 gives a different result.
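Another way to cross-check method 1 without an external website is R's own fisher.test() on the 2x2 table of overlap counts. This is a swapped-in check rather than part of the original answer: with alternative = "greater", Fisher's exact test computes the same upper-tail hypergeometric probability. The table layout below is one convenient choice.

```r
pop_size <- 23000; list1 <- 598; list2 <- 5500; overlap <- 89

# 2x2 contingency table: rows = in / not in List1, columns = in / not in List2
tab <- matrix(c(overlap,                             # in both
                list2 - overlap,                     # List2 only
                list1 - overlap,                     # List1 only
                pop_size - list1 - list2 + overlap), # in neither
              nrow = 2,
              dimnames = list(List1 = c("yes", "no"), List2 = c("yes", "no")))

p_fisher  <- fisher.test(tab, alternative = "greater")$p.value
p_method1 <- phyper(overlap - 1, list1, pop_size - list1, list2, lower.tail = FALSE)
print(all.equal(p_fisher, p_method1))  # TRUE
```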
