How To Quantify "Overlapping Degree" Between Two Samples

1

11.9 years ago

qiyunzhu ▴ 430

I am trying to screen for genes that possess a certain trait within a genome. I know that a previous study did the same thing using a different method. Now I want to compare my result with that study's result. For example, the genome contains 3000 genes. My analysis shows that 200 genes are positive. The other researcher found 180 positive genes, of which 60 were also found by me.

Now, how should I quantify the "overlapping degree" between my results and his? I guess this is quite a common issue, but I just don't know what people usually do as an academic standard. Should I simply report "60", or do "60 / 200 / 180 = 0.00167", or something else?

Thanks very much for any advice!

statistics genomics • 3.9k views

0

Does rephrasing the question like this get you to the answer you want: if you picked 200 genes at random and the other person picked 180 genes at random, what's the probability that the two of you picked (at least/at most) 60 of the same genes?

11.9 years ago by Steve Lianoglou 5.2k

0

Hello Steve, thanks for the inspiration! That sounds like the right flavor. I think I worked out a solution: the probability of exactly k overlapping positives is $\binom{a}{k}\binom{N-a}{b-k} / \binom{N}{b}$, in which N is the total number of genes, a and b are the numbers of positives obtained by the two methods, and k is the number of overlapping positives.
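For anyone who wants to plug numbers into this, here is a minimal sketch in Python using only the standard library (`math.comb`, Python 3.8+). It assumes the null model described above: the two gene lists are independent random draws from the same N genes. The function names `overlap_pmf` and `overlap_tail` are made up for illustration. Since the question is usually whether the overlap is larger than chance, the sketch also sums the formula over all overlaps of at least k and reports the overlap expected by chance, a * b / N.

```python
from math import comb

def overlap_pmf(N, a, b, k):
    """P(exactly k shared genes) if lists of size a and b are random draws from N genes."""
    return comb(a, k) * comb(N - a, b - k) / comb(N, b)

def overlap_tail(N, a, b, k):
    """P(at least k shared genes) under the same random-draw assumption."""
    # Sum the numerators as exact integers, then divide once to limit rounding error.
    numerator = sum(comb(a, i) * comb(N - a, b - i) for i in range(k, min(a, b) + 1))
    return numerator / comb(N, b)

# Numbers from the question: 3000 genes, 200 and 180 positives, 60 shared.
N, a, b, k = 3000, 200, 180, 60
expected = a * b / N  # overlap expected by chance alone

print(f"expected overlap by chance: {expected}")
print(f"P(overlap == {k}): {overlap_pmf(N, a, b, k):.3e}")
print(f"P(overlap >= {k}): {overlap_tail(N, a, b, k):.3e}")
```

With the numbers from the question, the chance expectation is 12 shared genes, so an observed overlap of 60 is far into the upper tail. If SciPy is available, `scipy.stats.hypergeom` should give the same quantities.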

11.9 years ago by qiyunzhu ▴ 430

0

If you have statistical questions, you can also post them at the Stack Exchange site for statistics (http://stats.stackexchange.com/).