Help With Multinomial Distribution Application To Genome Question
11.3 years ago
Wayne ★ 1.0k

Hello all, I need help with a stats question I'm trying to solve for a biological question.

Say you have a town in which each street has a different number of bins, and each bin holds a different number of baseballs. What's interesting is when one particular bin on a street has many baseballs relative to the other bins on that street. How can you statistically compare different streets to say that someone is non-randomly adding baseballs to one street rather than another? I have tried using the multinomial distribution: say N = total number of baseballs on a street, X_i = number of baseballs in bin i on that street, L = number of bins on the street, and p_i = 1/L = probability of a baseball landing in any given bin at random. The formula would then be

$$P(X_1, \dots, X_k) = \frac{N!}{X_1! \cdots X_k!} \left(\frac{1}{L}\right)^{X_1} \cdots \left(\frac{1}{L}\right)^{X_k}$$

http://en.wikipedia.org/wiki/Multinomial_distribution.
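To make the formula concrete, here is a minimal Python sketch that evaluates it for a vector of bin counts, assuming every bin is equally likely (p = 1/L). The function name `uniform_multinomial_pmf` and the two example streets are made up for illustration. Note that the value returned is the probability of one exact arrangement of baseballs, which is not the same thing as a p-value.

```python
import math

def uniform_multinomial_pmf(counts):
    """Probability of one exact count vector when each of the
    L = len(counts) bins is equally likely (p = 1/L)."""
    n_total = sum(counts)   # N: total baseballs on the street
    n_bins = len(counts)    # L: number of bins on the street
    # log(N! / (X_1! ... X_L!)) via lgamma, to avoid huge factorials
    log_coef = math.lgamma(n_total + 1) - sum(math.lgamma(x + 1) for x in counts)
    # (1/L)^(X_1) * ... * (1/L)^(X_L) = (1/L)^N
    log_prob = log_coef - n_total * math.log(n_bins)
    return math.exp(log_prob)

# Hypothetical streets used only for illustration
street_a = [20] + [0] * 9          # 10 bins, 20 baseballs all in one bin
street_b = [1] * 10 + [0] * 990    # 1000 bins, 10 baseballs in 10 different bins

print(uniform_multinomial_pmf(street_a))
print(uniform_multinomial_pmf(street_b))
```

Because the product of the (1/L) terms collapses to (1/L)^N, the result shrinks quickly as L grows, regardless of how the baseballs are arranged.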

Now this seems to work well, except that as L increases the p-value decreases, which doesn't make sense. For example, a street with 10 bins and 20 baseballs all in just one of those bins should be more significant than a street with 1000 bins and 10 baseballs, each in a different bin. How do I normalize for L?
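One possible way to put streets with different L on the same scale (this is an assumption for illustration, not something proposed in the thread) is to stop using the probability of the exact arrangement and instead use a tail probability for a summary statistic, such as the largest bin count. The sketch below estimates P(max bin count >= observed max) by simulating baseballs thrown uniformly at random into L bins. The function name `max_bin_pvalue`, the number of simulations, and the seed are all hypothetical choices.

```python
import random
from collections import Counter

def max_bin_pvalue(counts, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(max bin count >= observed max)
    when the same number of baseballs is thrown uniformly into the bins."""
    n_total = sum(counts)
    n_bins = len(counts)
    if n_total == 0:
        raise ValueError("need at least one baseball")
    observed_max = max(counts)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_sim):
        # Throw n_total baseballs, each into a uniformly chosen bin
        sim = Counter(rng.randrange(n_bins) for _ in range(n_total))
        if max(sim.values()) >= observed_max:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_sim + 1)

# Hypothetical streets matching the example in the text
street_a = [20] + [0] * 9          # 10 bins, 20 baseballs all in one bin
street_b = [1] * 10 + [0] * 990    # 1000 bins, 10 baseballs in 10 different bins

p_a = max_bin_pvalue(street_a)
p_b = max_bin_pvalue(street_b)
print(p_a, p_b)
```

Under this statistic the 1000-bin street is not significant at all, since every simulated street has a largest bin of at least 1, while the 10-bin street with everything piled into one bin is as extreme as the simulation can resolve. The smallest p-value this can report is 1 / (n_sim + 1), so more simulations are needed to separate very extreme streets from each other.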

Any help would be greatly appreciated. I know this explanation sucks, but it's the least abstract one I could think of!

statistics genome mutation cancer • 1.8k views
11.3 years ago

If you're talking about a random process that is adding baseballs to a set number of bins (or reads to a genome, etc), then you're describing a Poisson process. This can be modelled by a Poisson distribution and you can test for significance against that distribution. In R, the function you're looking for is ppois()
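As a rough illustration of that test, here is a small Python sketch of the Poisson upper tail P(X >= k), which is what `ppois(k - 1, lambda, lower.tail = FALSE)` returns in R. The function name `poisson_upper_tail` is hypothetical, and setting the per-bin rate to N / L (total items divided by number of bins) is an assumption about how the expected count would be chosen.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # First term P(X = k), computed in log space to avoid overflow
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Keep adding terms until they no longer change the running sum
    while term > 0.0 and (i <= lam or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
    return total

# Hypothetical example: 5 items observed in a single bin on a 10-bin street
# holding 5 items in total, so the assumed expected count per bin is 5 / 10
n_items, n_bins, observed = 5, 10, 5
print(poisson_upper_tail(observed, n_items / n_bins))
```

This tests one bin at a time. If every bin on a street is checked, the resulting p-values would need a multiple-testing correction before streets are compared.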

11.3 years ago
Wayne ★ 1.0k

I'm not sure I follow. I need to be able to compare any gene against any other gene. Each amino acid position is a "bin". For example, I want to compare a gene of length 10 with 5 mutations all at amino acid position 1 against a gene of length 1000 with 5 mutations at 5 independent amino acid sites (in other words, no recurrent mutations). Does this make sense?


Please do not post new questions as an answer. Use the comments under the original answer.
