Multinomial Distributions for large n
1
0
Entering edit mode
21 months ago
rtrende ▴ 80

Hello all. I have some biological data that I am analyzing, and it requires me to work with multinomial distributions with large numbers (n, k > 20). By analogy, the sorts of problems I'm looking to answer are:

Imagine you have an urn with marbles of 50 different colors, each with differing relative frequencies. You are only interested in 20 pre-specified colors of marbles. Imagine you draw 30 marbles from the urn. What is the likelihood that both (1) all the marbles you draw are one of the pre-specified colors and (2) you draw at least one marble of each color.

I am computing this now by individually finding all possible ways you could draw 30 marbles and see only and all of the 20 pre-specified colors (>200,000,000 ways), calculating the probability of each way individually with a multinomial distribution, and adding all these probabilities up. This works but is extremely slow, and I can't use numbers any larger than these without using all the memory on my computer.

Is there any way to compute this more efficiently? In particular I've wondered if you could calculate P(all marbles you draw are one of the pre-specified colors) with a binomial distribution and then somehow calculate P(you draw at least one marble of each color | all the marbles you draw are one of the pre-specified colors)... but I'm not sure how to calculate the latter probability

This is, admittedly, a \confusing thought experiment... let me know if anything about it could be clarified in a way that would help you understand the problem!

math multinomial distribution bayesian statistics • 823 views
ADD COMMENT
1
Entering edit mode
21 months ago
Jaehyun ▴ 10

Hello,

Is the urn assumed to be infinite? If so, let us say the probability of each pre-specified color is p_1, ..., and p_20, respectively.

(1) It is simply the sum of the probabilities sampling only marbles with a specific color.

P = (p_1)^30 + (p_2)^30 + ... + (p_20)^30

(2) This is complicated: let us think from simple cases.

1 Case 1: 1 specified color

The probability of having at least one marvel with the specified color is one minus the probability of not drawing the marvel at all. Not drawing marvels in interest is equivalent to drawing the marvel out of interest in all of the 30 tries, whose probability is

(1 - p_1)^30

So, picking at least one marvel in the specified color is

P = 1 - (1 - p_1)^30

[2] Case 2: 2 specified colors

The probability is one minus the probability of not drawing the marvel of at least one pre-specified color. As in case 1, the probability of drawing marvels without color 1 is (1 - p_1)^30, and the probability of not drawing marvels with color 2 is (1 - p_2)^30. The probability of not drawing marvels with either of the pre-specified colors is (1 - p_1 - p_2)^30. Since P(A∪B) = P(A) + P(B) - P(A∩B),

P = 1 - ((1 - p_1)^30 + (1 - p_2)^30 - (1 - p_1 - p_2)^30)

[3] Case 3: 3 specified colors

Venn diagram of 3 cases

Since P(A∪B∪C) = P(A) + P(B) + P(C) - (P(A∩B) + P(B∩C) + P(C∩A)) + P(A∩B∩C),

P = 1 - ((1 - p_1)^30 + (1 - p_2)^30 + (1 - p_3)^30 - (1 - p_1 - p_2)^30 - (1 - p_2 - p_3)^30 - (1 - p_3 - p_1)^30 + (1 - p_1 - p_2 - p_3)^30)

This can be extended to the case with 4, 5, ... specified colors. This may be done by getting all possible combinations from the colors (for example, a function combn in R). The key to the calculation is altering the signs (+ or -) by the number of colors included in the combination.

I hope this can help you solve the problem. If there is something vague, please let me know.

Thank you.

ADD COMMENT
0
Entering edit mode

That is incredibly helpful and makes sense, thank you! I'll figure out how to program this and make sure that it matches the results that I expect.

Thanks again!

ADD REPLY

Login before adding your answer.

Traffic: 1759 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6