Statistics: Combinations In Python
4
3
Entering edit mode
14.5 years ago

I need to compute combinatorials (nCr) in Python but cannot find the function to do that in 'math', 'numyp' or 'stat' libraries. Something like a function of the type:

comb = calculate_combinations(n, r)

I need the number of possible combinations, not the actual combinations, so itertools.combinations does not interest me.

Finally, I want to avoid using factorials, as the numbers I'll be calculating the combinations for can get to big and the factorials are going to be monstruous.

Any clue what function from what library I should use?

Many thanks

python statistics • 12k views
ADD COMMENT
3
Entering edit mode
14.5 years ago

There is a comb function in Scipy which should do the right thing.

ADD COMMENT
0
Entering edit mode

Exactly what I was looking for. Thanks!

ADD REPLY
1
Entering edit mode
14.5 years ago

You really don't need nothing more than math. The easiest way to calculate factorials is by using the gamma function as you can see at wikipedia. If you n,k values are small (say < 10) the best way is to hardcode them on a table (or use python bindings for GSL which already hardcoded them). Otherwise, gamma function (or its ln) is your best (and probably only) option.

I'm not sure how to do the same in python. This is how I calculate LARGE combinatorials (combinations, compositions, etc.) over protein sequences in C.

floor(exp(lgamma(N+1)-lgamma(K+1)-lgamma(N-K+1.0)));

This will give a exact value for your combination and I'm sure python has some floor() function. Otherwise, you always can use pygsl to access GSL special functions. Note that this formula do not make use of recursion and calculation of gamma function values are essentially an O(1) operation. So, this is both precise and fast.

ADD COMMENT
0
Entering edit mode

Hi @Jarretinha, my values are sometimes rather high, maybe above 1000 and I need an exact value. I would really like to compute the binomial probability directly with the combination and avoid doing the factorials myself for such large numbers. My guess is that any decent function computing the combinations will have a simplification to bypass calculating the 2 or 3 factorials needed in it. Cheers

ADD REPLY
0
Entering edit mode

In this case, gammaln function is your only option. SciPy has both gamma and gammaln on special.

ADD REPLY
1
Entering edit mode
14.5 years ago

Here's a function in Ruby that uses two tricks. First of all, it uses cancelling before calculating to remove many of the factorials that appear in both the numerator and denominator. Second, it addresses the huge number problem by doing the calculations in log2 space. I wrote this several years ago, and it seemed to solve my problem then. Should be pretty easy to translate to python.

#log base 2 definition
def log2(n)
  return Math.log(n) / Math.log(2)
end

#calculates factorial in log space
def log2factorial(n)
  sum = 0
  1.upto(n){|i|
    sum += log2(i)
  }
  return sum
end

# uses cancelling to do huge factorials
# without actually calculating them
# (a choose b)
def binomCoefficientLog2(a,b)
  top = 0
  0.upto(b-1){|i|
    top += log2(a-i)
  }                 
  bottom = log2factorial(b)
  return top - bottom
end

To get the final answer, raise two to the power of the result. Example:

> binomCoefficientLog2(10,2)
=> 5.49185309632968
> 2**binomCoefficientLog2(10,2)
=> 45.0

To verify that it works, using R:

> choose(10,2)
[1] 45
ADD COMMENT
1
Entering edit mode
14.5 years ago
Paulo Nuin ★ 3.7k

Here is some perspectives about factorials in Python. If you follow the posts you will find that you can easily calculate combinatorial in Python with this

def binom(n, m):
    b = [0] * (n + 1)
    b[0] = 1
    for i in xrange(1, n + 1):
        b[i] = 1
        j = i - 1
        while j > 0:
            b[j] += b[j - 1]
            j -= 1
    return b[m]

and factorials.

ADD COMMENT

Login before adding your answer.

Traffic: 1519 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6