4

'Simple' question, what is the fastest way to calculate the binomial coefficient? - Some threaded algorithm?

I'm looking for hints :) - not implementations :)

Skeen
  • 4,316
  • 5
  • 35
  • 67
  • Inaccurately and only for a limited range of inputs ? I think you should expand a bit on your requirements. For example, if you are only interested in machine-size integers that's one thing, arbitrary sized integers quite another. – High Performance Mark Nov 23 '10 at 13:06
  • Well okay, lets limit it to java doubles? :) - and accurate on integer level – Skeen Nov 23 '10 at 13:13
  • You might find Stirling's approximation useful. –  Nov 24 '10 at 17:23
  • [Here](http://groups.google.com/group/mpir-devel/browse_thread/thread/ac8d3fe3a6347393) is something that might interest you. [Numerical Recipes](http://www.nr.com/) might also be useful. – duffymo Nov 23 '10 at 13:09

5 Answers5

4

Well the fastest way, I reckon, would be to read them from a table rather than compute them.

Your requirements on integer accuracy from using a double representation means that C(60,30) is all but too big, being around 1e17, so that (assuming you want to have C(m,n) for all m up to some limit, and all n<=m), your table would only have around 1800 entries. As for filling the table in I think Pascal's triangle is the way to go.

dmuir
  • 449
  • 2
  • 3
3

Hint: You want to do as little multiplications as possible. The formula is n! / (k! * (n-k)!). You should do less than 2m multiplications, where m is the minimum of k and n-k. If you want to work with (fairly) big numbers, you should use a special class for the number representation (Java has BigInteger for instance).

Jordi
  • 5,578
  • 9
  • 37
  • 39
3

According to the equation below (from wikipedia) the fastest way would be to split the range i=1,k to the number of threads, give each thread one range segment, and each thread updates the final result in a lock. "Academic way" would be to split the range into tasks, each task being to calculate (n - k + i)/i, and then no matter how many threads you have, they all run in a loop asking for next task. First is faster, second is... academic.

alt text

EDIT: further explanation - in both ways we have some arbitrary number of threads. Usually the number of threads is equal to the number of processor cores, because there is no benefit in adding more threads. The difference between two ways is what those threads are doing.

In first way each thread is given N, K, I1 and I2, where I1 and I2 are the segment in the range 1..K. Each thread then has all the data it neads, so it calculates its part of the result, and upon finish updates the final result.

In second way each thread is given N, K, and access to some syncronized counter that counts from 1 to K. Each thread then aquires one value from this shared counter, calculates one fraction of the result, updates the final result, and loops on this until counter informs the thread that there are no more items. If it happens that some processor cores are faster that others then this second way will put all cores to maximum use. Downside to second way is too much synchronization that effectively blocks, say, 20% of threads all the time.

Dialecticus
  • 15,040
  • 5
  • 36
  • 88
  • Which part? Maybe you should postpone marking the correct answer until you verify that the answer is in fact correct. – Dialecticus Nov 23 '10 at 13:45
1

Here's a way that never overflows if the final result is expressible natively in the machine, involves no multiplications/factorizations, is easily parallelized, and generalizes to BigInteger-types:

First note that the binomial coefficients satisfy following:

binomnk .

This yields a straightforward recursion for computing the coefficient: the base cases are binomrr and binomr0, both of which are 1.

The individual results from the subcalls are integers and if \binom{n}{k} can be represented by an int, they can too; so, overflow is not a concern.

Naively implemented, the recursion leads to repeated subcalls and exponential runtimes.

This can be fixed by caching intermediate results. There are n^2 subproblems, which can be combined in O(1) time, yielding an O(n^2) complexity bound.

Community
  • 1
  • 1
user486972
  • 376
  • 1
  • 3
0

This answer calculates binomial with Python:

def h(a, b, c):
    x = 0
    part = str("=") 
    while x < (c+1):
        nCr = math.comb(c,x)
        part = part+'+'+str(int(a**(c-1))*int(b**x)*int(nCr)+'x^'+str(x)
        x = x+1
    print(part) 
    
    h(2,6,4)
Tomerikoo
  • 12,112
  • 9
  • 27
  • 37