38

What is the most efficient method to evaluate the value of n choose k? The brute-force way, I think, would be to compute n! / (k! * (n-k)!).

A better strategy may be to use dynamic programming on the recursive formula C(n, k) = C(n-1, k-1) + C(n-1, k). Is there any better method to evaluate n choose k?

Nikunj Banka
  • 10,091
  • 15
  • 68
  • 105
  • Factorial computation is much more efficient than your recursive alternative in both space and time terms – SomeWittyUsername Mar 08 '13 at 19:55
  • 2
    Well, for starters you can replace `n!/k!` with `n*(n-1)*(n-2)*...*(k+1)` No point in calculating `n!` and `k!` in full when many of the factors cancel out. – Tim Goodman Mar 08 '13 at 19:56
  • What range of n are you considering? – Andrew Morton Mar 08 '13 at 20:32
  • @AndrewMorton I have to calculate n choose k where n is <1000000 and k is < 1000000. – Nikunj Banka Mar 09 '13 at 03:03
  • possible duplicate of [Algorithm to return all combinations of k elements from n](http://stackoverflow.com/questions/127704/algorithm-to-return-all-combinations-of-k-elements-from-n) – Toto Mar 10 '13 at 09:38
  • 6
    @M42: this question is not a duplicate of the one you link to. That question asks for all combinations of k elements from n, whereas this question only wants the *number* of such combinations. – Luke Woodward Mar 10 '13 at 12:32

8 Answers

47

Here is my version, which works purely in integers (the division by k always produces an integer quotient) and is fast at O(k):

function choose(n, k)
    if k == 0 return 1
    return (n * choose(n - 1, k - 1)) / k

I wrote it recursively because it's so simple and pretty, but you could transform it to an iterative solution if you like.
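For reference, one possible iterative transcription of the same idea in Python (a sketch only; the name `choose_iter` is illustrative). It relies on the fact that after step i the running value equals C(n-k+i, i), so each division is exact:

def choose_iter(n, k):
    # Iterative form of the recursion above: multiply first, then divide.
    # After step i the running value equals C(n - k + i, i), so the
    # integer division by i never loses anything.
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result

print(choose_iter(9, 4))  # 126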

Undo
  • 25,204
  • 37
  • 102
  • 124
user448810
  • 16,364
  • 2
  • 31
  • 53
  • 1
    It's not O(*k*); *k* is strictly less than *n*, so you can't ignore the contribution of *n* to the run-time. At best, you can say it is O(*k* M(*n*)), where M(*n*) is the speed of your multiplication algorithm. – chepner Mar 08 '13 at 20:13
  • 5
    Correct, but pedantic. The function stated above makes O(k) multiplications and divisions. I ignored the bit-complexity of the operations themselves. – user448810 Mar 08 '13 at 20:19
  • This function calculates `n!/k!`. That's not what the question is about – SomeWittyUsername Mar 08 '13 at 20:28
  • 5
    @icepack: No, it doesn't. The numerator ranges from n to n-k+1. The denominator ranges from k to 1. Thus, choose(9,4) = (9*8*7*6) / (4*3*2*1) = 126, which is correct. By contrast, 9!/4! = 362880 / 24 = 15120. – user448810 Mar 08 '13 at 20:40
  • 1
    This is the multiplicative method in a recursion form. It is indeed O(k) and it is the fastest it can get unless an estimation of k! using Stirling's approximation is good enough (http://en.wikipedia.org/wiki/Stirling%27s_approximation). There is a divide and conquer version of factorial that *might* help too (http://gmplib.org/manual/Factorial-Algorithm.html) – Pedrom Mar 08 '13 at 21:57
  • I think this overflows very easily in C++; I get all kinds of strange results and/or errors... I'm trying to calculate the sum of n choose n-k over some range: http://pastebin.com/jhNAp8KD – shinzou Jan 04 '15 at 11:25
35

You could use the Multiplicative formula for this:

C(n, k) = product(i=1..k) (n + 1 - i) / i

http://en.wikipedia.org/wiki/Binomial_coefficient#Multiplicative_formula
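A rough Python sketch of this formula (names are illustrative; it assumes 0 <= k <= n and also applies the C(n, k) = C(n, n-k) symmetry mentioned in the first comment below to shorten the loop):

def binom(n, k):
    # Multiplicative formula: C(n, k) = product(i=1..k) (n + 1 - i) / i.
    # Using C(n, k) = C(n, n - k) keeps the loop to min(k, n - k) steps;
    # multiplying before dividing keeps every intermediate value an integer.
    k = min(k, n - k)
    result = 1
    for i in range(1, k + 1):
        result = result * (n + 1 - i) // i
    return result

print(binom(1000, 3))  # 166167000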

Pedrom
  • 3,713
  • 20
  • 26
  • 18
    we can speed it up by calculating (n choose n-k) instead of (n choose k) when n-k < k. – Nikunj Banka Mar 09 '13 at 14:06
  • 3
    Just take into account that `(n - (k-i)) / i` may not be an integer – pomber Jun 12 '16 at 01:21
  • 1
    @pomber the individual factor `(n - (k-i)) / i` may in fact not be an integer BUT the product of the factors from `i=1` up to `y` will always be divisible by `y` (because the numerator is a product of exactly y consecutive integers) – Lambder Dec 23 '16 at 12:11
  • 1
    Slightly updated formula: https://wikimedia.org/api/rest_v1/media/math/render/svg/652661edd20c8121e58a2b26844ce46c24180f0f – trig-ger Jan 16 '17 at 11:05
7

Probably the easiest way to compute binomial coefficients (n choose k) without overflowing is to use Pascal's triangle. No fractions or multiplications are necessary: the nth row and kth entry of Pascal's triangle gives the value.

Take a look at this page. This is an O(n^2) operation with only addition, which you can solve with dynamic programming. It's going to be lightning fast for any number that can fit in a 64-bit integer.
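A minimal Python sketch of this idea, building the triangle one row at a time with additions only (the function name is just for illustration; it keeps a single row in memory rather than the full table):

def pascal_binomial(n, k):
    # Row n of Pascal's triangle contains C(n, 0) .. C(n, n).
    row = [1]
    for _ in range(n):
        # Each new row is the old row with adjacent pairs summed.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row[k]

print(pascal_binomial(10, 3))  # 120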

Andrew Mao
  • 31,800
  • 17
  • 126
  • 212
5

If you're going to calculate many combinations like this, computing Pascal's triangle is surely the best option. As you already know the recursive formula, I think I can paste some code here:

MAX_N = 100
MAX_K = 100

# C[i][j] will hold C(i, j); every row starts with C(i, 0) = 1.
C = [[1] + [0] * MAX_K for i in range(MAX_N + 1)]

for i in range(1, MAX_N + 1):
    for j in range(1, MAX_K + 1):
        # Pascal's rule: C(i, j) = C(i-1, j-1) + C(i-1, j)
        C[i][j] = C[i - 1][j - 1] + C[i - 1][j]

print(C[10][2])  # 45
print(C[10][8])  # 45
print(C[10][3])  # 120
fabiomaia
  • 573
  • 6
  • 19
Juan Lopes
  • 9,563
  • 2
  • 23
  • 41
1

The problem with the n!/(k!(n-k)!) approach is not so much the cost as the fact that the factorials grow very rapidly, so that even for values of nCk which fit comfortably within, say, 64-bit integers, the intermediate calculations do not. If you don't like kainaw's recursive-addition approach, you could try the multiplicative approach:

nCk == product(i=1..k) (n-(k-i))/i

where product(i=1..k) means the product of all the terms when i takes the values 1,2,...,k.
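Note that the individual factors need not be integers (see the comment below). One way to evaluate the product exactly as written is rational arithmetic, e.g. Python's fractions.Fraction; the reordered integer-only loops in the answers above avoid even that. A sketch, with an illustrative name:

from fractions import Fraction

def n_choose_k(n, k):
    # Evaluate product(i=1..k) (n - (k - i)) / i with exact rationals;
    # intermediate factors may be fractional, but the final product is
    # always an integer.
    result = Fraction(1)
    for i in range(1, k + 1):
        result *= Fraction(n - (k - i), i)
    return int(result)  # exact: the accumulated fraction reduces to an integer

print(n_choose_k(9, 4))  # 126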

High Performance Mark
  • 74,067
  • 7
  • 97
  • 147
  • You're right about the possibility of overflow corrupting things well before the final answer stops fitting in a machine word, but I don't like your solution: it will produce fractional values for some factors, e.g. for the i=2 factor when n=4 and k=3. (Of course the factors will multiply together in the end to give an integer, but your way means intermediate results need to be stored in floating point -- yuck!) – j_random_hacker Aug 11 '13 at 00:24
1

The fastest way is probably to use the formula, and not Pascal's triangle. Let's avoid multiplying by numbers that we know we're going to divide by later. If k < n/2, replace k with n - k; we know that C(n, k) = C(n, n-k). Now:

n! / (k! x (n-k)!) = (product of numbers between (k+1) and n) / (n-k)!

At least with this technique, you're never dividing by a number that you used to multiply before. You have (n-k) multiplications, and (n-k) divisions.

I'm thinking about a way to avoid all divisions, by finding GCDs between the numbers that we have to multiply, and those we have to divide. I'll try to edit later.
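One rough sketch of what the GCD idea could look like in Python (names and structure are illustrative only, not the author's promised edit): cancel every denominator factor against the numerator factors before doing any multiplication, so no division of a large running product is ever needed.

from math import gcd

def choose_no_big_div(n, k):
    if k < n - k:                      # use C(n, k) = C(n, n - k)
        k = n - k
    num = list(range(k + 1, n + 1))    # the n - k numerator factors
    for d in range(2, n - k + 1):      # the denominator factors
        rem = d
        for j in range(len(num)):
            g = gcd(rem, num[j])
            if g > 1:
                rem //= g
                num[j] //= g
                if rem == 1:
                    break
        # rem always reaches 1 here because (n-k)! divides the numerator product
    result = 1
    for f in num:
        result *= f
    return result

print(choose_no_big_div(10, 4))  # 210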

bruce_ricard
  • 695
  • 6
  • 14
  • Finding the GCD will surely reduce the amount of operations. Unfortunately, the GCD finding for itself would be a much heavier task. – SomeWittyUsername Mar 08 '13 at 20:24
  • Yes I'm afraid of that. But the GCDs would be computed on small numbers, when the multiplication has a big one. And actually I'm not sure that a GCD is harder than a division. – bruce_ricard Mar 08 '13 at 20:30
  • I tend to be skeptical, but it would be interesting to see the results :) – SomeWittyUsername Mar 08 '13 at 20:34
  • Division and GCD both have a O(n^2) complexity for 2 numbers of size n. Here we would calculate division on a big and a small number, whereas the GCD would be on 2 small numbers, but we would need to do it for all the numbers. If I had to do it by hand, I think I'd try to find at least the obvious multiples and GCDs, to avoid doing useless divisions. – bruce_ricard Mar 08 '13 at 20:36
  • If you want the prime factorization of C(n,k), you can use Kummer's theorem. (You need to know all the primes less than or equal to n.) http://planetmath.org/encyclopedia/KummersTheorem.html This doesn't quite avoid divisions, since you need to be able to express k and n-k base p for each prime p. – rici Mar 08 '13 at 20:36
  • @rici: I don't care about the prime decomposition of C(n,k), I don't even care about any prime factor; I just want to avoid divisions by computing GCDs. – bruce_ricard Mar 08 '13 at 20:38
  • @double_squeeze: if you have the prime decomposition of C(n,k), you can compute C(n,k) with only multiplications. Isn't that your goal? – rici Mar 08 '13 at 20:39
0

If you have a lookup table of factorials then the calculation of C(n,k) will be very fast.
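As a hedged illustration in Python (the bound MAX_N is arbitrary; Python's big integers make the quotient exact, though the stored factorials get large):

MAX_N = 1000  # illustrative bound: precompute 0! .. MAX_N!

fact = [1] * (MAX_N + 1)
for i in range(1, MAX_N + 1):
    fact[i] = fact[i - 1] * i

def choose(n, k):
    # One lookup per factorial and a single exact division.
    if k < 0 or k > n:
        return 0
    return fact[n] // (fact[k] * fact[n - k])

print(choose(10, 4))  # 210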

Andrew Morton
  • 21,016
  • 8
  • 48
  • 69
  • For big values of n and k that lookup table might be prohibitive. Also, there should be an option for values outside that table. – Pedrom Mar 08 '13 at 19:58
  • @Pedrom There was no mention of limitations on the magnitude of numbers in the question. It's tagged `language-agnostic` and `algorithms`. – Andrew Morton Mar 08 '13 at 20:05
-4

"Most efficient" is a poor request. What are you trying to make efficient? The stack? Memory? Speed? Overall, my opinion is that the recursive method is most efficient because it only uses addition (a cheap operation) and the recursion won't be too bad for most cases. The function is:

unsigned long long nchoosek(unsigned n, unsigned k)
{
    if (k == 0) return 1;   /* C(n, 0) = 1 */
    if (n == 0) return 0;   /* C(0, k) = 0 for k > 0 */
    return nchoosek(n - 1, k - 1) + nchoosek(n - 1, k);
}
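With memoization, which the comments below discuss, each (n, k) pair is computed only once, so the additions drop from exponential to O(n*k). A Python sketch (assuming n is small enough for the recursion depth and the cache):

from functools import lru_cache

@lru_cache(maxsize=None)
def nchoosek(n, k):
    # Same recursion as above, but repeated subproblems are cached.
    if k == 0:
        return 1
    if n == 0:
        return 0
    return nchoosek(n - 1, k - 1) + nchoosek(n - 1, k)

print(nchoosek(30, 15))  # 155117520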
kainaw
  • 4,033
  • 1
  • 14
  • 30
  • 1
    That's a tree recursion and for big values for n and k it might not finish at all. – Pedrom Mar 08 '13 at 19:53
  • 6
    It's `O(2^n)` time and `O(n)` space. Factorial computation is `O(n)` time and `O(1)` space. – SomeWittyUsername Mar 08 '13 at 19:54
  • I disagree. This is effectively computing Pascal's triangle; it's most definitely **NOT** `O(2^n)` - it's `O(n^2)`. This is a square, not a tree. Plus, it will never overflow if the result is storable in a long, and addition is much faster than multiplication and division. Even more, you can memoize with `f(n, k) = f(n, n-k)`, and all the edge cases are 1. – Andrew Mao Mar 08 '13 at 20:05
  • "the recursion won't be too bad for most cases"? that's a poor answer! in "most cases" it will perform really, really bad because of the tree recursion, for n in the range of 20's or 30's the function won't even stop in any reasonable amount of time – Óscar López Mar 08 '13 at 20:05
  • @AndrewMao That recursion is not a tail recursion and has two recursion calls, therefore the inner recursive process implies a tree recursion. This is not too much different from a fibonacci implementation. It could indeed result in a stack overflow if n and k are big enough. – Pedrom Mar 08 '13 at 20:09
  • Again, I strongly disagree. You can get an idea of why this is not a tree just by looking at Pascal's triangle. I agree that the OP did not address this but for practical purposes (i.e. not BigInteger) it will be faster than multiplication/division algorithms. – Andrew Mao Mar 08 '13 at 20:10
  • 2
    This IS most definitely exponential. The complexity of computing nchoosek(n,k) is nchoosek(n,k) at least, since your base cases are 0 and 1. If you do the same with dynamic programming, you'll get a n^2 complexity, here you're calculating the same results many times. – bruce_ricard Mar 08 '13 at 20:12
  • 2
    @AndrewMao Each call to this function results in 2 nodes in the recursion tree. The recursion stops after n steps (I assume k <= n but that doesn't matter in general case) ==> 2^n nodes in the tree, O(2^n) running time. Memoization is orthogonal to this method, the benefits of memoization aren't related to recursion. – SomeWittyUsername Mar 08 '13 at 20:12
  • I did not explain this in the answer, because I felt that it would make it more confusing... but this will use many of the same values on both sides of the addition. So, by using memoization, you effectively remove one side of the addition from the recursion. You take a hit in memory for a huge savings in recursion. – kainaw Mar 08 '13 at 20:29
  • @kainaw this has nothing to do with the recursion. Straightforward calculation of Pascal triangle (which implies memoization) will most likely perform better – SomeWittyUsername Mar 08 '13 at 20:32
  • 1
    This is a poor method even with memoization. C(10^12, 2) would require a trillion additions, but the multiplicative formula is instantaneous. – Dave Radcliffe Oct 25 '18 at 04:58