5

My factorial function seems to work for numbers between 1 and 6, but not for numbers much bigger than 6, for example starting with 21! the results are negative.

I cannot figure out why. Here's my function:

factorial :: Int -> Int
factorial 0 = 1
factorial 1 = 1
factorial num = num * factorial( num - 1)

And here's my binomial coefficient function that calls my factorial function (maybe the problem comes from this one ?):

binomialCoef :: Int -> Int -> Int
binomialCoef n 1 = n
binomialCoef n k = factorial n `div` 
                        ((factorial k) * factorial (n - k))
Micha Wiedenmann
  • 17,330
  • 20
  • 79
  • 123
Elior
  • 75
  • 4
  • 2
    This is due to the fact that an `Int` has a fixed number of bits, and thus eventually the number will not be represented anymore. – Willem Van Onsem Mar 15 '21 at 10:14
  • 1
    If it's every number after 21, I guess you probably have an overflow, ie if you exceed the max int, it rolls back to the min, a large negative number. – NatMath Mar 15 '21 at 10:16
  • Be aware that this implementation of factorial isn't as useful as others. I'd recommend looking at the gamma and ln(gamma) functions. – duffymo Mar 15 '21 at 10:20
  • @duffymo oh okay, I'll look into them, thanks :) – Elior Mar 15 '21 at 10:27

3 Answers3

6

(…) realized my factorial function returns negative numbers starting at 21!, and I can't figure out why.

Because an Int has a fixed number of bits. An Int should at least represent all numbers between -2-29 and 229-1, and on a 64-bit system, typically it will represent numbers between -2-63 and 263-1, but regardless what bounds it represents, it will eventually run out of bits to represent such number.

You can work with Integer to represent arbitrary large numbers:

factorial :: Integer -> Integer
factorial 0 = 1
factorial 1 = 1
factorial num = num * factorial (num-1)

For example:

Prelude> factorial 21
51090942171709440000
Prelude> factorial 22
1124000727777607680000
Willem Van Onsem
  • 321,217
  • 26
  • 295
  • 405
2

The binomial coefficient is where ln(gamma) really shines:

Bi(n, k) = n!/(k!*(n-k)!)

Taking the natural log of both sides:

ln(Bi(n, k)) = ln(n!) - ln(k!) - ln((n-k)!)

But

gamma(n) = (n-1)!

Or

gamma(n+1) = n!

Substituting

ln(Bi(n, k)) = lngamma(n+1) - lngamma(k+1) -lngamma(n-k+1)

Taking the exponential of both sides gives the final result:

Bi(n, k) = exp(lngamma(n+1) - lngamma(k+1) - lngamma(n-k+1))

There's a Haskell implementation. I haven't looked at it, but it should return a Double instead of an Integer. You won't have overflow problems because of that fact. It'll also be better behaved because you will be subtracting logarithms instead of dividing a large numerator by a large product in the denominator.

duffymo
  • 293,097
  • 41
  • 348
  • 541
1

Of course best way to avoid integer overflow and wrap-around while calculating a big factorial is not to calculate the factorial in the first place. Instead, since

factorial n = product [1..n]

keeping [1..n] as the representation of the factorial of n is as good -- or even much better -- as calculating the actual number. Postponing an action until absolutely unavoidable we get to pre-optimize it before post-calculating:

bincoef :: Int -> Int -> Int
bincoef n k = factorial n `div` 
                        ((factorial k) * factorial (n - k))
 = product [1 .. n] `div` 
        (product [1 .. k] * product [1 .. n-k])
 = product [n-k+1 .. n] `div` 
         product [1 .. k]
 = foldl' g 1 $ zip [n, n-1 .. n-k+1] [1 .. k]
     where g !acc (a,b) = (acc * a) `div` b

So now,

> mapM_ (\n -> print $ map (bincoef n) [5,10..n]) [20,30..60]
[15504,184756,15504,1]
[142506,30045015,155117520,30045015,142506,1]
[658008,847660528,40225345056,137846528820,40225345056,847660528,658008,1]
[2118760,10272278170,2250829575120,47129212243960,126410606437752,47129212243960,
2250829575120,10272278170,2118760,1]
[5461512,75394027566,53194089192720,4191844505805495,51915437974328292,1182645815
64861424,51915437974328292,4191844505805495,53194089192720,75394027566,5461512,1]

> head . filter (not . snd) $ map (\n -> (n, all (> 0) $ map (bincoef n) [1..n])) [1..]
(62,False)

the Int wrap-around error makes its first appearance at n=62. But it's still working at n=60, and we can see there are more than 16 digits in those numbers, so no Double-based calculation has a hope of working correctly, there.

To get into yet higher ranges still with the Int-based operations only, the next logical step is keeping the lists of integers as originally proposed, or better yet as their prime factorizations which are easy to multiply and divide; but at that point we'd be getting pretty close to re-implementing the bignum arithmetic ourselves, so might as well just use the simple Integer-based code,

bc :: Integer -> Integer -> Integer
bc n k = product [n-k+1 .. n] `div` product [1 .. k]

which "just works".

> bc 600 199
124988418115780688528958442419612410733294315465732363826979722360319899409241320138
666379143574138790334901309769571503484430553926248548697640619977793300443439200
Will Ness
  • 62,652
  • 8
  • 86
  • 167
  • 1
    This is actually what I implemented later in my code haha, thank you for explaining it so well ! – Elior Mar 16 '21 at 09:17
  • nice! there's one more bit of overflow-avoidance to be squeezed out here: instead of ``g !acc (a,b) = (acc * a) `div` b``, divide first, and multiply after that. the problem is we don't know which will be divisible, so we'd need to run _both_ through `divMod`... probably try `acc` first, as it will be progressively more likely to be divisible, I think. – Will Ness Mar 16 '21 at 10:23
  • that's before going over to the prime factorizations, of course. then we get to truly [*postpone*](https://stackoverflow.com/a/8871918/849891) the final multiplication until absolutely the last unavoidable minute. :) – Will Ness Mar 16 '21 at 10:29
  • When the multiplication overflows, you can calculate either `gcd acc b` or `gcd a b` and use that to compute ``(acc * a) `div` b``, as explained in [this answer](https://stackoverflow.com/questions/1838368/calculating-the-amount-of-combinations/4701106#4701106). You can check for possible overflow using [`GHC.Exts.mulIntMayOflo#`](https://hackage.haskell.org/package/base-4.15.0.0/docs/GHC-Exts.html#v:mulIntMayOflo-35-). – dfeuer Apr 06 '21 at 19:45
  • @dfeuer still it won't buy us any noticeably more range with the `Int`s, I think. the linked answer itself seems to be saying this. the other recourse is to use the prime factorizations directly (or bignums of course). – Will Ness Apr 07 '21 at 05:45