83

I’m looking for a good arbitrary-precision math library in C or C++. Could you please give me some advice or suggestions?

The primary requirements:

  1. It must handle arbitrarily big integers; my primary interest is in integers. If you are not sure what arbitrarily big means, imagine something like 100000! (the factorial of 100000).

  2. The precision must not need to be specified during library initialization or object creation. The precision should only be constrained by the available resources of the system.

  3. It should utilize the full power of the platform, and should handle “small” numbers natively. That means on a 64-bit platform, calculating (2^33 + 2^32) should use the available 64-bit CPU instructions. The library should not calculate this in the same way as it does with (2^66 + 2^65) on the same platform.

  4. It must efficiently handle addition (+), subtraction (-), multiplication (*), integer division (/), remainder (%), power (**), increment (++), decrement (--), GCD, factorial, and other common integer arithmetic calculations. The ability to handle functions like square root and logarithm that do not produce integer results is a plus. The ability to handle symbolic computations is even better.

Here is what I have found so far:

  1. Java's BigInteger and BigDecimal classes: I have been using these so far. I have read the source code, but I don’t understand the math underneath. It may be based on theories and algorithms that I have never learnt.

  2. The built-in integer types, or core-library implementations, in bc, Python, Ruby, Haskell, Lisp, Erlang, OCaml, PHP, and some other languages: I have used some of these, but I have no idea which library they use, or which kind of implementation.

What I already know:

  1. Using char for decimal digits and char* for decimal strings, and doing calculations on the digits using a for-loop.

  2. Using int (or long int, or long long) as a basic “unit”, and an array of that type as an arbitrarily long integer, and doing calculations on the elements using a for-loop.

  3. Using an integer type to store a decimal digit (or a few digits) as BCD (Binary-coded decimal).

  4. Booth’s multiplication algorithm.

What I don’t know:

  1. Printing the binary array mentioned above in decimal without using naive methods. An example of a naive method: (1) add the bit values from the lowest to the highest: 1, 2, 4, 8, 16, 32, …; (2) use a char*-string mentioned above to store the intermediate decimal results.

What I appreciate:

  1. Good comparisons on GMP, MPFR, decNumber (or other libraries that are good in your opinion).

  2. Good suggestions on books and articles that I should read. For example, an illustration with figures on how a non-naive binary-to-decimal conversion algorithm works would be good. The article “Binary to Decimal Conversion in Limited Precision” by Douglas W. Jones is an example of a good article.

  3. Any help in general.

Please do not answer this question if you think that using double (or long double, or long long double) can solve this problem easily. If you do think so, you don’t understand the issue in question.

ib.
  • 25,324
  • 10
  • 71
  • 97
  • 3
    As far as I can see, GMP seems to be a good library. What I wonder is why there's a need for the contributors of Python / Haskell / Erlang / etc. to re-invent the wheel. If GMP is so good, why not rely on it? The GPL / LGPL license may be one of the issues, but apart from this (and also the rounding mode issue), are there any other disadvantages of GMP? Is the built-in integer of Python / Haskell / Erlang / any cryptography library faster than GMP? If so, I would like to extract and use it, if the license permits. – Siu Ching Pong -Asuka Kenji- Apr 02 '10 at 18:57
  • I found a nice article at http://www.cs.uiowa.edu/~jones/bcd/decimal.html by Douglas W. Jones. The article describes a method to convert a 16-bit integer to decimal representation using only 8-bit integer arithmetic. The idea is to break the 16-bit number into 4 nibbles, each representing a base-16 "digit". So, digit-0 (n0) represents x1, n1 => x16, n2 => x256, n3 => x4096. Then, it is obvious that digit-0 of the decimal number (d0) is digit-0 of the result of n0 * 1 + n1 * 6 + n2 * 6 + n3 * 6. By handling the carry properly, d1 to d4 can also be computed. – Siu Ching Pong -Asuka Kenji- Apr 03 '10 at 03:53
  • However, as far as I could imagine, Douglas's idea above could not be extended to handle arbitrarily long binary integers. It is because the numbers 1 (16^0), 16 (16^1), 256 (16^2) and 4096 (16^3) are pre-calculated. The problem then becomes how to represent 16^n in decimal for arbitrarily large n. – Siu Ching Pong -Asuka Kenji- Apr 03 '10 at 04:04

5 Answers

26

GMP is the popular choice. Squeak Smalltalk has a very nice library, but it's written in Smalltalk.

You asked for relevant books or articles. The tricky part of bignums is long division. I recommend Per Brinch Hansen's paper Multiple-Length Division Revisited: A Tour of the Minefield.

Norman Ramsey
  • 188,173
  • 57
  • 343
  • 523
  • Thank you for your link to the paper! Yes, I agree that division is the trickiest part. I learned how to do division by hand using "paper-and-pencil methods" a long time ago :-), and the same method can be applied to a decimal string of `char *` (each `char` representing a decimal digit) or an `int *` of BCD string (each `int` representing 4 / 8 / 16 BCD digits). However, I wonder whether real-world, production-level libraries mimic the "paper-and-pencil method", as it is too slow. – Siu Ching Pong -Asuka Kenji- Apr 04 '10 at 06:23
  • To see why, let's imagine how it runs for `100,000,000,000,000,000 / 333,333,333,333`: the first step is to compare `100,000,000,000` with `333,333,333,333`. Because the former is less than the latter, the calculation simply moves to the next digit. The second step is to find the quotient of `1,000,000,000,000 / 333,333,333,333`; this involves either trial-and-error multiplication of `333,333,333,333 * 1` (and `* 2`, `* 3` and `* 4`), or doing successive subtractions in a loop. Do you see how slow it is? I believe that more efficient algorithms exist. – Siu Ching Pong -Asuka Kenji- Apr 04 '10 at 06:36
  • 3
    @Siu: Brinch Hansen shows how you can reduce the trial-and-error method to at most two trials. It's really very impressive. – Norman Ramsey Apr 04 '10 at 19:51
  • Okay, let me have a more detailed look to the paper. Thank you! – Siu Ching Pong -Asuka Kenji- Apr 05 '10 at 02:08
  • 1
    I'm not sure where you ended up finding your solution, nor what format you used to store digits, but COBOL's COMP-3 nybble format is a lot nicer to deal with: it's more compact, each 4 bits storing a 0-9 value, and you aren't required to subtract hex 30 from the ASCII char value to get a usable digit. –  Jan 02 '13 at 03:40
14

Overall, the fastest general-purpose arbitrary-precision library is GMP. If you want to work with floating-point values, look at the MPFR library. MPFR is based on GMP.

Regarding native arbitrary-precision support in other languages, Python uses its own implementation for license, code-size, and code-portability reasons. The GMPY module lets Python access the GMP library.

casevh
  • 10,348
  • 1
  • 19
  • 30
  • Thank you for your response! You mentioned "code portability". Could you please explain what the problem is? I thought that GMP is portable and is supported on major platforms... – Siu Ching Pong -Asuka Kenji- Apr 03 '10 at 12:46
  • 2
    "code portability" is not the same as "supported on major platforms". Python uses a simple implementation that makes very few assumptions about the behavior of the C compiler so the same code can compile on almost any C compiler. GMP uses more code (C and highly-tuned assembly) that make GMP faster but also make more assumptions about the behavior of the C compiler and assembler. For example, GMP is not well supported by the Microsoft Visual Studio compilers. There is a GMP fork called MPIR (www.mpir.org) that does support Microsoft's compilers. – casevh Apr 03 '10 at 16:53
  • I see. That means the Python implementation is more like ANSI C while the GMP implementation uses __asm tricks... Thank you for your explanations. – Siu Ching Pong -Asuka Kenji- Apr 04 '10 at 04:01
9

See TTMath, a small, templated, header-only library that is free for personal and commercial use.

8

I've not compared arbitrary precision arithmetic libraries to each other myself, but people who do seem to have more or less uniformly settled on GMP. For what it's worth, the arbitrary precision integers in GHC Haskell and GNU Guile Scheme are both implemented using GMP, and the fastest implementation of the pidigits benchmark on the language shootout is based on GMP.

Richard Barrell
  • 4,121
  • 2
  • 19
  • 16
5

What about Pari? It’s built on top of GMP and provides all the number-theory operations you’ll ever need (and plenty of symbolic computation features).

fortran
  • 67,715
  • 23
  • 125
  • 170