
I'm getting some strange results doing a calculation for the application I'm working on and I thought someone on here might be able to help figure out what's going on.

The requirements for this particular calculation state that the calculation should look like this:

A and B are known

A * B = C

For this particular calculation

A = 0.0410
B = 123456789010

Here are the results I'm seeing:

Calculator:

0.0410 * 123456789010 = 5061728349.41

Java:

B is a double:

0.0410f * 123456789010d = 5.061728489223363E9 = 5061728489.223363

B is a long:

0.0410f * 123456789010L = 5.0617288E9

The loss of precision is of less importance to me (I only need 9 digits of precision anyway) than the difference in the 10s and 1s spot. Why does doing the calculation using the double give me the "wrong" result?

Incidentally, I tried doing the calculation using BigDecimal and got the same result as I did using a double.
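One possible cause of the BigDecimal result is sketched below. This is an assumption, because the question does not show the BigDecimal code: if the BigDecimal is built from the float `0.0410f` (or from any float or double), it captures the float's exact binary value rather than the decimal 0.0410. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

class BigDecimalFromFloat {
    // Hypothetical reconstruction: BigDecimal built from the float literal.
    // There is no BigDecimal(float) constructor, so the float is widened (exactly) to double.
    static BigDecimal fromFloat() {
        return new BigDecimal(0.0410f).multiply(new BigDecimal(123456789010L));
    }

    // BigDecimal built from decimal strings, so 0.0410 is represented exactly.
    static BigDecimal fromString() {
        return new BigDecimal("0.0410").multiply(new BigDecimal("123456789010"));
    }

    public static void main(String[] args) {
        System.out.println(fromFloat().toPlainString());  // about 5061728489.22, same as the double result
        System.out.println(fromString().toPlainString()); // 5061728349.4100
    }
}
```

Under this assumption, BigDecimal only reproduces the calculator's answer when its inputs are exact decimals in the first place.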

Joel

5 Answers


The type conversions that happen here are specified in JLS §5.6.2 (binary numeric promotion). In your case (extract):

  • If either operand is of type double, the other is converted to double.
  • Otherwise, if either operand is of type float, the other is converted to float.
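A small sketch that makes the promoted type visible by autoboxing the result of each expression from the question (the class and method names are hypothetical):

```java
class PromotionDemo {
    // float * double: the float is widened to double, so the result boxes to Double.
    static Object floatTimesDouble() { return 0.0410f * 123456789010d; }

    // float * long: the long is converted to float, so the result boxes to Float.
    static Object floatTimesLong() { return 0.0410f * 123456789010L; }

    public static void main(String[] args) {
        System.out.println(floatTimesDouble().getClass().getSimpleName()); // Double
        System.out.println(floatTimesLong().getClass().getSimpleName());   // Float
    }
}
```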

In 0.0410f * 123456789010d = 5061728489.223363, 0.0410f is first converted to a double, and that double is not necessarily equal to 0.0410d. You can check that it is not:

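    import java.math.BigDecimal;
    double d1 = 0.041d; // double literal
    double d2 = 0.041f; // float literal, widened to double
    System.out.println(new BigDecimal(d1)); // exact binary value of d1
    System.out.println(new BigDecimal(d2)); // exact binary value of d2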

outputs:

0.041000000000000001720845688168992637656629085540771484375
0.041000001132488250732421875
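That representation error accounts for the whole gap between the two results. A sketch (hypothetical class name) that scales the difference between the float value and the decimal 0.041 by B, using exact BigDecimal arithmetic:

```java
import java.math.BigDecimal;

class FloatRepresentationError {
    // Exact error introduced by storing 0.041 as a float, multiplied by B.
    static BigDecimal scaledError() {
        BigDecimal asFloat = new BigDecimal(0.041f);   // exact value of the float (widened to double)
        BigDecimal intended = new BigDecimal("0.041"); // the decimal value from the requirements
        BigDecimal b = new BigDecimal("123456789010");
        return asFloat.subtract(intended).multiply(b);
    }

    public static void main(String[] args) {
        System.out.println(scaledError().toPlainString()); // about 139.81
    }
}
```

About 139.81 is exactly the difference between 5061728489.22 and 5061728349.41.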

In your next example:

0.0410f * 123456789010L = 5.0617288E9 (exactly 5061728768 as a float)

the long is converted to a float, which you can verify with this example:

    import java.math.BigDecimal;
    float f1 = 0.0410f;
    float f2 = 123456789010L; // long converted to float, rounded to the nearest float
    System.out.println(new BigDecimal(f1)); // 0.041000001132488250732421875
    System.out.println(new BigDecimal(f2)); // 123456790528
    System.out.println(new BigDecimal(0.0410f * 123456789010L)); // 5061728768
    System.out.println(new BigDecimal(f1 * f2)); // 5061728768

As for the precision of float / double operations in general, check this question.

Finally, if you build the BigDecimal values from strings rather than from a float or double, you get the exact answer:

    import java.math.BigDecimal;
    BigDecimal a = new BigDecimal("0.041");
    BigDecimal b = new BigDecimal("123456789010");
    System.out.println(a.multiply(b)); // outputs 5061728349.410
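Since the question only needs 9 significant digits, the exact product can also be rounded explicitly. A sketch using `MathContext` (hypothetical class name); `RoundingMode.HALF_UP` is an assumption, so pick whatever rounding the requirements call for:

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

class NineDigits {
    static BigDecimal product() {
        BigDecimal a = new BigDecimal("0.041");
        BigDecimal b = new BigDecimal("123456789010");
        // Multiply exactly, then round to 9 significant digits.
        return a.multiply(b, new MathContext(9, RoundingMode.HALF_UP));
    }

    public static void main(String[] args) {
        System.out.println(product());                 // 5.06172835E+9
        System.out.println(product().toPlainString()); // 5061728350
    }
}
```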
assylias
  • I don't think the float is being converted to a double; I think the conversion is happening after the multiplication. An explicit cast to double before the multiply corrects it. – Phil H Jun 06 '12 at 12:26
  • @PhilH I have added a reference. When you multiply a float by a double, the float first gets converted to a double. – assylias Jun 06 '12 at 12:40
  • Thank you for the excellent description of the problem! Your code examples are really helpful and illustrate the problem. Phil might be right about the conversion to a double happening after the multiplication is completed, but it's a minor detail and a big big help. I have already started redoing my calculations using all BigDecimals. – Joel Jun 06 '12 at 12:41
  • @Joel The float does get converted to a double, but that does not change its value - I have reworded that part which was unclear in my initial answer. Basically, `0.041f != 0.041d`. – assylias Jun 06 '12 at 12:53

TL;DR: A float cannot represent the 'correct' answer any more exactly than this. Use a double instead. Without an explicit cast, the multiplication itself is also done in (inexact) float arithmetic.
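One way to see how coarse a float is at this magnitude is `Math.ulp`, which returns the gap between adjacent representable values. A sketch (hypothetical class name):

```java
class FloatSpacing {
    static float floatProduct()   { return 0.0410f * 123456789010L; } // float arithmetic
    static double doubleProduct() { return 0.0410f * 123456789010d; } // double arithmetic

    public static void main(String[] args) {
        // Math.ulp returns the distance to the next representable value of the same type.
        System.out.println(Math.ulp(floatProduct()));  // 512.0
        System.out.println(Math.ulp(doubleProduct())); // 2^-20, about 9.5E-7
    }
}
```

Near 5 × 10^9 a float can only hold multiples of 512, so a float result can be off by up to 256 from the true product.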

Answers I get using http://www.ideone.com

  A       B     C       Result
  float   long  float   5061728768.000000
  double  long  double  5061728489.223363

The problem is that a float has much less precision than a double, so multiplying it by a large number (your B is roughly 1.2 × 10^11) magnifies the error. If we explicitly cast A to a double for the multiplication:

    double C = ((double) A) * B; // = 5061728489.223363

Then we get back the additional precision. If we cast the double answer back to a float:

    float C = (float) (((double) A) * B); // = 5061728256.000000

The answer is different again. The multiplication is done in the promoted type, double in this instance, but the cast back to float drops precision. Without an explicit cast to double (double C = A*B), the multiplication is done in float. With both casts, the multiplication is done in double and the precision is lost only after the multiplication.
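The three variants can be compared side by side. A sketch, assuming A is the float 0.0410f and B the long 123456789010L as in the question (class and method names are hypothetical):

```java
class CastOrder {
    static final float A = 0.0410f;
    static final long B = 123456789010L;

    // No cast: B is converted to float and the multiplication is done in float.
    static float floatMultiply() { return A * B; }

    // A widened to double first: the multiplication is done in double.
    static double doubleMultiply() { return ((double) A) * B; }

    // Multiply in double, then narrow the result back to float.
    static float doubleThenNarrow() { return (float) (((double) A) * B); }

    public static void main(String[] args) {
        System.out.println((long) floatMultiply());    // 5061728768
        System.out.println(doubleMultiply());          // about 5061728489.223363
        System.out.println((long) doubleThenNarrow()); // 5061728256
    }
}
```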

Phil H
  • Wow great explanation. There is a lot to consider when doing this type of calculation – Joel Jun 06 '12 at 16:58

The first calculation uses double (64 bits), the second uses float (32 bits). What you are seeing are rounding errors.

Both are floating-point calculations, but in the second case no double operand is involved, so 32-bit arithmetic is used.

Quoting the Java Language Specification:

If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.

If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).

Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)
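Applying this rule, dropping the `f` suffix makes 0.0410 a double literal, so the long is widened to double and 64-bit arithmetic is used throughout. A sketch (hypothetical class name):

```java
class DoubleLiteral {
    // 0.0410 (no f suffix) is a double literal; the long operand is widened to double.
    static double product() { return 0.0410 * 123456789010L; }

    public static void main(String[] args) {
        System.out.println(product()); // very close to 5061728349.41
    }
}
```

The result is within one double ulp of 5061728349.41, far finer than the 9 digits the question needs.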

Thilo

The answer to your question is probably in the Floating-Point Operations section of the Java Language Specification and in this older post. You are probably seeing rounding errors caused by the implicit conversion that is occurring.

The quotes that apply to your situation are:

Third operation:

If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.

Second operation:

If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. If the other operand is not a double, it is first widened (§5.1.5) to type double by numeric promotion (§5.6).

First operation:

Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. (If the other operand is not a float, it is first widened to type float by numeric promotion.)

Hence there is nothing to worry about: decide what precision you need and cast accordingly, if necessary.

rlinden

32-bit IEEE floating-point numbers have about seven significant decimal digits of precision; 64-bit numbers have about 15 to 16. That's all you get. If neither is sufficient, you have to use BigDecimal.

This is true in every language that implements the IEEE 754 standard, not just Java.
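The digit counts follow from the significand widths (24 bits for float, 53 for double, counting the implicit leading bit). A sketch (hypothetical names) that computes them and shows B itself losing digits when stored in a float:

```java
class PrecisionDigits {
    // Decimal digits carried by a p-bit significand: p * log10(2).
    static double decimalDigits(int significandBits) {
        return significandBits * Math.log10(2);
    }

    // Convert a long to float and back, exposing the rounding of the conversion.
    static long roundTripThroughFloat(long value) {
        return (long) (float) value;
    }

    public static void main(String[] args) {
        System.out.println(decimalDigits(24)); // float: about 7.22 digits
        System.out.println(decimalDigits(53)); // double: about 15.95 digits
        System.out.println(roundTripThroughFloat(123456789010L)); // 123456790528
    }
}
```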

duffymo