
There is an example at http://www.gotw.ca/gotw/067.htm

int main()
{
  double x = 1e8;
  //float x = 1e8;
  while( x > 0 )
  {
    --x;
  }
}

When you change the double to float, it's an infinite loop in VS2008. According to the GotW explanation:

What if float can't exactly represent all integer values from 0 to 1e8? Then the modified program will start counting down, but will eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)... and then the loop will stay stuck on that value until the machine on which the program is running runs out of power.

From what I understand, an IEEE 754 float is single precision (32 bits), its range should be about +/- 3.4e +/- 38, and it should have about 7 significant decimal digits.

But I still don't understand how exactly this happens: "eventually reach a value N which can't be represented and for which N-1 == N (due to insufficient floating-point precision)." Can someone try to explain this bit?

A bit of extra info: when I use double x = 1e8, it finishes in about 1 second; when I change it to float x = 1e8, it runs much longer (still running after 5 min). If I change it to float x = 1e7; instead, it also finishes in about 1 second.

My testing environment is VS2008.

BTW I'm NOT asking for a basic explanation of the IEEE 754 format, as I already understand that.

Thanks

mskfisher
Gob00st
  • [This](http://babbage.cs.qc.edu/IEEE-754/Decimal.html) floating point calculator might help your understanding. Try inputting `1<<24==16777216` and then `(1<<24)+1==16777217` and see that the 32-bit floating point representation is the same. – user786653 Aug 09 '11 at 12:30
  • I understand the IEEE 754 float and double formats, but this does not answer my question... – Gob00st Aug 09 '11 at 12:32
  • Yes, a float can store numbers with only 7 significant digits. It should be intuitive that you'll run into trouble when you expect it to deal with a number that has 8 digits. – Hans Passant Aug 09 '11 at 12:47
  • So you're clear about `1e8f` being equal to `1e8f-1.f` (at least they are on my machine) and the loop will get stuck the first time it is entered (`x` stays the same as `x == x-1`)? Is the question why he says eventually (as in this case it happens the first time around)? That I think is just to avoid saying something that will not necessarily be true for all conforming compilers/systems. – user786653 Aug 09 '11 at 12:54
  • @user786653, yes, I saw that, but the range for single precision should be +/- 3.4e +/- 38, as I mentioned, so why is it struggling with only 1e8? – Gob00st Aug 09 '11 at 13:07
  • The range for single precision may be `+/- 3.4e +/- 38` but that doesn't mean it can represent every single number in that range precisely. – Praetorian Aug 09 '11 at 13:21
  • Yes, as it shouldn't. Single precision only has 7 significant digits. But the code starts with a pretty round number, 1e8, not some obscure number. – Gob00st Aug 09 '11 at 13:38
  • Yes, so `x` starts out as `1e8f` (exactly representable in a 32-bit float), `while (x > 0)` is true, so the loop runs. Now the result of `x--` is ALSO `1e8f`, since the correct result `99999999.0f` is NOT exactly representable in a 32-bit float and is rounded to the same representation as `1e8f`. – user786653 Aug 09 '11 at 13:48
  • @Gob00st: it's important to realize that the number 1e8 is no more and no less "precise" than 99999999 or 100000001. All of those contain the same amount of "information", i.e. they need the same number of bits to be stored exactly. – Joachim Sauer Aug 09 '11 at 14:57
  • I have read all the answers/comments and I think there are quite a few correct answers already. Because a computer/compiler cannot represent infinitely many real numbers, even though the range of a single float is big, we can easily find numbers in between those representations, which get rounded... – Gob00st Aug 09 '11 at 21:16

4 Answers


Well, for the sake of argument, let's assume we have a processor which represents a floating point number with 7 significant decimal digits, and an exponent with, say, 2 decimal digits. So now the number 1e8 would be stored as

1.000 000 e 08

(where the "." and "e" need not be actually stored.)

So now you want to compute "1e8 - 1". 1 is represented as

1.000 000 e 00

Now, in order to do the subtraction, we first subtract with infinite precision, then normalize so that the first digit before the "." is between 1 and 9, and finally round to the nearest representable value (ties to even, say). The infinite precision result of "1e8 - 1" is

0.99 999 999 e 08

or normalized

9.9 999 999 e 07

As can be seen, the infinite precision result needs one more digit in the significand than what our architecture actually provides; hence we need to round (and re-normalize) the infinitely precise result to 7 significant digits, resulting in

1.000 000 e 08

Hence you end up with "1e8 - 1 == 1e8" and your loop never terminates.

Now, in reality you're using IEEE 754 binary floats, which are a bit different, but the principle is roughly the same.

janneb

The operation x-- is (in this case) equivalent to x = x - 1. That means the original value of x is taken, 1 is subtracted with infinite precision (as mandated by IEEE 754-1985), and then the result is rounded to the nearest value in the float value space.

The rounded result for the numbers 1.0e8f + i is given for i in [-10;10] below:

 -10: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -9: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -8: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -7: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -6: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -5: 9.9999992E7     (binary +|10011001|01111101011110000011111)
  -4: 1.0E8           (binary +|10011001|01111101011110000100000)
  -3: 1.0E8           (binary +|10011001|01111101011110000100000)
  -2: 1.0E8           (binary +|10011001|01111101011110000100000)
  -1: 1.0E8           (binary +|10011001|01111101011110000100000)
   0: 1.0E8           (binary +|10011001|01111101011110000100000)
   1: 1.0E8           (binary +|10011001|01111101011110000100000)
   2: 1.0E8           (binary +|10011001|01111101011110000100000)
   3: 1.0E8           (binary +|10011001|01111101011110000100000)
   4: 1.0E8           (binary +|10011001|01111101011110000100000)
   5: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   6: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   7: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   8: 1.00000008E8    (binary +|10011001|01111101011110000100001)
   9: 1.00000008E8    (binary +|10011001|01111101011110000100001)
  10: 1.00000008E8    (binary +|10011001|01111101011110000100001)

So you can see that 1.0e8f and 1.0e8f + 4 and some other numbers have the same representation. Since you already know the details of the IEEE 754-1985 floating point formats, you also know that the remaining digits must have been rounded away.

Roland Illig

What is the result of n - 1 if n - 1 and n have both identical representation due to the approximate nature of floating point numbers?

visitor
  • That does not answer his question "how exactly this happen"[sic]. I.e. *how* can it be that a n-1 == n. – Bart Aug 09 '11 at 12:22
  • Your answer is a general answer for floating precision,not for this particular question... – Gob00st Aug 09 '11 at 12:30
  • @Bart: it does answer the question you point out, but not the question of "eventually reach" that the OP posed. it is indeed difficult to envision a representation where "eventually reach" is meaningful. i think i would call such a representation "perverse" :-) – Cheers and hth. - Alf Aug 09 '11 at 12:31
  • @Alf That was my point, although perhaps poorly worded. Thanks – Bart Aug 09 '11 at 12:35
  • @Alf "perverse" representation is an understatement. Even if the range of integers cannot be exactly represented, why does it follow that there is an N such that N - 1 == N? Starting from a given float F (1e8 in this case), you can surely have that subtracting 1 from F n times will not give an integer, but I don't immediately see why there should be such an attractor point. – Francesco Aug 09 '11 at 12:44
  • @Francesco: the attractor comes from what visitor explained by his question, that both values have the same representation. then, given N, and computing N-1, you get the same bitpattern as for N. which essentially means, you have N again -- and stuck. – Cheers and hth. - Alf Aug 09 '11 at 12:48
  • yes I see that since the whole range cannot be represented there must be some "overlapping" bit patterns. But given N, couldn't the "equivalent" bit pattern be reached by adding a different value, rather than 1? Why is the distance exactly 1? – Francesco Aug 09 '11 at 13:05
  • @Francesco: what is the result of subtracting `1` from `+inf`? Here we use `1` because Sutter used `--x` in his example, which subtracts `1`. – Matthieu M. Aug 09 '11 at 13:29
  • @Francesco: the formal mathematical reasoning is based on the pigeonhole principle. FLOAT_MAX > 10^38, so there are over 10^38 positive integers < FLOAT_MAX, but an IEEE 754 float can represent at most 2^31 possible positive values. We therefore _cannot_ map every positive integer to a float value. The 2^31 float states and the 10^38 integers are both ordered. That means one of those integers that's not mapped to a float state has a larger neighbor which is mapped. IEEE 754 math dictates rounding. If this mapped integer MI is decremented, (MI-1) isn't mapped and is rounded back up to MI. – MSalters Aug 09 '11 at 14:58
  • @Msalters thanks, I saw the pigeonhole principle but I was missing something and that was that both being ordered the distance must be 1. Clear now. – Francesco Aug 09 '11 at 15:04

Regarding "reach" a value that can't be represented, I think Herb was including the possibility of quite esoteric floating point representations.

With any ordinary floating point representation, you will either start with such a value (i.e. be stuck on the first value), or you will be somewhere in the contiguous range of integers centered around zero that can be represented exactly, so that the countdown succeeds.

For IEEE 754, the 32-bit representation, typically float in C++, has a 23-bit mantissa, while the 64-bit representation, typically double in C++, has a 52-bit mantissa. This means that with double you can represent exactly at least the integers in the range -(2^52-1) ... 2^52-1. I'm not quite sure if the range can be extended with another factor of 2. I get a bit dizzy thinking about it. :-)

Cheers & hth.,

Cheers and hth. - Alf
  • But the starting value is quite a regular one, 1e8, not a strange one... How could counting down from 1e8 with --x run out of precision??? – Gob00st Aug 09 '11 at 12:29
  • @Gob00st: I don't know of any floating point representation where reducing by 1, starting with an integer, could "eventually" get stuck instead of either being stuck already, or getting down to 0. I don't think such a representation exists. But if it existed it could satisfy the requirements of the C++ standard, which allow just about anything. – Cheers and hth. - Alf Aug 09 '11 at 12:36
  • I have tried double x = 1e8; it finished in about 1 sec. But when I change it to float x = 1e8, it runs much longer (still running, BTW) – Gob00st Aug 09 '11 at 12:39
  • @Gob00st: assuming a 32-bit IEEE 754 `float`, the highest exactly represented integer seems to be 2^23-1, unless I'm off by a factor of 2. If it is 2^23-1, then utilizing the fact that 2^10 = 1024 ~= 1000 = 10^3, you have the value 2^23 = 2^3*K*K ~ 8*10^6. So that's where you should expect the stuckness to begin: any higher value than that, you should get stuck. :-( By the way, for ways to deal with stuckness in general, I recommend the book "Zen and the Art of Motorcycle Maintenance" by Robert M. Pirsig. Cheers, – Cheers and hth. - Alf Aug 09 '11 at 12:44
  • Why is the highest exactly representable integer of a float 2^23-1? Shouldn't it be around 2^125? – Gob00st Aug 09 '11 at 13:44
  • @Gob00st: Highest value where every integer smaller than that is representable. – janneb Aug 09 '11 at 14:01