
Why doesn't this code print the same number?

#include <stdio.h>

int main(void) {
    long long a, b;
    a = 2147483647 + 1;
    b = 2147483648;
    printf("%lld\n", a);
    printf("%lld\n", b);
}

I know that an int's maximum value is 2147483647 because an int is 4 bytes. And as far as I know, a long long is 8 bytes, so why does this code act like that?

Peter Cordes
Hoseong Jeon
  • range for int is actually [−32,767, +32,767], range for long int is [−2,147,483,647, +2,147,483,647]. – Kerry Cao May 05 '20 at 23:44
  • 13
    @KerryCao sizeof(int) is hardware dependent, but on modern hardware it's usually 4 bytes. https://stackoverflow.com/questions/11438794/is-the-size-of-c-int-2-bytes-or-4-bytes – Jeremy Friesner May 05 '20 at 23:46
  • @KerryCao until the recent 2's complement change to the standard. (It's a moving target. And very complicated. Even with 2's complement, overflow of signed is undefined behavior.) – Eljay May 06 '20 at 00:45
  • 3
    @Hoseong, Did you see some compile warnings? – Leos313 May 06 '20 at 10:38
  • 5
    Note that signed overflow was and still is Undefined Behavior in C++. Your compiler should warn you about this problem though. https://godbolt.org/z/krZBUa – Max Langhof May 06 '20 at 12:36
  • 8
    [Relevant XKCD](https://xkcd.com/571/). – J. M. is not a mathematician May 06 '20 at 13:07
  • 1
    @KerryCao -- those are the **minimum** requirements; compilers are allowed to provide wider ranges, subject to the requirement that `long` is at least as wide as `int`, `int` is at least as wide as `short`, and `short` is at least as wide as `char`. – Pete Becker May 06 '20 at 13:07
  • 3
    To get the actual range of a type, `#include ` and look at `std::numeric_limits::min()` and `std::numeric_limits::max()`. Replace `int` with the integer type that you're actually interested in. (You can also use those two functions with floating-point types, but `std::numeric_limits::min()` has a somewhat unintuitive definition.) – Pete Becker May 06 '20 at 13:12
  • Does this answer your question? [why overflow when compare long long with int](https://stackoverflow.com/questions/34702468/why-overflow-when-compare-long-long-with-int) – Julien Lopez May 06 '20 at 15:09
  • 1
    @Leos313, gcc gives `int.c:5:16: warning: integer overflow in expression [-Woverflow]`. But of course that doesn't tell _why_ it overflows, even though all the numbers fit in a `long long`. – ilkkachu May 06 '20 at 17:40
  • 2
    How does such a question get so many upvotes? Do people these days really not understand two's complement numbers? – TomServo May 06 '20 at 22:20
  • 1
    The question should not be dual tagged, https://meta.stackoverflow.com/questions/374306/proposed-update-to-c-and-c-tag-usage-wikis – M.M May 07 '20 at 03:38
  • 1
    @ilkkachu `g++` under Linux it signs the position too :-) `a.cpp:7:16: warning: integer overflow in expression [-Woverflow]` __`a = 2147483647 + 1;`__ __`~~~~~~~~~~~^~~`__ – Hastur May 07 '20 at 09:26
  • @TomServo The question is not about two's complement, but automatic datatype promotion (or lack thereof as in this case). – Ian Kemp May 07 '20 at 14:37
  • `a` is `long long int` but `2147483647` and `1` are not. Before being assigned to `a`, the expression `2147483647 + 1` is computed using the types of the values it uses and the type of both numbers is `int`. – axiac May 26 '20 at 19:32

4 Answers


2147483647 + 1 is evaluated as the sum of two ints and therefore overflows.

2147483648 is too big to fit in an int, so the compiler gives it the type long (or long long in MSVC, where long is only 32 bits). It therefore does not overflow.

To perform the summation as a long long use the appropriate constant suffix, i.e.

a = 2147483647LL + 1;
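
For example, a minimal complete program (a sketch assuming 32-bit int, as on typical mainstream platforms; the unsuffixed line is still UB and merely happens to wrap):

#include <stdio.h>

int main(void) {
    long long a = 2147483647 + 1;    // int + int: overflows (UB), typically wraps
    long long b = 2147483647LL + 1;  // long long + int: evaluated as long long
    printf("%lld\n", a);             // typically -2147483648
    printf("%lld\n", b);             // 2147483648
}
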
Paul Sanders
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/213422/discussion-on-answer-by-paul-sanders-why-does-long-long-2147483647-1-214748). – Samuel Liew May 08 '20 at 13:39

This signed integer overflow is undefined behaviour, as it always is in C/C++.

What Every C Programmer Should Know About Undefined Behavior

Unless you compile with gcc -fwrapv or equivalent to make signed integer overflow well-defined as 2's complement wrap-around. With gcc -fwrapv or any other implementation that defines integer overflow = wraparound, the wrapping that you happened to see in practice is well-defined and follows from other ISO C rules for types of integer literals and evaluating expressions.

T var = expression only implicitly converts the expression to type T after evaluating the expression according to standard rules. Like (T)(expression), not like (int64_t)2147483647 + (int64_t)1.
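
As a sketch of that evaluation order (assuming 32-bit int), moving the cast changes which type the addition is done in:

#include <stdio.h>

int main(void) {
    long long x = (long long)(2147483647 + 1); // inner sum still done in int: overflow (UB)
    long long y = (long long)2147483647 + 1;   // operand widened first: sum done in long long
    printf("%lld\n%lld\n", x, y);              // typically -2147483648, then 2147483648
}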

A compiler could have chosen to assume that this path of execution is never reached and emitted an illegal instruction or something. Implementing 2's complement wraparound on overflow in constant expressions is just a choice that some/most compilers make.


The ISO C standard specifies that a numeric literal has type int unless the value is too large to fit (it can be long or long long, or unsigned for hex), or if a size override is used. Then the usual integer promotion rules apply for binary operators like + and *, regardless of whether it's part of a compile-time constant expression or not.

This is a simple and consistent rule that's easy for compilers to implement, even in the early days of C when compilers had to run on limited machines.
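
For instance (a sketch assuming 32-bit int; a hex literal that doesn't fit int tries unsigned int next, while a decimal literal skips straight to the signed types):

#include <stdio.h>

int main(void) {
    long long d = 2147483648;    // decimal: too big for int, so long (or long long)
    long long h = 0x80000000;    // hex: too big for int, so unsigned int, value 2147483648
    printf("%lld %lld\n", d, h); // both print 2147483648
}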

Thus in ISO C/C++ 2147483647 + 1 is undefined behaviour on implementations with 32-bit int. Treating it as int (and thus wrapping the value to signed negative) follows naturally from the ISO C rules for what type the expression should have, and from normal evaluation rules for the non-overflow case. Current compilers don't choose to define the behaviour differently from that.

ISO C/C++ do leave it undefined, so an implementation could pick literally anything (including nasal demons) without violating the C/C++ standards. In practice this behaviour (wrap + warn) is one of the less objectionable ones, and follows from treating signed integer overflow as wrapping, which is what often happens at run time anyway.

Also, some compilers have options to actually define that behaviour officially for all cases, not just compile-time constant expressions. (gcc -fwrapv).
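
For example, this classic overflow check is only reliable under -fwrapv; without it, compilers may fold the comparison to false because signed overflow is assumed not to happen (a sketch):

// Compile with: gcc -O2 -fwrapv
int is_int_max(int x) {
    return x + 1 < x;   // with -fwrapv: true exactly when x == INT_MAX; without: UB
}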


Compilers do warn about this

Good compilers will warn about many forms of UB when they're visible at compile time, including this. GCC and clang warn even without -Wall. From the Godbolt compiler explorer:

  clang
<source>:5:20: warning: overflow in expression; result is -2147483648 with type 'int' [-Winteger-overflow]
    a = 2147483647 + 1;
                   ^
  gcc
<source>: In function 'void foo()':
<source>:5:20: warning: integer overflow in expression of type 'int' results in '-2147483648' [-Woverflow]
    5 |     a = 2147483647 + 1;
      |         ~~~~~~~~~~~^~~

GCC has had this warning enabled by default since at least GCC4.1 in 2006 (oldest version on Godbolt), and clang since 3.3.

MSVC only warns with -Wall, which for MSVC is unusably verbose most of the time, e.g. stdio.h results in tons of warnings like 'vfwprintf': unreferenced inline function has been removed. MSVC's warning for this looks like:

  MSVC -Wall
<source>(5): warning C4307: '+': signed integral constant overflow

@HumanJHawkins asked why it was designed this way:

To me, this question is asking, why doesn't the compiler also use the smallest data type that the result of a math operation will fit into? With integer literals, it would be possible to know at compile time that an overflow error was occurring. But the compiler does not bother to know this and handle it. Why is that?

"Doesn't bother to handle it" is a bit strong; compilers do detect the overflow and warn about it. But they follow ISO C rules that say int + int has type int, and that the numeric literals each have type int. Compilers merely choose on purpose to wrap instead of to widening and giving the expression a different type than you'd expect. (Instead of bailing out entirely because of the UB.)

Wrapping is common when signed overflow happens at run-time, although in loops compilers do aggressively optimize patterns like an int i counter indexing array[i], to avoid redoing sign-extension every iteration.

Widening would bring its own (smaller) set of pitfalls like printf("%d %d\n", 2147483647 + 1, 2147483647); having undefined behaviour (and failing in practice on 32-bit machines) because of a type mismatch with the format string. If 2147483647 + 1 implicitly promoted to long long, you'd need a %lld format string. (And it would break in practice because a 64-bit int is typically passed in two arg-passing slots on a 32-bit machine, so the 2nd %d would probably see the 2nd half of the first long long.)

To be fair, that's already a problem for -2147483648. As an expression in C/C++ source it has type long or long long. It's parsed as 2147483648 separately from the unary - operator, and 2147483648 doesn't fit in a 32-bit signed int, so it gets the next wider type that can represent the value.
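
You can see the literal's type with C11's _Generic (a sketch; on Windows, where long is 32 bits, the last two lines would print "long long"):

#include <stdio.h>
#define TYPE_NAME(x) _Generic((x), int: "int", long: "long", \
                              long long: "long long", default: "other")
int main(void) {
    puts(TYPE_NAME(2147483647));   // int
    puts(TYPE_NAME(2147483648));   // long on x86-64 Linux
    puts(TYPE_NAME(-2147483648));  // still long: unary minus doesn't change the type
}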

However, any program affected by that widening would have had UB (and probably wrapping) without it, and it's more likely that widening will make code happen to work. There's a design philosophy issue here: too many layers of "happens to work" and forgiving behaviour make it hard to understand exactly why something does work, and hard to verify that it will be portable to other implementations with other type widths. Unlike "safe" languages like Java, C is very unsafe and has different implementation-defined things on different platforms, but many developers only have one implementation to test on. (Especially before the internet and online continuous-integration testing.)


ISO C doesn't define the behaviour, so yes a compiler could define new behaviour as an extension without breaking compatibility with any UB-free programs. But unless every compiler supported it, you couldn't use it in portable C programs. I could imagine it as a GNU extension supported by gcc/clang/ICC at least.

Also, such an option would somewhat conflict with -fwrapv, which does define the behaviour. Overall I think it's unlikely to be adopted, because there's already convenient syntax for specifying the type of a literal (0x7fffffffUL + 1 gives you an unsigned long, which is guaranteed to be wide enough to hold that value as a 32-bit unsigned integer).

But let's consider this as a choice for C in the first place, instead of the current design.

One possible design would be to infer the type of a whole integer constant expression from its value, calculated with arbitrary precision. Why arbitrary precision instead of long long or unsigned long long? Those might not be large enough for intermediate parts of the expression if the final value is small because of /, >>, -, or & operators.

Or a simpler design like the C preprocessor, where constant integer expressions are evaluated at some fixed implementation-defined width of at least 64 bits. (But then should the type be assigned based on the final value, or based on the widest temporary value in an expression?) That has the obvious downside for early C on 16-bit machines that it makes compile-time expressions slower to evaluate than if the compiler could use the machine's native integer width internally for int expressions.

Integer constant-expressions are already somewhat special in C, required to be evaluated at compile time in some contexts, e.g. for static int array[1024 * 1024 * 1024]; (where the multiplies will overflow on implementations with 16-bit int.)

Obviously we can't efficiently extend the promotion rule to non-constant expressions; if (a*b)/c might have to evaluate a*b as long long instead of int on a 32-bit machine, the division will require extended precision. (For example x86's 64-bit / 32-bit => 32-bit division instruction faults on overflow of the quotient instead of silently truncating the result, so even assigning the result to an int wouldn't let the compiler optimize well for some cases.)

Also, do we really want the behaviour / definedness of a * b to depend on whether a and b are static const or not? Having compile time evaluation rules match the rules for non-constant expressions seems good in general, even though it leaves these nasty pitfalls. But again, this is something good compilers can warn about in constant expressions.


Other more common cases of this C gotcha are things like 1<<40 instead of 1ULL << 40 to define a bit flag, or writing 1T as 1024*1024*1024*1024.
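
A sketch of that bit-flag gotcha (assuming 32-bit int):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    // uint64_t bad = 1 << 40;    // UB: shifting a 32-bit int by 40; gcc/clang warn
    uint64_t flag = 1ULL << 40;   // OK: the shift is done in unsigned long long
    printf("%llu\n", (unsigned long long)flag);  // 1099511627776
}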

Peter Cordes
  • Hmm, I wonder, since the signed overflow is undefined, could the compiler decide to treat the result as a `long` instead? Or maybe that's what you meant with inferring the type from the values, but you did mention "following ISO C rules that say the expression has type int", so I'm not exactly sure if that would be allowed. – ilkkachu May 07 '20 at 15:35
  • @ilkkachu: My answer does kind of contradict itself >.< Edits / suggestions about phrasing welcome. – Peter Cordes May 07 '20 at 15:53
  • @ilkkachu: Updated with better phrasing. Saying "ISO C *requires* it to be UB" implied all kinds of wrong ideas about what UB is. e.g. that it's required to crash or required to warn, or whatever. In fact it just means that you're in uncharted territory where the ISO C standard has nothing to say about the program. Anything you choose to do is either just happens-to-work or implementation-defined behaviour. (Implementations are 100% allowed to define any behaviour they want in cases where the standard doesn't define behaviour, e.g. like `gcc -fwrapv` or `-fno-strict-aliasing`.) – Peter Cordes May 07 '20 at 16:08
  • yep, thanks. That's pretty much what I thought it should be when it's undefined. It was the part under @HumanJHawkins' question that got me. (that's where you had the "rules say it's int" written a bit strictly) – ilkkachu May 07 '20 at 16:15
  • 1
    @ilkkachu: Ah right, I didn't look as much at that part of what I wrote earlier. The ISO C rules do say that `int * int` is an `int`. Current compilers are always following that rule. So any proposal for this to work differently would override that rule for this case, which is one of the downsides to any proposed extension for current compilers. Or even from a historical language-design perspective for how C could have been designed differently in the first place. Not a good thing if `printf("%d %d\n", 2147483647 + 1, 2147483647);` fails because of a type mismatch with fmt string. – Peter Cordes May 07 '20 at 16:20
  • @ilkkachu: I had another look at the 2nd section of my answer and changed / expanded it some, including that phrasing you asked about. I wasn't happy with my phrasing the first time I wrote it; I'd like to express the same thing in fewer words but IDK if that's possible. – Peter Cordes May 07 '20 at 16:41
  • oh, good point on the varargs types, that would be a rather annoying obstacle. – ilkkachu May 07 '20 at 18:56
  • @ilkkachu: Also for C++ overload resolution I guess, like `foo( 1024 * 1024 * 1024 )` being `foo(long)` on systems with 16-bit int, instead of "just" being compile-time UB and a call to `foo(int)`. – Peter Cordes May 07 '20 at 19:04
  • It is worth noting that while C++20 mandates two's complement signed integers, integer overflow is still UB. – dmeister May 08 '20 at 01:33
  • Just an example of why relying on the signed integer overflow behaviour aka undefined behaviour is bad, see https://godbolt.org/z/scZLjI. – dmeister May 08 '20 at 01:40
  • @dmeister: I'm not sure what that example was supposed to prove. AFAICT, it will do exactly 1 iteration `0 <= 0`, adding `s += 0 % 77` = 0. Then `1 <= 0` is false and the loop exits. No UB. If that's not what you meant to write, comment your code to describe the UB next time, please. – Peter Cordes May 08 '20 at 02:06
  • @PeterCordes sorry. you are right. My mistake. better example.: https://godbolt.org/z/haTK3P – dmeister May 08 '20 at 03:46
  • 1
    @dmeister: There are many platforms and circumstances where it may be cheaper to perform some computations in arithmetically-correct fashion than with wraparound semantics (e.g. `(x*12345)/12345`) so I don't regard as astonishing optimizations which might cause computations to behave as though performed with an unspecified larger type. Far more astonishing IMHO is the fact that in gcc, `unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFFu;}` will only work reliably if `x` doesn't exceed `2147483647/y.`. – supercat May 11 '20 at 20:33

Nice question. As others said, integer literals default to int, so the operation for a acts on two ints and overflows. I tried to reproduce this, and extended it a bit by storing the number in a long long variable first and then adding 1 to it, as in the C example below:

$ cat test.c 
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
  long long a, b, c;

  a = 2147483647 + 1;
  b = 2147483648;

  c = 2147483647;
  c = c + 1;

  printf("%lld\n", a);
  printf("%lld\n", b);
  printf("%lld\n", c);
}

The compiler does warn about overflow BTW, and normally you should compile production code with -Werror -Wall to avoid mishaps like this:

$ gcc -m64 test.c -o test
test.c: In function 'main':
test.c:8:16: warning: integer overflow in expression [-Woverflow]
 a = 2147483647 + 1;
                ^

Finally, the test results are as expected (int overflow in the first case, long long arithmetic in the second and third):

$ ./test 
-2147483648
2147483648
2147483648

Another gcc version warns even further:

test.c: In function ‘main’:
test.c:8:16: warning: integer overflow in expression [-Woverflow]
 a = 2147483647 + 1;
                ^
test.c:9:1: warning: this decimal constant is unsigned only in ISO C90
 b = 2147483648;
 ^

Note also that technically int, long and their variants are architecture-dependent, so their bit widths can vary. For predictably sized types you are better off with int64_t, uint32_t and so on, which are defined by modern compilers and system headers, so whatever bitness your application is built for, the data types remain predictable. Note also that printing and scanning such values is supported by macros like PRIu64 from inttypes.h.
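
A minimal sketch of that fixed-width approach; the INT64_C macro from stdint.h gives the literal a 64-bit type so the addition can't overflow:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    int64_t c = INT64_C(2147483647) + 1;  // 64-bit literal, so no int overflow
    printf("%" PRId64 "\n", c);           // prints 2147483648 portably
}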

Jim Klimov
  • I tried to build the `-m32` and `-m16` versions. This failed on my Debian VM as it lacks proper headers to fulfill the compilation. On an OpenIndiana (OpenSolaris-next-of-kin) the 32-bit version built and worked same as 64-bit one, and a 16-bit one segfaulted right in `main()` so the results are inconclusive :) I hoped to illustrate it printing different integer values because `int` or `long long` definition could end up differently in those bitnesses. – Jim Klimov May 07 '20 at 09:38
  • x86 `gcc -m16` makes code that runs in 16-bit mode but still uses 32-bit operand-size (and the same ABI as 32-bit, like `int` is still `int32_t`). I don't think you can run the resulting binaries under GNU/Linux, even if you had libraries. `gcc -m64` and `gcc -m32` also have identical sizes for `int` and `long long`, only disagreeing about the width of `long`. Sometimes ISA details matter for UB, but not in this case. The behaviour is effectively implementation-defined during evaluation of the integer constant expression, nothing weird left to happen at run time. – Peter Cordes May 07 '20 at 16:51
  • 1
    And BTW, you can get `gcc -m32` to work on Debian by [installing `gcc-multilib`](https://askubuntu.com/q/453681) – Peter Cordes May 07 '20 at 16:52

Because the range of int in C/C++ is -2147483648 to +2147483647.

So when you add 1, it overflows the max limit of int.

For better understanding, imagine the whole range of int laid out on a circle in order:

2147483647 + 1 == -2147483648

2147483647 + 2 == -2147483647

If you want to overcome this, do the arithmetic in long long instead of int.
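
For comparison, unsigned arithmetic really is defined as this kind of circular (modular) arithmetic; signed wraparound is only guaranteed with options like gcc -fwrapv. A minimal sketch:

#include <stdio.h>

int main(void) {
    unsigned int u = 4294967295u;  // UINT_MAX with 32-bit unsigned int
    printf("%u\n", u + 1);         // prints 0: unsigned overflow wraps by definition
}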

Ardent Coder
  • This is only guaranteed to work with `gcc -fwrapv` to define the behaviour of signed overflow. – Peter Cordes May 07 '20 at 14:36
  • This is an accurate description of 32-bit 2's complement signed wraparound, though, which is in practice what current C compilers do. But it's not accurate to say the *range of int in C/C++ is ...* without any qualifiers. C implementations for microcontrollers and DSPs with 16-bit `int` are still widespread, and C only requires that the range be *at least* `-32767 .. 32767`, and that it's 2's complement, 1's complement, or sign/magnitude. https://en.cppreference.com/w/cpp/language/types – Peter Cordes May 07 '20 at 16:47