27

Consider the following programs:

// http://ideone.com/4I0dT
#include <limits>
#include <iostream>

int main()
{
    int max = std::numeric_limits<int>::max();
    unsigned int one = 1;
    unsigned int result = max + one;
    std::cout << result;
}

and

// http://ideone.com/UBuFZ
#include <limits>
#include <iostream>

int main()
{
    unsigned int us = 42;
    int neg = -43;
    int result = us + neg;
    std::cout << result;
}

How does the + operator "know" which is the correct type to return? The general rule is to convert all of the arguments to the widest type, but here there's no clear "winner" between int and unsigned int. In the first case, unsigned int must be being chosen as the result of operator+, because I get a result of 2147483648. In the second case, it must be choosing int, because I get a result of -1. Yet I don't see in the general case how this is decidable. Is this undefined behavior I'm seeing or something else?
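
For what it's worth, the deduced type can be checked directly instead of inferred from the printed value (as the comments below also suggest). A minimal sketch; the names printed by typeid are implementation-specific (GCC and Clang print mangled names, e.g. "j" for unsigned int):

#include <iostream>
#include <limits>
#include <typeinfo>

int main()
{
    int max = std::numeric_limits<int>::max();
    unsigned int one = 1;
    unsigned int us = 42;
    int neg = -43;

    // Both lines print the implementation's name for unsigned int.
    std::cout << typeid(max + one).name() << '\n';
    std::cout << typeid(us + neg).name() << '\n';
}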

Billy ONeal
  • 9
    FWIW, `std::cout << typeid(x + y).name()` can quickly tell you the type of an expression, at least if you know what names your implementation gives to the various integer types. No need to try to figure it out from a value. – Steve Jessop Jul 21 '11 at 01:32
  • 5
    You can also get the compiler to spit it out for you in an error like this: http://ideone.com/m3cBv – GManNickG Jul 21 '11 at 01:47
  • 1
    @SteveJessop @GManNickG or you can get the type from the compiler by defining this function `template <typename T> void func(T t) { static_assert(std::is_empty<T>::value, "testing"); }` and putting the expression into the function. – Trevor Boyd Smith Apr 11 '18 at 12:10

3 Answers

40

This is outlined explicitly in §5/9:

Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

  • If either operand is of type long double, the other shall be converted to long double.
  • Otherwise, if either operand is double, the other shall be converted to double.
  • Otherwise, if either operand is float, the other shall be converted to float.
  • Otherwise, the integral promotions shall be performed on both operands.
  • Then, if either operand is unsigned long the other shall be converted to unsigned long.
  • Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.
  • Otherwise, if either operand is long, the other shall be converted to long.
  • Otherwise, if either operand is unsigned, the other shall be converted to unsigned.

[Note: otherwise, the only remaining case is that both operands are int]

In both of your scenarios, the result of operator+ is unsigned. Consequently, the second scenario is effectively:

int result = static_cast<int>(us + static_cast<unsigned>(neg));

Because in this case the value of us + neg is not representable by int, the value of result is implementation-defined – §4.7/3:

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
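
To make both steps concrete, here is a small sketch; the 4294967295 assumes a 32-bit unsigned int, and the -1 is merely the implementation-defined result you get on typical two's-complement implementations, not something the standard guarantees:

#include <iostream>
#include <type_traits>

int main()
{
    unsigned int us = 42;
    int neg = -43;

    // Usual arithmetic conversions: int + unsigned int -> unsigned int.
    static_assert(std::is_same<decltype(us + neg), unsigned int>::value,
                  "the common type is unsigned int");

    std::cout << (us + neg) << '\n';                  // 4294967295 with 32-bit unsigned int
    std::cout << static_cast<int>(us + neg) << '\n';  // implementation-defined; commonly -1
}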

ildjarn
  • 1
    +1 -- but I don't see how this covers my case above. It seems to me that it would fall to the last item, converting both to unsigned, which would make the second program produce a positive result. However, it does not; it produces a negative one. :/ – Billy ONeal Jul 21 '11 at 01:08
  • 2
    @Billy : `std::cout << static_cast<int>(us + static_cast<unsigned>(neg));` would also print `-1`. Why would you expect it not to? (Or possibly you asked before my first edit, in which case, disregard this :-]) – ildjarn Jul 21 '11 at 01:14
  • 3
    @Billy: your analysis of the second is flawed. `us` and `neg` are both converted to unsigned, yielding a total of `UINT_MAX`. Your implementation then chooses to convert this value to `int` as `-1`. If you want to see the value of the expression `us+neg`, then do `std::cout << (us+neg)`, don't coerce the value to `int` before printing it. – Steve Jessop Jul 21 '11 at 01:16
  • 1
    @ildjarn: Ah, I see. Probably getting the check then. I suspect the behavior is actually undefined in that case though, correct? (Since really it's a signed integer overflow) – Billy ONeal Jul 21 '11 at 01:17
  • 1
    @Billy : No signed arithmetic is occurring here -- how could there be signed integer overflow? Or do you mean to ask whether `static_cast<int>(some_unsigned_int)` is UB? – ildjarn Jul 21 '11 at 01:18
  • 5
    @Billy: it's implementation-defined, not UB (4.7/3). – Steve Jessop Jul 21 '11 at 01:18
  • 1
    @ildjarn: Because the (positive) unsigned value of the `+` is not representable as a (positive) value of `int`, so the cast effectively overflows. It works on a two's complement machine but I'm not positive it would work elsewhere. – Billy ONeal Jul 21 '11 at 01:21
  • 3
    You're basically right that in principle it might not work elsewhere. An implementation could define that converting any `unsigned int` value greater than `INT_MAX` to `int` results in the value 0. It just can't crash. – Steve Jessop Jul 21 '11 at 01:22
  • 1
    Are there actually any non-two's complement machines left anywhere?? I thought that one's complement/sign-magnitude etc was stuff from the dark ages :). I was caught worrying about something like this yesterday... – Darren Engwirda Jul 21 '11 at 01:27
  • 2
    @Darren: for practical purposes, no, there aren't any. But I think it's a worthwhile exercise in code hygiene, to know whether and how you're depending on implementation-defined behavior. It's not really about 2's complement - a 2's complement implementation is allowed to do saturating signed arithmetic and even saturating conversions. So `(int)(unsigned)(-1)` would be `INT_MAX`. But it's not so much about whether it's actually going to happen, as whether your code needs to be tagged, "not entirely portable" so that weirdos with their peculiar machines are warned ;-). – Steve Jessop Jul 21 '11 at 01:36
12

Before C was standardized, there were differences between compilers -- some followed "value preserving" rules, and others "sign preserving" rules. Sign preserving meant that if either operand was unsigned, the result was unsigned. This was simple, but at times gave rather surprising results (especially when a negative number was converted to an unsigned).

C standardized on the rather more complex "value preserving" rules. Under the value preserving rules, promotion depends on the actual ranges of the types, so you can get different results on different compilers. For example, on most MS-DOS compilers, int is the same size as short and long is different from either; on many current systems int is the same size as long, and short is different from either. With value preserving rules, these size differences can lead to the same expression being promoted to different types on the two kinds of system.

The basic idea of value preserving rules is that it'll promote to a larger signed type if that can represent all the values of the smaller type. For example, a 16-bit unsigned short can be promoted to a 32-bit signed int, because every possible value of unsigned short can be represented as a signed int. The types will be promoted to an unsigned type if and only if that's necessary to preserve the values of the smaller type (e.g., if unsigned short and signed int are both 16 bits, then a signed int can't represent all possible values of unsigned short, so an unsigned short will be promoted to unsigned int).
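
As a sketch of that rule (assuming the common case of 16-bit short and 32-bit int, which is typical today but not guaranteed), the promoted type differs depending on whether the signed type can absorb the unsigned one:

#include <type_traits>

int main()
{
    unsigned short us = 65535;
    unsigned int ui = 65535;
    int i = -1;

    // Where int is wider than short, unsigned short promotes to int
    // (value preserving), so the arithmetic below is signed.
    static_assert(std::is_same<decltype(us + i), int>::value,
                  "holds when int can represent every unsigned short value");

    // int can never represent every unsigned int value, so here the
    // common type stays unsigned and the arithmetic is unsigned.
    static_assert(std::is_same<decltype(ui + i), unsigned int>::value,
                  "unsigned int wins");
}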

When you assign the result as you have, it will get converted to the destination type anyway, so most of this makes relatively little difference -- at least in the typical cases, where the conversion just copies the bits into the result and it's up to you to decide whether to interpret them as signed or unsigned.

When you don't assign the result such as in a comparison, things can get pretty ugly though. For example:

#include <stdio.h>

int main()
{
    unsigned int a = 5;
    signed int b = -5;

    if (a > b)
        printf("Of course");
    else
        printf("What!");
}

Under sign preserving rules, b would be promoted to unsigned, and in the process become equal to UINT_MAX - 4, so the "What!" leg of the if would be taken. With value preserving rules, you can manage to produce some strange results a bit like this as well, but 1) primarily on the DOS-like systems where int is the same size as short, and 2) it's generally harder to do it anyway.
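
For contrast, the same comparison written with an unsigned short takes the expected branch on an implementation where int is wider than short, because value preservation promotes both operands to signed int. A sketch that mirrors the example above with only the type of `a` changed (it assumes 16-bit short and 32-bit int):

#include <stdio.h>

int main()
{
    unsigned short a = 5;
    signed int b = -5;

    // Assuming int can represent every unsigned short value, a promotes
    // to int, the comparison is signed, and 5 > -5 holds.
    if (a > b)
        printf("Of course");
    else
        printf("What!");
}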

Jerry Coffin
  • 2
    "C standardized on the rather more complex "value preserving" rules", "Under sign preserving rules, b would be promoted to unsigned, and in the process become equal to UINT_MAX - 4, so the "What!" leg of the if would be taken". But in standard C++, `b` *is* promoted to unsigned, and the "What!" leg *is* taken. I think something is the wrong way around somewhere. – Steve Jessop Jul 21 '11 at 01:45
  • 1
    @Steve is correct: http://ideone.com/gwjBA – Billy ONeal Jul 21 '11 at 02:02
  • 1
    Or maybe that's just a confusing example, since both sign-preserving rules and C's rules convert the operands to `unsigned`. – Steve Jessop Jul 21 '11 at 02:05
  • 1
    The rules for relational operators are one of my pet peeves for C. I can understand the philosophy of not requiring compilers to handle mixed-sign cases, especially in the days of compilers running on 16-bit computers, but it seems obnoxious that the standard doesn't allow a compiler to produce an arithmetically-correct result. – supercat Jul 21 '11 at 02:06
  • 1
    @supercat: How would you propose the compiler produce an "arithmetically correct result"? – Billy ONeal Jul 21 '11 at 02:17
  • 1
    @Billy ONeal: If both operands are signed or both unsigned, act as though the smaller type were extended to the larger. If one is signed and it's negative, regard it as the smaller one. Otherwise regard operands as unsigned. (A sketch of this rule appears after this comment thread.) – supercat Jul 21 '11 at 04:43
  • 1
    @supercat: I suppose you could do that. Problem is now you've induced two branches instead of a single branch. For those few rare cases where it would make a difference if you want to pay for two comparisons, it's simple enough to turn it in to two comparisons manually. – Billy ONeal Jul 21 '11 at 05:47
  • 1
    @Billy: equally, for those few rare cases where you care about the performance of a mixed-sign comparison, it's easy to convert one or other operand manually. supercat's proposal only introduces extra overhead to mixed-sign comparisons, which are currently worth avoiding anyway precisely because their meaning is so dodgy. – Steve Jessop Jul 21 '11 at 11:18
  • 1
    @Billy: I guess my point is that I fail to see benefit in requiring that some comparisons regard negative numbers as greater than some unsigned quantities, in the absence of typecasts. I could understand leaving it implementation-defined, but I don't see the value in explicitly requiring an arithmetically wrong rule whose effect is dependent upon precise details of operand sizes. – supercat Jul 21 '11 at 12:16
  • 1
    @supercat: The benefit is simple: what they're requiring can be fast because the requirements are based only on the *type* involved, not the values. Yours would require testing the value at run-time when signed/unsigned were mixed -- including quite a few cases where you don't expect the signed number to ever be negative (e.g., `for (int i=0; i – Jerry Coffin Jul 21 '11 at 16:54
  • 1
    @Jerry Coffin: Can you offer any example of code which would not be considered 'badly written' and 'non-portable', but would be broken by an arithmetically-correct evaluation of a mixed-sign expression? If a programmer knows that one argument will always fit in the other argument's type, and wants to avoid generating extra code to handle the mixed comparison, the argument that will fit in the other type could be cast. If the programmer wants to force particular semantics in case one argument doesn't fit in the other type, the programmer should cast. – supercat Jul 21 '11 at 17:48
  • 1
    @Jerry Coffin: I'm unclear what is gained by requiring that mixed-sign comparisons be governed by rules which, depending upon the particular data sizes involved, sometimes compel an arithmetically-incorrect result (I wouldn't mind if the rule said that comparing a negative signed number with an unsigned number of an as-big-or-larger type would be implementation-defined, but I can think of no situation where a programmer should exploit the mandated "wrong" behavior without explicitly casting the signed quantity to unsigned.) Can you offer any? – supercat Jul 21 '11 at 17:55
  • 1
    @supercat: as I said, I think the current rules were formulated primarily as a (arguably premature/misplaced) optimization, not something where anybody even thought they produced ideal results. Given the charter (to codify existing behavior) I think they felt compelled to choose something already in use, which I believe limited them to value preserving or sign preserving -- AFAIK, nobody had tried what you're advocating. – Jerry Coffin Jul 21 '11 at 21:00
  • 1
    @Jerry Coffin: I guess my question would be, if the rule were written to say that the behavior of mixed-signedness comparisons in which the signed value is negative except in those cases where the standard would require the signed number to be regarded as negative, would that have allowed compilers to break any code that would be considered even remotely portable? Perhaps a chat topic might be good? – supercat Jul 21 '11 at 21:11
  • 1
    @JerryCoffin Thanks for the history lesson. [Here](http://stackoverflow.com/q/17312545/183120)'s a question with a snippet very similar to yours. – legends2k Mar 11 '15 at 09:50
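
For reference, the rule supercat describes above (a negative signed operand always compares smaller; otherwise compare both values as unsigned) can be written as a small helper; C++20 later added std::cmp_less and friends in <utility> that do the same thing. A minimal sketch, where the name less_mixed is made up for illustration:

#include <iostream>
#include <type_traits>

// Hypothetical helper: true when the signed value s is mathematically
// less than the unsigned value u.
template <typename S, typename U>
bool less_mixed(S s, U u)
{
    static_assert(std::is_signed<S>::value && std::is_unsigned<U>::value,
                  "signed left operand, unsigned right operand");
    // A negative signed value is always smaller; otherwise both values
    // are non-negative and can safely be compared as unsigned.
    return s < 0 || static_cast<typename std::make_unsigned<S>::type>(s) < u;
}

int main()
{
    int neg = -43;
    unsigned int us = 42;

    std::cout << std::boolalpha
              << less_mixed(neg, us) << '\n'                    // true: -43 < 42
              << (static_cast<unsigned int>(neg) < us) << '\n'; // false under the usual conversions
}
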
2

It's choosing whatever type you put your result into, or at least cout is honoring that type during output.

I don't remember for sure, but I think C++ compilers generate the same arithmetic code for both; it's only comparisons and output that care about sign.
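
On the common two's-complement implementations the addition does produce the same bit pattern either way; the declared type only changes how those bits are read back when compared or printed. A small sketch of that claim (the negative value shown is the typical implementation-defined result, not something the standard promises):

#include <iostream>
#include <limits>

int main()
{
    // 2147483648 with a 32-bit int/unsigned int.
    unsigned int bits = static_cast<unsigned int>(std::numeric_limits<int>::max()) + 1u;

    std::cout << bits << '\n';                    // read as unsigned: 2147483648
    std::cout << static_cast<int>(bits) << '\n';  // implementation-defined; typically
                                                  // -2147483648 on two's-complement machines
}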

Andrew White
  • 1
    I don't see how that can be the case. I don't see in any way how the result of `operator+` can be affected by where you store the result. I stored things that way for the purpose of explanation, but I could just as easily have chosen `double` for both and the result is the same. In any case, I would like to see a standard reference.... – Billy ONeal Jul 21 '11 at 01:05
  • 1
    @Billy ONeal, the actual binary result is the same but how you interpret the result is what differs. I hope I am not missing the whole point here. – Andrew White Jul 21 '11 at 01:11
  • Not quite. That is not defined. If it were to overflow as a signed int, the result would be undefined. If it were to overflow as an unsigned int, then one could rely on the result being truncated. Most machines use two's complement but that's not mandated. – Billy ONeal Jul 21 '11 at 01:12
  • Those are not equal in any case: http://ideone.com/v1kFy Unsigned 2147483648 is Signed -2147483648. – Billy ONeal Jul 21 '11 at 01:14
  • @Billy, that was me being stupid, it would wrap around to the smallest possible negative number which is -2147483648. – Andrew White Jul 21 '11 at 01:21
  • So really, it's not a matter of interpretation at all (at least not by cout). – Billy ONeal Jul 21 '11 at 01:22
  • No, it still is in this case at least. MAX_INT + 1 = -2147483648, 2147483647L + 1 = 2147483648. Same binary value different interpretations. – Andrew White Jul 21 '11 at 01:26
  • Except that signed overflow is implementation defined. `MAX_INT + 1` could equal zero. – Billy ONeal Jul 21 '11 at 01:28
  • 1
    @Billy: signed overflow is UB, not implementation-defined (5/5), although your particular example `INT_MAX + 1` is a constant expression. Of course an implementation is *allowed* to define the behavior, and every implementation I've ever used defined it to wrap around, so it's pretty difficult to observe overflow behaving badly. I've heard tales of situations where really fierce optimization can cause problems with overflow. Unless it documents otherwise, an optimizer can legally *assume* that `x + INT_MAX + 1` is non-negative, even if the implementation gives -ve values for some ints `x`. – Steve Jessop Jul 21 '11 at 01:56
  • Ah, I've vaguely remembered the example I heard about, which was something like `if (x+1 < x) { /* this code removed as dead when x is signed */ }`. Can't remember whether it actually happened and if so what compiler/options, but it's certainly a conforming optimization, because `x+1` cannot be less than `x` except as a result of UB. – Steve Jessop Jul 21 '11 at 02:01
  • @Steve Jessop: Out of curiosity, if a program contains a function which could not possibly generate defined behavior if executed (e.g. intvar = LONG_MAX * LONG_MAX), but the program can execute without evaluating that function, would the program be considered well-formed, and would a standards-conforming compiler have to produce code that would work correctly in all cases which do not result in that function being executed? – supercat Jul 21 '11 at 21:25
  • @supercat: I believe so, yes. But if the UB-producing code is executed, then that whole run of the program has UB, so there can be no expectation that any observable behavior "before" the UB will actually be observed. – Steve Jessop Jul 21 '11 at 22:04
  • @Steve Jessop: Fair enough. My question was whether a conforming compiler would be required to not to choke on things like constant expressions that would produce invalid results. – supercat Jul 21 '11 at 22:21
  • 1
    @supercat: Ah, constant expressions are different, sorry. I took your example to be more general, meaning multiply together any two values equal to `LONG_MAX`. The exact expression `LONG_MAX * LONG_MAX` is ill-formed (5/5: "If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19), in which case the program is ill-formed."). What I said before applies to `i = j = LONG_MAX; i * j;` – Steve Jessop Jul 22 '11 at 08:37
  • @Steve Jessop: So that's a special case. Understandable from a compiler-convenience standpoint, though there are some cases where it could prove a slight nuisance (e.g. "if (buff_size > BUFF_LIMIT) buff_size = BUFF_LIMIT+1;", where BUFF_LIMIT might happen to be INT_MAX). I guess one could use #if directives with INT_MAX etc. to disable inapplicable code. I'm curious about another scenario, though; are compilers required to distinguish between specified constants and values they know with certainty to be constant? – supercat Jul 22 '11 at 15:17
  • @Steve Jessop: Consider the expression "16/(SOME_CONSTANT+(some_unsigned_char >> 8))", where SOME_CONSTANT is a #define'ed integer literal. A compiler might recognize that "some_unsigned_char >> 8" is always zero; if SOME_CONSTANT was e.g. 2, the compiler could substitute a constant 8 for the whole expression. Suppose that SOME_CONSTANT was zero, though; would the program be well-formed or ill-formed? – supercat Jul 22 '11 at 15:20
  • @supercat: if `SOME_CONSTANT` is 0, and `sizeof(unsigned char)` is 8, and `some_unsigned_char` is an integer constant expression, then the program is ill-formed. There's no "a compiler might recognize" -- compilers are *required* to evaluate integer constant expressions at compile-time, because they can be used for example as the size of an array, and the compiler is required to reject arrays of negative size. If `some_unsigned_char` isn't an integer constant expression then the program is well-formed but of course a compiler is entitled to warn that it has UB. – Steve Jessop Jul 24 '11 at 12:54
  • I guess the principle here is that taking an expression that isn't an ICE, and doing something to it that has the same result regardless of the actual value, doesn't turn it into an ICE. The fact that the compiler just so happens to be smart enough to figure out that the result is independent of the value *also* doesn't make it an ICE, because that's defined in the standard (5.19), it's not "anything your compiler can compute". It's still not a constant expression, and so 5/5 doesn't make the program containing it ill-formed just because the final result of the division is undefined. – Steve Jessop Jul 24 '11 at 13:00