39

I have the following code:

unsigned char x = 255;
printf("%x\n", x); // ff

unsigned char tmp = x << 7;
unsigned char y = tmp >> 7;
printf("%x\n", y); // 1

unsigned char z = (x << 7) >> 7;
printf("%x\n", z); // ff

I would have expected y and z to be the same, but they differ depending on whether an intermediate variable is used. Why is this the case?

Peter Cordes
odzhychko
  • `(x<<7)>>7` in principle also stores an intermediate result. But I don't know where it says what the type of this intermediate result should be. – The Photon May 22 '20 at 15:52
  • @ThePhoton: it says in the C Standard that the intermediary type used for evaluating `(x << 7) >> 7` is `int` or `unsigned int` depending on the sizes of `unsigned char` and `int`. – chqrlie May 22 '20 at 17:31

3 Answers

27

This little test is actually more subtle than it looks, as the behavior is implementation-defined:

  • `unsigned char x = 255;` No ambiguity here: `x` is an `unsigned char` with value 255, and type `unsigned char` is guaranteed to have enough range to store 255.

  • `printf("%x\n", x);` This produces `ff` on standard output, but it would be cleaner to write `printf("%hhx\n", x);` because `printf` expects an `unsigned int` for the `%x` conversion, which `x` is not. Passing `x` actually passes an `int` or an `unsigned int` argument, depending on how `unsigned char` is promoted.

  • `unsigned char tmp = x << 7;` To evaluate the expression `x << 7`, `x`, being an `unsigned char`, first undergoes the integer promotions defined in the C Standard 6.3.1.1: "If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions."

    So if the number of value bits in `unsigned char` is smaller than or equal to that of `int` (the most common case currently being 8 vs 31), `x` is first promoted to an `int` with the same value, which is then shifted left by 7 positions. The result, 0x7f80, is guaranteed to fit in the `int` type, so the behavior is well defined, and converting this value to type `unsigned char` will effectively truncate the high-order bits of the value. If type `unsigned char` has 8 bits, the value will be 128 (0x80), but if type `unsigned char` has more bits, the value in `tmp` can be 0x180, 0x380, 0x780, 0xf80, 0x1f80, 0x3f80 or even 0x7f80.

    If type `unsigned char` has values that `int` cannot represent, which can occur on rare systems where `sizeof(int) == 1`, `x` is promoted to `unsigned int` and the left shift is performed on this type. The value is 0x7f80U, which is guaranteed to fit in type `unsigned int`, and storing it to `tmp` does not lose any information since type `unsigned char` has the same size as `unsigned int`. So `tmp` would have the value 0x7f80 in this case.

  • `unsigned char y = tmp >> 7;` The evaluation proceeds the same way as above: `tmp` is promoted to `int` or `unsigned int` depending on the system, which preserves its value, and this value is shifted right by 7 positions. This is fully defined because 7 is less than the width of the type (`int` or `unsigned int`) and the value is non-negative. Depending on the number of bits of type `unsigned char`, the value stored in `y` can be 1, 3, 7, 15, 31, 63, 127 or 255. The most common architectures will have `y == 1`.

  • `printf("%x\n", y);` Again, it would be better to write `printf("%hhx\n", y);` and the output may be `1` (most common case) or `3`, `7`, `f`, `1f`, `3f`, `7f` or `ff` depending on the number of value bits in type `unsigned char`.

  • `unsigned char z = (x << 7) >> 7;` The integer promotion is performed on `x` as described above. The value (255) is then shifted left by 7 bits as an `int` or an `unsigned int`, always producing 0x7f80, and then shifted right by 7 positions, with a final value of 0xff. This behavior is fully defined.

  • `printf("%x\n", z);` Once more, the call should be `printf("%hhx\n", z);` and the output would always be `ff`.

Systems where bytes have more than 8 bits are becoming rare these days, but some embedded processors, such as specialized DSPs, still have them. It would take a perverse system to fail when passed an `unsigned char` for a `%x` conversion specifier, but it is cleaner to either use `%hhx` or, more portably, write `printf("%x\n", (unsigned)z);`

Shifting by 8 instead of 7 in this example would be even more contrived. It would have undefined behavior on systems with 16-bit `int` and 8-bit `char`, because the result of 255 << 8 (0xff00) cannot be represented in a 16-bit `int`.

chqrlie
  • I'm prepared to argue that failing when passing the unsigned char to printf is out-of-spec. – Joshua May 23 '20 at 03:46
  • You say that `unsigned char` can be *larger* than `int` on systems with `sizeof(int)==1`. By definition they would have the same `sizeof()` in that case, so it's potentially misleading to say "larger". It's possible that `unsigned char` could have more value bits than `int` (`int` can have padding; `unsigned char` isn't allowed to). But even without any of that, the high end of the value-range of `unsigned char` can be larger than for `int` for the same number of value bits, simply because it's unsigned. – Peter Cordes May 23 '20 at 05:55
  • I also find it strange to say they're "equal" if the upper limits of value-range match between `unsigned char` and `signed int` (thus allowing unsigned char to promote to int). They can't be the same type (they must differ in signedness), and having the same upper limit of value range (positive end) would mean that `int` has 1 more value bit. – Peter Cordes May 23 '20 at 05:56
  • @PeterCordes: yes indeed, I hesitated over the term *equal*... Of course only the numbers of value bits are equal and `int` would need at least one extra bit for the sign and at least 15 padding bits on this uncanny architecture. I wonder if anything else in the Standard would prevent conformity in this case. My DS9K compiler is not ripe enough to test this :( – chqrlie May 23 '20 at 11:26
  • @chqrlie: The sign bit is part of the value in the object representation, not the padding. I was trying to use the terminology from the ISO C standard. – Peter Cordes May 23 '20 at 15:56
  • @PeterCordes: The sign bit is not part of the *value bits*, as used in **C17 6.2.6.2**: *[...] For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit.[...]*. So technically, `int` and `unsigned char` can have the same number of *value bits*, but then it must have a separate sign bit, and hence at least `CHAR_BIT-1` padding bits on such a weird architecture. – chqrlie May 23 '20 at 16:59
  • Ah, my mistake, thanks for correcting me on how C uses the term "value bits". Giving the example of 8 vs. 31 is very helpful to make it clear that it's not including the sign bit in case anyone else forgot. Good edit. – Peter Cordes May 23 '20 at 21:36
12

The 'intermediate' values in your last case are (full) integers, so the bits that are shifted 'out of range' of the original unsigned char type are retained, and thus they are still set when the result is converted back to a single byte.

From this C11 Draft Standard:

6.5.7 Bitwise shift operators
...
3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand ...

However, in your first case, `unsigned char tmp = x << 7;`, `tmp` loses the seven set 'high' bits (bits 8 to 14 of 0x7f80) when the resultant 'full' integer is converted (i.e. truncated) back to a single byte, giving a value of 0x80; when this is then right-shifted in `unsigned char y = tmp >> 7;`, the result is (as expected) 0x01.

Adrian Mole
  • Excellent! Now, is the integer promotion to `unsigned int` since the original type is `unsigned char`? Otherwise, I might expect to see a sign extension in the right shift. – Fred Larson May 22 '20 at 16:10
  • @FredLarson It doesn't matter if the promoted type is signed or unsigned! As the value `255` can be **properly represented** by either, sign-extension does not occur. That is, even if you explicitly cast an `unsigned char` value of `255` to a *signed* 32-bit `int`, its value will be `255` (not `-1`). – Adrian Mole May 22 '20 at 16:13
  • @FredLarson You definitely wouldn't see sign-extension with an unsigned type. As for what it promotes to, it promotes to an `int` (assuming an `int` is larger than a `char` on said system) per C11 draft standard section 6.3.1.1: "*If an **int** can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an **int**; otherwise, it is converted to an **unsigned int**.*" – Christian Gibbons May 22 '20 at 16:19
7

The shift operators never operate directly on the char types. The value of any char operand is first converted to int by the integer promotions, and the result of the expression is converted back to the char type only when it is assigned. So, when you put the left and right shift operators in the same expression, the whole calculation is performed in type int (without losing any bits), and only the final result is converted to char.

ivangreek