
I recently noticed a (weird) behavior when performing operations with the shift operators >> and <<!

To explain it, here is a small runnable program that performs two operations which, as I understand it, should be identical, yet they give different results!

#include <stdio.h>

int main(void) {
    unsigned char a=0x05, b=0x05;

    // first operation
    a = ((a<<7)>>7);

    // second operation
    b <<= 7;
    b >>= 7;

    printf("a=%X b=%X\n", a, b);
    return 0;
} 

When run, a = 5 and b = 1. I expected both to equal 1! Can someone kindly explain why I get this result?

P.S: In my environment the size of unsigned char is 1 byte

legends2k
chouaib
  • Read about [numeric promotion](http://en.cppreference.com/w/cpp/language/implicit_cast#Numeric_promotions). – Some programmer dude Sep 23 '14 at 07:32
  • So `a = ((a<<31)>>31);` will get what I want, right? (int size is 4 bytes) – chouaib Sep 23 '14 at 07:46
  • @JoachimPileborg: why in the case of `a` would the compiler not just optimize the operation out, leaving `a` unchanged? – David C. Rankin Sep 23 '14 at 08:18
  • @DavidC.Rankin Of course there's nothing to stop the compiler from doing that, the only way to be sure is to look at the generated assembly code. But in the *general* case, the result is what it is because of numeric promotion. – Some programmer dude Sep 23 '14 at 08:34
  • `a & 1` will produce the same result as your second operation (and makes more sense). – Sean Latham Sep 23 '14 at 09:13
  • @chouaib `a = ((a<<31)>>31);` will result in undefined behavior, because `unsigned char` is converted to `int` by an integer promotion and a signed overflow on a left-shift is undefined behavior. – Virgile Sep 23 '14 at 09:32
  • @ipi: dude if you read the question, it discusses the behavior of shifting , I didn't mention (how to mask some bits?) – chouaib Sep 23 '14 at 12:29

3 Answers


In the first example:

  • a is converted to an int, shifted left, then right, and then converted back to unsigned char.

This obviously results in a = 5, since no bits are lost while the value is an int.

In the second example:

  • b is converted to int, shifted left, then converted back to unsigned char.
  • b is converted to int, shifted right, then converted back to unsigned char.

The difference is that in the second example you lose information during the intermediate conversion to unsigned char: the shifted value 0x280 is truncated to 0x80 before the right shift.

thumbmunkeys
  • I see! that is a bit confusing when writing long codes, Now suppose that I aim to lose that information, how can I do it in one line ? is it fine to cast to char `a = ((char)(a<<7)>>7)` ? – chouaib Sep 23 '14 at 07:37
  • I guess I got it, one line code will be `a = ((a<<31)>>31);` Right ? well, supposing sizeof(int) = 4 bytes – chouaib Sep 23 '14 at 07:42
  • `a = ((char)a<<7)>>7;` will do it (shifts, truncates then shifts back)(watch your brackets). Though far easier to just do `a = a & 0x01;` – Baldrickk Sep 23 '14 at 07:44
  • @chouaib sorry, misread your question. If you want to lose information, then yes, you can cast to `char`. You should get the same result as for your second example then. Also beware of the parenthesis as Baldrickk mentioned – thumbmunkeys Sep 23 '14 at 07:45
  • @Baldrickk @thumbmunkeys: casting to `char` will ruin it giving all `0xFF` so I guess it's rather suitable to cast to `unsigned char`. – chouaib Sep 23 '14 at 07:56
  • @chouaib still better to mask it with `a = a & 0x01;` Do the operation you want to do when available, not exploiting 'quirks' of the system, unless for some reason you want to intentionally obfuscate your code. – Baldrickk Sep 23 '14 at 08:06
  • Yes @Baldrickk I used to `&` when trying to use some bits, today particularly I didn't have the choice of logic operators that's why I encountered this case and felt like it's worth to ask on SO – chouaib Sep 23 '14 at 08:09
  • @chouaib `a = ((unsigned char)(a<<7)>>7)` is what you want. Using `char` causes implementation-defined behaviour for some values of `a`; and `a<<31` causes undefined behaviour. – M.M Sep 24 '14 at 05:25

Detailed explanation of the things going on between the lines:

Case a:

  • In the expression a = ((a<<7)>>7);, a<<7 is evaluated first.
  • The C standard states that each operand of the shift operators is implicitly integer promoted, meaning that if they are of types bool, char, short etc (collectively the "small integer types"), they get promoted to an int.
  • This is standard practice for almost every operator in C. What makes the shift operators different from other operators is that they don't use the other kind of common, implicit promotion called "balancing" (the usual arithmetic conversions). Instead, the result of a shift always has the type of the promoted left operand, in this case int.
  • So a gets promoted to type int, still containing the value 0x05. The 7 literal was already of type int so it doesn't get promoted.
  • When you left shift this int by 7, you get 0x0280. The result of the operation is of type int.
  • Note that int is a signed type, so had you kept shifting data further, into the sign bit, you would have invoked undefined behavior. Similarly, left-shifting a negative value, or using a negative shift count in either direction, invokes undefined behavior; right-shifting a negative value gives an implementation-defined result.
  • You now have the expression a = 0x280 >> 7;. No promotions take place for the next shift operation, since both operands are already int.
  • The result is 5 and of the type int. You then convert this int to an unsigned char, which is fine, since the result is small enough to fit.

Case b:

  • b <<= 7; is equivalent to b = b << 7;.
  • As before, b gets promoted to an int. The result will again be 0x0280.
  • You then attempt to store this result in an unsigned char. It will not fit, so it will get truncated to only contain the least significant byte 0x80.
  • On the next line, b again gets promoted to an int, containing 0x80.
  • And then you shift 0x80 right by 7, getting the result 1. This is of type int, but it fits in an unsigned char, so it is stored in b unchanged.

Good advice:

  • Never ever use bit-wise operators on signed integer types. This doesn't make any sense in 99% of the cases but can lead to various bugs and poorly defined behavior.
  • When using bit-wise operators, use the types in stdint.h rather than the primitive default types in C.
  • When using bit-wise operators, use explicit casts to the intended type, to prevent bugs and unintended type changes, but also to make it clear that you actually understand how implicit type promotions work, and that you didn't just get the code working by accident.

A better, safer way to write your program would have been:

#include <stdio.h>
#include <stdint.h>    

int main(void) {
    uint8_t a=0x05;
    uint8_t b=0x05;
    uint32_t tmp;

    // first operation
    tmp = (uint32_t)a << 7;
    tmp = tmp >> 7;
    a = (uint8_t)tmp;

    // second operation
    tmp = (uint32_t)b << 7;
    tmp = tmp >> 7;
    b = (uint8_t)tmp;

    printf("a=%X b=%X\n", a, b);
    return 0;
} 
Lundin
  • Thank you very much for the detailed explanation and the huge quantity of information and tips, I am just a bit disappointed to not being able to do that within one-line-code (Operation 1) – chouaib Sep 23 '14 at 12:36
  • @chouaib Writing everything in one line of code fills no purpose of its own. I split it into several lines only for the sake of readability - the generated machine code will still be the same. You could as well write `a = (uint8_t)(((uint32_t)a<<7)>>7);` but that's an unreadable mess. – Lundin Sep 23 '14 at 12:39
  • Selecting `uint32_t` appears arbitrary. There is nothing special about 32-bit width or is this just an example where `uint16_t` or `uint64_t` would work as well? Casting to `unsigned` or `uintmax_t` would have some relevance though. – chux - Reinstate Monica Sep 23 '14 at 13:45
  • @chux There is something special with uint32_t, namely that it is not one of the small integer types, on any known system in the world. I picked it to ensure that this example code will work fine on all systems. Had I picked for example `uint16_t`, it would have worked fine on 8 and 16 bit systems, but on 32 bit system I would still have gotten an integer promotion. But instead of uint32_t, you can use any other "large enough" unsigned integer type, such as `uint_least32_t` or `uint64_t`. – Lundin Sep 23 '14 at 14:04
  • @chux Casting to `unsigned` is not a good idea, as it has unknown size and therefore the code turns non-portable. Casting to `uintmax_t` doesn't make any sense either, as it will likely yield an unnecessarily large type. – Lundin Sep 23 '14 at 14:05
  • [This answer](http://stackoverflow.com/a/2331768/2410359) also states a specialty of 32-bit. That and your premise rely on `int/unsigned` never being more than 32-bits to be portable. That is at least a reasonable assertion, though one not guaranteed by C spec. How best to make it portable depends on the required shift range: fixed ranges like 0 to 15 or 0 to 31, 0 to `unsigned` bit width -1, etc. - something not stated by OP. – chux - Reinstate Monica Sep 23 '14 at 14:58
  • One of my major peeves with C is that there's no way to specify an unsigned type that should *not* be subject to promotion. I wish it defined separate e.g. `unum16_t` and `uwrap16_t`, with the semantics that computations involving the latter would be performed modulo 65,536 regardless of the size of `int` [an implementation or future language standard could offer such types if they were defined as an extension which was exempt from the existing rules]. Such a feature would make it *much* easier to ensure portability with any compiler that supports it. – supercat Sep 23 '14 at 16:40

The shift operators perform integer promotions on their operands, and in your code the resulting int is converted back to unsigned char, like this:

// first operation
a = ((a<<7)>>7); // a = (unsigned char)((a<<7)>>7);

// second operation
b <<= 7; // b = (unsigned char) (b << 7);
b >>= 7; // b = (unsigned char) (b >> 7);

Quote from the N1570 draft (which became the standard of C11 later):

6.5.7 Bitwise shift operators:

  1. Each of the operands shall have integer type.
  2. The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

C99 and C90 are believed to contain similar statements.

Robert Harvey
starrify
  • Char **is** an integer type. – Oliver Charlesworth Sep 23 '14 at 08:14
  • @OliverCharlesworth Thank you and it was my problem not expressing the idea correctly. I've had the answer edited. – starrify Sep 23 '14 at 08:39
  • Very hard to accept one answer since all of them are "Acceptable", I upVoted the 3 of them, but I decided to accept starrify's due to the addition of the quote, but to be honest I wish I can accept more than one ;) – chouaib Oct 20 '14 at 00:00