
Hi, I have a small conceptual problem regarding bit operations. In the code below I have a 4-byte unsigned int, and I access its individual bytes through an unsigned char pointer to its address.

I then set the last byte to 1 and shift the unsigned int (the 4-byte variable) right by one. I do not understand why this operation apparently changes the content of the third byte.

The code is below, followed by the output I get when I run it:

#include <cstdio>

int main(int argc,char **argv){
  fprintf(stderr,"sizeof(unsigned int): %lu sizeof(unsigned char):%lu\n",sizeof(unsigned int),sizeof(unsigned char));
  unsigned int val=0;
  unsigned char *valc =(unsigned char*) &val;
  valc[3] = 1;
  fprintf(stderr,"uint: %u, uchars: %u %u %u %u\n",val,valc[0],valc[1],valc[2],valc[3]);
  val = val >>1;
  fprintf(stderr,"uint: %u, uchars: %u %u %u %u\n",val,valc[0],valc[1],valc[2],valc[3]);
  return 0;
}


sizeof(unsigned int): 4 sizeof(unsigned char):1
uint: 16777216, uchars: 0 0 0 1
uint: 8388608, uchars: 0 0 128 0

Thanks in advance

NathanOliver
monkeyking
  • [Are you sure that's not the first byte](https://en.wikipedia.org/wiki/Endianness)? – user4581301 Feb 21 '20 at 22:53
  • Your first clue will be to pontificate on the reason why the first number comes out to "16777216". Then try setting the other three bytes to `1`, one at a time see what the result is, and you should be able to figure out the reason by yourself. – Sam Varshavchik Feb 21 '20 at 22:57
  • Note, though, that the [correct format specifier for `size_t`](https://stackoverflow.com/questions/2125845/platform-independent-size-t-format-specifiers-in-c) (which is what `sizeof()` returns) is `"%zu"`, not `"%lu"`. Using an incorrect format specifier invokes undefined behavior. – Andrew Henle Feb 21 '20 at 23:31

2 Answers


You've discovered that your computer doesn't always store the bytes of multi-byte data types in the order you happen to expect. On your system, valc[0] is the least significant byte (LSB). A system that stores the LSB at the lowest memory address is known as "little-endian". At the other end, valc[3] is the most significant byte (MSB).

Your output will make more sense to you if you print valc[3],valc[2],valc[1],valc[0] instead, since humans expect the most significant values to be on the left.

Other computer architectures are "big-endian" and will store the most significant byte first.

This article also explains this concept in way more detail: https://en.wikipedia.org/wiki/Endianness

The book "The Practice of Programming" by Brian Kernighan and Rob Pike also contains some good coverage on byte order (Section 8.6 Byte Order) and how to write portable programs that work on both big-endian and little-endian systems.

remcycles

If we change the output of the int to hex (i.e. change %u to %x), what happens becomes more apparent:

uint: 1000000, uchars: 0 0 0 1
uint: 800000, uchars: 0 0 128 0

The value of val is shifted right by 1. This results in the low bit of the highest order byte getting shifted into the high bit of the next byte.

dbush
  • OK, so maybe my misunderstanding was that a right shift does not mean the bits are physically moved to the right in memory, but that they are shifted toward the least significant bit? – monkeyking Feb 21 '20 at 23:05
  • Yes, when you shift right, everything is shifted towards the least significant bit. – remcycles Feb 21 '20 at 23:07