How to convert to integer a char[4] of "hexadecimal" numbers [C/Linux]

Question

So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian and look like this: "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into an array of 4 chars and then build my integer from there, but I don't know how. Any ideas?

Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):

char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];

The variable i is 17408 instead of 68.

UPDATE: Nvm. I solved it. I mixed up the indexes when shifting. It should've been value[1] << 8 ... | value[0]

I would use bit-shifting and bit-wise OR to combine the bytes into an `int`. For little-endian, the first byte is the least significant one. This approach is independent of the endianness of your system. — Bodo, Mar 27 '20 at 12:20
Which part are you asking about -- reading the bytes, or converting them to an integer? What information do you have about the on-file integer representation (i.e. do you *know* that it is 32-bit little-endian)? What do you know or are you willing to assume about the host machine's native endianness? Do you want a signed or unsigned result? — John Bollinger, Mar 27 '20 at 12:39
@JohnBollinger Reading is not a problem. Converting them is. Sadly I don't know many details. My assignment is to read from a file and do stuff with the bytes read. I only know that there are chunks of 4 bytes aka word. Each word in the file is represented in little-endian. And the words are all positive numbers. — Oros Tom, Mar 27 '20 at 12:46
@Bodo that's what I'm struggling with. By what amount do I shift each byte? — Oros Tom, Mar 27 '20 at 12:47
@OrosTom Please [edit] your question and add all information there. I suggest to add some code to show how you attempt to solve your problem. Replace the parts you don't know with comments. This will help us to answer your question. — Bodo, Mar 27 '20 at 12:49
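
The shift-and-OR approach suggested in the comments can be sketched for the 4-byte case as follows. This is a minimal sketch, not code from the thread: load_le32 is a hypothetical helper name, and it assumes the 4 bytes have already been read into a buffer of unsigned char. The buffer type matters, because with plain char a byte of 0x80 or above may be sign-extended before the shift. The shift amounts are 0, 8, 16 and 24 bits, in file order.

```c
#include <stdint.h>

/* Hypothetical helper: decode 4 bytes stored in little-endian order.
   b[0] is the least significant byte, b[3] the most significant.
   The result does not depend on the host's own byte order. */
static uint32_t load_le32(const unsigned char *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Each byte is widened to uint32_t before shifting so that the shift by 24 cannot overflow a signed int.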

Paul Ogilvie · Answer 1 · 2020-03-27T13:10:47.403

Suppose you point into your buffer:

unsigned char *p = &buf[20];

and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:

int i;
i = *(int *)p;

This tells the compiler to treat p as a pointer to an int, dereferences that pointer, and assigns the result to i.

However, this depends on the endianness of your platform. If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:

unsigned char ibuf[4];
for (i=3; i>=0; i--) ibuf[i]= *p++;
i = *(int *)ibuf;

EDIT

The suggestions and comments of Andrew Henle and Bodo could give:

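unsigned char *p = &buf[20];
int i, j;

// same byte order in the file and on the host: copy the bytes in order
unsigned char *pi = (unsigned char *)&i;
for (j=3; j>=0; j--) *pi++= *p++;

// and the other endian: start again at the same offset and fill i from its last byte backwards
p = &buf[20];
pi = (unsigned char *)&i + 3;
for (j=3; j>=0; j--) *pi--= *p++;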

edited Mar 27 '20 at 13:10

answered Mar 27 '20 at 12:12


Not my DV, but `i = *(int *)p;` is a strict-aliasing violation and can violate any alignment restrictions, even on x86 systems - see https://stackoverflow.com/questions/46790550/c-undefined-behavior-strict-aliasing-rule-or-incorrect-alignment – Andrew Henle Mar 27 '20 at 12:28
To fix the possible alignment problem you should cast the other way round, i.e. cast the address of an `int` variable or array to an `unsigned char *` and copy the data there. Depending on the platform, unaligned access may be slower or may trigger a trap or fault, which can lead to a stop or reset of the system. (For example, on an ARM-based Linux system you can change the behavior at run time.) – Bodo Mar 27 '20 at 12:49
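
One common way to copy "the other way round", as the comment above describes, is memcpy into an integer variable. The following is a minimal sketch under that assumption; load_native32 is a hypothetical helper name, and the result is in the host's byte order, so it matches the file only when both use the same endianness.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 4 bytes from any position in a byte buffer
   into a uint32_t. memcpy has no alignment requirement on its source
   and does not violate the aliasing rules, unlike *(int *)p. */
static uint32_t load_native32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* byte-wise copy, host byte order */
    return v;
}
```

Compilers typically turn a fixed-size memcpy like this into a single load where the target allows it, so the safety usually costs nothing.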

John Bollinger · Accepted Answer · 2020-03-28T04:32:13.593

General considerations

There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. If you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need any other.

Determining native endianness

If you need to compute the host machine's native endianness, then this will do it:

static const uint32_t test = 1;
_Bool host_is_little_endian = *(const unsigned char *)&test;

It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.

Reading the data

I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.

uint32_t data;
size_t num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
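
The question uses the read() system call on a file descriptor rather than fread() on a stream. The same step with read() might look like the sketch below. The helper name read_word is hypothetical, and fd is assumed to be an already-open descriptor positioned at the word. Unlike fread(), read() may legitimately return fewer bytes than requested, so the sketch loops until all 4 bytes have arrived.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read exactly 4 bytes from fd into *out, leaving
   them in file order. Returns 0 on success, -1 on a read error or if
   end of file is reached first. For brevity, an interrupted read
   (EINTR) is treated as an error here. */
static int read_word(int fd, uint32_t *out)
{
    unsigned char *dst = (unsigned char *)out;
    size_t got = 0;

    while (got < sizeof *out) {
        ssize_t n = read(fd, dst + got, sizeof *out - got);
        if (n <= 0)
            return -1;          /* n < 0: error, n == 0: end of file */
        got += (size_t)n;
    }
    return 0;
}
```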

Converting the data

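It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). If you do need to swap endianness, however, be aware that ntohl() and htonl() will not do it here: they convert between host order and network (big-endian) order, so on a big-endian host they return their argument unchanged. On Linux, le32toh() from <endian.h> converts a little-endian value to host order instead:

if (!host_is_little_endian) {
    data = le32toh(data);
}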

(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, which is why the byte-reorder functions come in pairs, but you are extremely unlikely ever to see one of the others.)
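
Where no suitable library routine is available, the swap itself can be written with shifts and masks. A minimal sketch, with swap32 as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: reverse the order of the 4 bytes in v. */
static uint32_t swap32(uint32_t v)
{
    return  (v >> 24)                  /* byte 3 moves to byte 0 */
         | ((v >>  8) & 0x0000FF00u)   /* byte 2 moves to byte 1 */
         | ((v <<  8) & 0x00FF0000u)   /* byte 1 moves to byte 2 */
         |  (v << 24);                 /* byte 0 moves to byte 3 */
}
```

Applying the function twice gives back the original value, so the same routine serves for both directions of the conversion.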

Signed integers

If you need a signed instead of unsigned integer, then you can do the same, but use a union:

union {
    uint32_t as_unsigned;
    int32_t as_signed;
} data;

In all of the preceding, use data.as_unsigned in place of plain data, and at the end, read out the signed result from data.as_signed.

Very complete answer, John. — Paul Ogilvie, Mar 27 '20 at 13:14