6

1. Why?

Code like this used to work and it's kind of obvious what it is supposed to mean. Is the compiler even allowed (by the specification) to make it an error?

I know that it's loosing precision and I would be happy with a warning. But it still has a well-defined semantics (at least for unsigned downsizing cast is defined) and the user just might want to do it.

2. Workaround

I have legacy code that I don't want to refactor too much because it's rather tricky and already debugged. It is doing two things:

  1. Sometimes stores integers in pointer variables. The code only casts the pointer to integer if it stored an integer in it before. Therefore while the cast is downsizing, the overflow never happens in reality. The code is tested and works.

    When integer is stored, it always fits in plain old unsigned, so changing the type is not considered a good idea and the pointer is passed around quite a bit, so changing it's type would be somewhat invasive.

  2. Uses the address as hash value. A rather common thing to do. The hash table is not that large to make any sense to extend the type.

    The code uses plain unsigned for hash value, but note that the more usual type of size_t may still generate the error, because there is no guarantee that sizeof(size_t) >= sizeof(void *). On platforms with segmented memory and far pointers, size_t only has to cover the offset part.

So what are the least invasive suitable workarounds? The code is known to work when compiled with compiler that does not produce this error, so I really want to do the operation, not change it.


Notes:

void *x;
int y;
union U { void *p; int i; } u;
  1. *(int*)&x and u.p = x, u.i are not equivalent to (int)x and are not the opposite of (void *)y. On big endian architectures, the first two will return the bytes on lower addresses while the later will work on low order bytes, which may reside on higher addresses.
  2. *(int*)&x and u.p = x, u.i are both strict aliasing violations, (int)x is not.
Jan Hudec
  • 65,662
  • 12
  • 114
  • 158
  • Of course the compiler is allowed to make it an error, even required. The standard describes implicit conversions between many types, but integerspointers are not among them. –  Feb 05 '14 at 10:24
  • Why not turn your pointer into a union with a pointer and an `unsigned`? I know you don't want to touch your legacy code if you don't need to, but IMHO that's the clean way to fix your issue. – DarkDust Feb 05 '14 at 10:26
  • 1
    There are discussions at http://stackoverflow.com/questions/2024895/how-should-i-handle-cast-from-void-to-int-loses-precision-when-compiling which include suggested syntax. – borrible Feb 05 '14 at 10:26
  • 2
    Is "loses precision" really the error you get? Which compiler? I get the more appropriate error: "invalid conversion from `int*` to `int`" – Joseph Mansfield Feb 05 '14 at 10:27
  • You may use `std::uintptr_t` – Jarod42 Feb 05 '14 at 10:27
  • 1
    @DarkDust: A union still invokes UB, IIRC, if you read from a member other than the one you last wrote (except for any layout-compatible initial members). – cHao Feb 05 '14 at 10:28
  • @cHao It does, but several major compilers document this as a supported way of doing type punning, so in practice it may be acceptable. –  Feb 05 '14 at 10:31
  • 1
    @cHao, this is only true for C++. In C the behavior is well defined as long the value isn't a trap or something like that. In fact, `union` was invented for such things. – Jens Gustedt Feb 05 '14 at 10:33
  • 3
    Don't tag such a thing with "language-lawyer" and the same time with C and C++. These are different languages and their concepts on these things differ. – Jens Gustedt Feb 05 '14 at 10:34
  • 1
    @JensGustedt: I did that pretty much intentionally. I am looking for standards reference and I do want to know if there is any difference between the two (this is a part where C++ includes C, but might have diverted in details). – Jan Hudec Feb 05 '14 at 10:40
  • After learning how this is solved in the C/C++ programming language, I'd be very interested how to fix it in the Java/assembly and Haskell/Malbolge languages. – unwind Feb 05 '14 at 10:40
  • "overflow never happens in reality" - on 64-bit OSes it **will** overflow and may cause many things worse than just overflow – phuclv Feb 05 '14 at 12:15
  • if you really don't want to change the code, the only way to do is compile the project as 32-bit, or if not possible, compile the legacy code as a seperate 32-bit process and use inter-process communication – phuclv Feb 05 '14 at 12:17
  • @LưuVĩnhPhúc: It's a base class used for everything and then some more. No, I obviously have to change it. I just want to change it so that there is minimal risk of introducing an error. – Jan Hudec Feb 05 '14 at 12:24
  • @LưuVĩnhPhúc: No, overflow __won't happen__. It won't happen because the code will never cast it if the numeric value of the pointer is larger than what fits in the integral type. The code uses pointer to store integer, not the other way around. – Jan Hudec Feb 05 '14 at 12:26
  • @LưuVĩnhPhúc: Oh, and I never said anything about 32 or 64 bits. For what it's worth it may be a 16 bit system with 32 bit far pointers. – Jan Hudec Feb 05 '14 at 12:27
  • When you say "only casts a pointers to an integer if it stored an integer in it before", are you implying that code casts arbitrary integers to pointers? The standard guarantees that if a particular number resulted from casting a pointer to an integer, casting that integer back to a pointer will yield the same number, but casting any value which wasn't received from a pointer-to-integer cast is Undefined Behavior. It sounds like what you need for correctness is probably a union. – supercat Feb 10 '14 at 19:52
  • @supercat: Union would have been correct, though the code is actually a piece of shit and shouldn't be needing anything like this at all. But on the other hand I've seen _lot_ of code where callbacks have generic `void *` argument and if the caller needs to pass just a number, simply casts the integer to `void *` and back to integer in the callback. – Jan Hudec Feb 10 '14 at 19:56
  • @JanHudec: I don't doubt that such things are done, but from a standards-compliance standpoint code which casts directly from a pointer to an undersized integer and expects to do something useful with the result cannot be standards-compliant. If such code is using a non-standard extension (as seems typical), requiring an intermediate cast shouldn't pose an undue burden. – supercat Feb 10 '14 at 20:04
  • @supercat: Ah, of course you are right. There even is (was, 20 years ago) platform where it would break. On 8086 and 80286 in 16-bit mode far pointers had multiple representations for each address. And in huge memory model, they were normalized. So if you casted `1000l` to pointer and back, you'd get `4063240l` (and if you casted to `int` just `8`) back (`0:0x3e8` would be normalized to `0x3e:0x8` and read back as `0x3e0008l`). – Jan Hudec Feb 10 '14 at 23:44

3 Answers3

8

C++, 5.2.10:

4 - A pointer can be explicitly converted to any integral type large enough to hold it. [...]

C, 6.3.2.3:

6 - Any pointer type may be converted to an integer type. [...] If the result cannot be represented in the integer type, the behavior is undefined. [...]

So (int) p is illegal if int is 32-bit and void * is 64-bit; a C++ compiler is correct to give you an error, while a C compiler may either give an error on translation or emit a program with undefined behaviour.

You should write, adding a single conversion:

(int) (intptr_t) p

or, using C++ syntax,

static_cast<int>(reinterpret_cast<intptr_t>(p))

If you're converting to an unsigned integer type, convert via uintptr_t instead of intptr_t.

ecatmur
  • 137,771
  • 23
  • 263
  • 343
  • 1
    5.2.10 in C or C++ specification? – Jan Hudec Feb 05 '14 at 10:41
  • @JanHudec C++11. I'll check the C specification. – ecatmur Feb 05 '14 at 10:41
  • Casting a wide signed integer type to a narrow one is not defined behavior, one should definitively use `uintptr_t` and `unsigned`. Then just using the low order bits is not appropriate either, I think. Please see my answer. – Jens Gustedt Feb 05 '14 at 10:46
  • @JensGustedt it's implementation-defined (4.7p3) and unchanged if it can be represented in the range of the destination type. It seems this would be adequate in this case, if the user is happy with the behaviour on a compiler that fails to diagnose the error. – ecatmur Feb 05 '14 at 10:51
  • @JensGustedt Converting a wide signed integer type to a narrower one is perfectly well defined behavior if the value being converted can be represented in the narrower type. – James Kanze Feb 05 '14 at 10:56
  • @JamesKanze, good chances are that it can't. There is no reason to suppose that on 64 bit machine with randomized virtual addresses the range of pointers is limited to the lower 32 bits. – Jens Gustedt Feb 05 '14 at 11:08
4

This is a tough one to solve "generically", because the "looses precision" indicates that your pointers are larger than the type you are trying to store it in. Which may well be "ok" in your mind, but the compiler is concerned that you will be restoring the int value back into a pointer, which has now lost the upper 32 bits (assuming we're talking 32-bit int and 64-bit pointers - there are other possible combinations).

There is uintptr_t that is size-compatible with whatever the pointer is on the systems, so typically, you can overcome the actual error by:

int x = static_cast<int>(reinterpret_cast<uintptr_t>(some_ptr));

This will first force a large integer from a pointer, and then cast the large integer to a smaller type.

Mats Petersson
  • 119,687
  • 13
  • 121
  • 204
3

Answer for C

Converting pointers to integers is implementation defined. Your problem is that the code that you are talking about seems never have been correct. And probably only worked on ancient architectures where both int and pointers are 32 bit.

The only types that are supposed to convert without loss are [u]intptr_t, if they exist on the platform (usually they do). Which part of such an uintptr_t is appropriate to use for your hash function is difficult to tell, you shouldn't make any assumptions on that. I would go for something like

uintptr_t n = (uintptr_t)x;

and then

((n >> 32) ^ n) & UINT32_MAX

this can be optimized out on 32 bit archs, and would give you traces of all other bits on 64 bit archs.

For C++ basically the same should apply, just the cast would be reinterpret_cast<std:uintptr_t>(x).

Jens Gustedt
  • 72,200
  • 3
  • 92
  • 164