
I wonder whether C++ implementations are allowed to represent pointers to different types differently. For instance, if we had a 4-byte sized/aligned int and an 8-byte sized/aligned long, would it be possible to represent pointers-to-int/long as object addresses shifted right by 2/3 bits, respectively? This would effectively forbid converting a pointer-to-long into a pointer-to-int.

I am asking because of [expr.reinterpret.cast/7]:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).

[Note 7: Converting a pointer of type “pointer to T1” that points to an object of type T1 to the type “pointer to T2” (where T2 is an object type and the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note]

The first sentence suggests that we can convert pointers between any two object types. However, the parenthetical about alignment requirements in the (non-normative) Note 7 then says that alignment plays some role here as well. (That's why I came up with the int/long example above.)
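A minimal sketch of the round trip that Note 7 does guarantee, taking T1 = int and T2 = char. The helper name `round_trip_through_char` is hypothetical, chosen only for illustration:

```cpp
#include <cassert>

// Converts int* -> char* -> int* with reinterpret_cast.
// When p points to an int object, Note 7 covers this round trip: char's
// alignment requirement (1) is no stricter than int's, so the result
// must equal the original pointer.
int* round_trip_through_char(int* p) {
    char* bytes = reinterpret_cast<char*>(p);  // "pointer to T1" -> "pointer to T2"
    return reinterpret_cast<int*>(bytes);      // and back to "pointer to T1"
}
```

The reverse trip (char* to int* to char*) is the one the note leaves out, since an arbitrary char address need not satisfy int's alignment.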

Daniel Langr
  • @JHBonarius But what forces a C++ implementation to use the same representation? In my example, an implementation would represent a pointer-to-`int` as an address (your "machine memory model pointer") shifted right by two bits. If it then shifted it back before the pointer would be passed to some memory-related machine code instruction, there would be no technical problem. (Of course, no sane implementation would likely do this, but this is a language-lawyer question about such a possibility.) – Daniel Langr Feb 08 '21 at 13:11
  • _if we had 4-byte sized/aligned int and 8-byte sized/aligned long, would it be possible to represent pointers-to-int/long as object addresses shifted right by 2/3 bits, respectively? This would effectively forbid to convert a pointer-to-long into a pointer-to-int_ Could you explain how? – Language Lawyer Feb 08 '21 at 19:43
  • _Can pointers to different types have different binary representations?_ What do you mean by "different" (i.e. how can it be observed)? Smth [like this](https://wandbox.org/permlink/DxwiPvBxlDKBi0Wv)? – Language Lawyer Feb 08 '21 at 19:48
  • @LanguageLawyer Explain what? What particularly you don't understand? – Daniel Langr Feb 09 '21 at 04:43
  • @LanguageLawyer As for your second question, basically yes. But I didn't write anything about observation. I care about the conversion of pointers to different types. (For example, can you always convert a pointer to `T1` to a pointer to `T2` and then back and be sure it is valid? It seems that this depends on the alignment requirements of `T1` and `T2`.) – Daniel Langr Feb 09 '21 at 04:55
  • _Explain what? What particularly you don't understand?_ How your representation forbids casts. – Language Lawyer Feb 11 '21 at 22:55
  • @LanguageLawyer I explained that with an example. What particularly you do not understand about it? Say there is an object of type `int` at address `0x04`. This will be represented as a pointer `0b0001`. But if you just `reinterpret_cast` `0b0001` to a pointer-to-`long`, it will refer to a different address: `0x08`. Or, vice versa. – Daniel Langr Feb 12 '21 at 04:12
  • I missed that you wrote that `long` and `int` have different alignment requirements and thought that only «object addresses shifted right by 2/3 bits» forbids casting. BTW _An object pointer can be explicitly converted to an object pointer of a different type_ speaks about `T` and `e`'s type in `reinterpret_cast<T>(e)`, not about the resulting value and the value of `e`. – Language Lawyer Feb 12 '21 at 04:15
  • _... it will refer to a different address: 0x08. Or, vice versa._ And when can this become a problem? You can't use such a cast pointer to access the object because it violates the strict aliasing rule. And in pointer equality comparison, I think you can't rely on the expression type to shift addresses back before comparison, because an expression of type `T*` can have the value «pointer to an object of (unrelated) type `U`», so you may need to use `0b0001` as-is. But I think we went into too much detail. – Language Lawyer Feb 12 '21 at 04:20
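The hypothetical representation discussed above can be modelled with plain integers. This is a sketch, not a real compiler: `IntPtrRep`, `LongPtrRep` and the helper functions are invented names, and it assumes 4-byte-aligned int and 8-byte-aligned long as in the question. It reproduces the thread's 0x04 / 0b0001 / 0x08 example and shows that a conversion through the byte address only works when the address also satisfies long's alignment:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical implementation (not a real compiler): an int* stores the byte
// address shifted right by 2, a long* stores it shifted right by 3.
struct IntPtrRep  { std::uint64_t bits; };
struct LongPtrRep { std::uint64_t bits; };

constexpr IntPtrRep  make_int_ptr(std::uint64_t addr)  { return IntPtrRep{addr >> 2}; }
constexpr LongPtrRep make_long_ptr(std::uint64_t addr) { return LongPtrRep{addr >> 3}; }

constexpr std::uint64_t address_of(IntPtrRep p)  { return p.bits << 2; }
constexpr std::uint64_t address_of(LongPtrRep p) { return p.bits << 3; }

// A conversion that goes through the byte address (how static_cast via
// void* could be implemented here): correct only if the address is 8-aligned.
constexpr LongPtrRep convert(IntPtrRep p) { return make_long_ptr(address_of(p)); }

// Reusing the stored bits unchanged: refers to a different address.
constexpr LongPtrRep reuse_bits(IntPtrRep p) { return LongPtrRep{p.bits}; }

// The thread's example: an int at 0x04 is stored as 0b0001, read back as 0x08.
static_assert(make_int_ptr(0x04).bits == 0x1, "int* to 0x04 stores 1");
static_assert(address_of(reuse_bits(make_int_ptr(0x04))) == 0x08, "bits reused as long*");
```

Converting the pointer to 0x10 round-trips correctly, while converting the pointer to 0x04 yields address 0x00, a garbage value of the kind the standard leaves unspecified for misaligned targets.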

2 Answers


Yep

As a concrete example, there is a C++ implementation where pointers to single-byte elements are larger than pointers to multi-byte elements, because the hardware uses word (not byte) addressing. To emulate byte pointers, C++ uses a hardware pointer plus an extra byte offset.

void* stores that extra offset, but int* does not. Converting int* to char* works (as it must under the standard), but char* to int* loses that offset (which your note implicitly permits).
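A rough model of that scheme. This is not the real Cray T90 pointer format; the 8-byte (64-bit) word size, the type names and the conversion functions are assumptions made for illustration. It shows why int* to char* to int* survives while char* to int* to char* loses the byte offset:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a word-addressed machine, assuming 8-byte words.
// This is NOT the actual Cray pointer layout, just the idea behind it.
struct WordPtr { std::uint64_t word; };                   // like int*: whole words only
struct BytePtr { std::uint64_t word; unsigned offset; };  // like char*/void*: word + byte offset (0..7)

// int* -> char*: a word address is also a byte address with offset 0.
BytePtr to_byte_ptr(WordPtr p) { return BytePtr{p.word, 0u}; }

// char* -> int*: a word pointer has no room for the offset, so it is dropped.
WordPtr to_word_ptr(BytePtr p) { return WordPtr{p.word}; }
```

For example, a byte pointer {word 10, offset 3} converted to a word pointer and back comes out as {word 10, offset 0}.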

The Cray T90 supercomputer is an example of such hardware.

I will see if I can find the standard's argument for why this is a valid thing for a compliant C++ compiler to do; I am only aware that someone did it, not that it is legal, but that note suggests it is intended to be legal.

The rules are going to be in the to/from void pointer casting rules. The paragraph you quoted implicitly defers the meaning of the conversion to those rules.

7.6.1.9 Static cast [expr.static.cast]

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

This demonstrates that converting to a more strictly aligned type yields an unspecified pointer value when the address does not meet the new alignment, but converting to an equally or less strictly aligned type whose object isn't actually there leaves the pointer value unchanged.

That is permission for a cast from a pointer to 4-byte-aligned data to a pointer to 8-byte-aligned data to result in garbage whenever the address is not 8-byte aligned.
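One way to stay out of the "unspecified" branch is to check alignment before casting from void*. A sketch using the standard `std::align` from `<memory>`; the helper name `checked_cast_from_void` and the test type `Eight` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// A type with a guaranteed 8-byte alignment requirement, for illustration.
struct alignas(8) Eight { unsigned char bytes[8]; };

// Converts void* to T* only when the address satisfies alignof(T); otherwise
// returns nullptr instead of producing an unspecified pointer value.
template <class T>
T* checked_cast_from_void(void* p) {
    void* probe = p;
    std::size_t space = sizeof(T);
    // std::align returns p itself only if no adjustment is needed,
    // i.e. p is already suitably aligned for T.
    if (std::align(alignof(T), sizeof(T), probe, space) != p) {
        return nullptr;
    }
    return static_cast<T*>(p);
}
```

Given `alignas(8) unsigned char buf[16];`, the cast from `buf` succeeds, while the cast from `buf + 1` is rejected.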

However, every cast between unrelated object pointer types must logically round-trip through a void*:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).

(From the OP)

That covers void* to T*; I have yet to find the T* to void* conversion text needed to make this a complete answer.
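The equivalence stated in the quoted sentence can be written out directly. A small sketch (the template name `via_void` is hypothetical); the first step is the implicit T* to void* conversion of [conv.ptr]/2, as pointed out in the comments below, and the second is the static_cast quoted above:

```cpp
#include <cassert>

// [expr.reinterpret.cast]/7 defines reinterpret_cast<To*>(v) between object
// pointer types as static_cast<To*>(static_cast<void*>(v)), so this helper
// must produce the same pointer value as reinterpret_cast.
template <class To, class From>
To* via_void(From* v) {
    void* raw = static_cast<void*>(v);  // T* -> void*: value unchanged ([conv.ptr]/2)
    return static_cast<To*>(raw);       // void* -> T* ([expr.static.cast])
}
```

For instance, `via_void<unsigned char>(&d)` and `reinterpret_cast<unsigned char*>(&d)` compare equal for a double `d`.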

Yakk - Adam Nevraumont
  • So then how about _"An object pointer can be explicitly converted to an object pointer of a different type."_? I have the same opinion, just don't understand this statement. – Daniel Langr Feb 08 '21 at 13:49
  • [An example of such architectures](https://stackoverflow.com/a/6986260/995714) – phuclv Feb 08 '21 at 13:50
  • According to standard you CAN convert `int *` -> `char *` -> `int *` and get same address, but you CAN'T convert `char *` -> `int *` -> `char *` and get same address, because additional byte addressing will be lost. This is what this Note 7 is about. – sklott Feb 08 '21 at 14:23
  • @sklott: FYI, the note does not say you can’t do the latter conversion and get the same address; it implies (by omission) you can’t do the latter conversion and rely on getting the same address (based on the C++ standard alone). – Eric Postpischil Feb 08 '21 at 14:27
  • @daniel My unbacked opinion is that it simply describes what `U* u=(U*)p_to_unrelated_other_type` does: a cast to a void pointer (see elsewhere in the standard) followed by a cast from void pointer to the target (see elsewhere in the standard) – Yakk - Adam Nevraumont Feb 08 '21 at 14:39
  • Yep? Who are you? Gary Cooper? – Booboo Feb 08 '21 at 14:43
  • _I have yet to find the `T*` to `void*` conversion text_ It is in [\[conv.ptr\]/2](https://timsong-cpp.github.io/cppwp/n4659/conv.ptr#2) – Language Lawyer Feb 12 '21 at 04:21

The answer is yes, simply because the standard does not forbid it: an implementation could decide to have different representations for pointers to different types, or even several possible representations for the same pointer value.

As most architectures now use flat addressing (meaning that the representation of a pointer is just the address), there is no good reason to do that. But I can still remember the old segment:offset address representation of 8086 systems, which allowed 16-bit systems to handle 20-bit addresses (1024 KiB). A far pointer used a 16-bit segment (shifted left by 4 bits to get the physical base address) plus a 16-bit offset, while a near pointer held only a 16-bit offset relative to the current segment. In this mode, far pointers had many possible representations of the same address. BTW, far addressing was the default (i.e. what ordinary source code produced) in the large and compact memory models (ref).
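A small sketch of that address calculation, showing two distinct far-pointer representations of one physical address. The `FarPtr` struct and helper names are illustrative, not taken from any compiler's headers:

```cpp
#include <cassert>
#include <cstdint>

// Real-mode 8086 far pointer: 16-bit segment and 16-bit offset.
struct FarPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Physical address = segment * 16 + offset, wrapped to 20 bits
// as on the original 8086 (which had only 20 address lines).
std::uint32_t linear_address(FarPtr p) {
    return ((static_cast<std::uint32_t>(p.segment) << 4) + p.offset) & 0xFFFFF;
}

// Comparing the raw fields is not the same as comparing addresses.
bool same_representation(FarPtr a, FarPtr b) {
    return a.segment == b.segment && a.offset == b.offset;
}
```

For example, 0x1234:0x0005 and 0x1000:0x2345 both map to physical address 0x12345, yet their bit patterns differ.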

Serge Ballesta