Is `reinterpret_cast`ing between a hardware SIMD vector pointer and the corresponding type undefined behavior?

Question

Is it legal to reinterpret_cast a float* to a __m256* and access float objects through a different pointer type?

#include <cstddef>      // size_t
#include <immintrin.h>  // __m256, _mm256_load_ps, _mm256_store_ps

constexpr size_t _m256_float_step_sz = sizeof(__m256) / sizeof(float);
alignas(__m256) float stack_store[100 * _m256_float_step_sz ]{};
__m256& hwvec1 = *reinterpret_cast<__m256*>(&stack_store[0 * _m256_float_step_sz]);

using arr_t = float[_m256_float_step_sz];
arr_t& arr1 = *reinterpret_cast<float(*)[_m256_float_step_sz]>(&hwvec1);

Do hwvec1 and arr1 rely on undefined behavior?

Do they violate the strict aliasing rules ([basic.lval]/11)?

Or are the load/store intrinsics the only well-defined way:

__m256 hwvec2 = _mm256_load_ps(&stack_store[0 * _m256_float_step_sz]);
_mm256_store_ps(&stack_store[1 * _m256_float_step_sz], hwvec2);

godbolt

Why do you think it doesn't violate strict aliasing rule? In my opinion, your first code violates it. I'd use intrinsics for this, just as you suggest. — geza, Aug 31 '18 at 10:14
@geza Thank you. I'm just uncertain, because the underlying representation is never accessed as any type other than `float` — sandthorn, Aug 31 '18 at 10:21
Won't you use it as `__m256` as well? If not, then what's the point? :) — geza, Aug 31 '18 at 10:22
@geza So in your opinion, does accessing floats that reside inside `__m256` object and within `__m256` lifetime violate strict aliasing rules? — sandthorn, Aug 31 '18 at 13:12
Yes, I wouldn't do it. There is a solution that surely doesn't violate the rule: I'd use the load/store intrinsics instead. The only reason to choose reinterpret_cast is if, for some reason, it is faster. But current compilers are pretty good at optimizing this kind of thing. — geza, Aug 31 '18 at 13:41

Peter Cordes · Accepted Answer · 2018-08-31T21:44:06.157

ISO C++ doesn't define __m256, so we need to look at what does define their behaviour on the implementations that support them.

Intel's intrinsics define vector-pointers like __m256* as being allowed to alias anything else, the same way ISO C++ defines char* as being allowed to alias.

So yes, it's safe to dereference a __m256* instead of using a _mm256_load_ps() aligned-load intrinsic.

But especially for float/double, it's often easier to use the intrinsics because they take care of casting from float*, too. For integers, the AVX512 load/store intrinsics are defined as taking void*, but before that you need an extra (__m256i*) which is just a lot of clutter.
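As a rough illustration of the two equivalent spellings, here is a minimal sketch. It uses the 128-bit `__m128` analogue of the question's `__m256` so that it builds on x86-64 without `-mavx`; the helper names `load_by_deref`, `load_by_intrinsic` and `loads_agree` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <immintrin.h>

// Hypothetical helper: load four floats by dereferencing a vector pointer.
// Like _mm_load_ps, this requires p to be 16-byte aligned.
inline __m128 load_by_deref(const float* p) {
    assert(reinterpret_cast<std::uintptr_t>(p) % alignof(__m128) == 0);
    return *reinterpret_cast<const __m128*>(p);
}

// The same load spelled with the aligned-load intrinsic; no cast needed.
inline __m128 load_by_intrinsic(const float* p) {
    return _mm_load_ps(p);
}

// True if both spellings produced the same four lanes.
inline bool loads_agree(const float* p) {
    __m128 a = load_by_deref(p);
    __m128 b = load_by_intrinsic(p);
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

Calling `loads_agree` on an `alignas(16) float buf[4]` should report that both loads see the same data; with optimization enabled, compilers typically emit the same instruction for both.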

In gcc, this is implemented by defining __m256 with a may_alias attribute: from gcc7.3's avxintrin.h (one of the headers that <immintrin.h> includes):

/* The Intel API is flexible enough that we must allow aliasing with other
   vector types, and their scalar components.  */
typedef float __m256 __attribute__ ((__vector_size__ (32),
                                     __may_alias__));
typedef long long __m256i __attribute__ ((__vector_size__ (32),
                                          __may_alias__));
typedef double __m256d __attribute__ ((__vector_size__ (32),
                                       __may_alias__));

/* Unaligned version of the same types.  */
typedef float __m256_u __attribute__ ((__vector_size__ (32),
                                       __may_alias__,
                                       __aligned__ (1)));
typedef long long __m256i_u __attribute__ ((__vector_size__ (32),
                                            __may_alias__,
                                            __aligned__ (1)));
typedef double __m256d_u __attribute__ ((__vector_size__ (32),
                                         __may_alias__,
                                         __aligned__ (1)));

(In case you were wondering, this is why dereferencing a __m256* is like _mm256_store_ps, not storeu.)

GNU C native vectors without may_alias are allowed to alias their scalar type, e.g. even without the may_alias, you could safely cast between float* and a hypothetical v8sf type. But may_alias makes it safe to load from an array of int[], char[], or whatever.
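To make the effect of the two attributes concrete, here is a small sketch that defines its own vector type in the style of the header above. It is GNU C / clang specific, the typedef `u32x4_any` and the function `lane0_bits` are hypothetical names, and the value mentioned in the usage note assumes IEEE-754 `float`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical typedef modelled on __m128i_u (GNU C / clang only):
// may_alias lets it read objects of any type, aligned(1) drops the
// 16-byte alignment requirement.
typedef unsigned int u32x4_any
    __attribute__((vector_size(16), may_alias, aligned(1)));

// Read 16 bytes starting at p and return the first 32-bit lane.
inline std::uint32_t lane0_bits(const void* p) {
    assert(p != nullptr);
    const u32x4_any v = *static_cast<const u32x4_any*>(p);
    return v[0];
}
```

Pointing `lane0_bits` at a `float` holding `1.0f` should return its bit pattern, `0x3f800000`, even when the address is only 4-byte aligned.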

I'm talking about how GCC implements Intel's intrinsics only because that's what I'm familiar with. I've heard from gcc developers that they chose that implementation because it was required for compatibility with Intel.

Other behaviour Intel's intrinsics require to be defined

Using Intel's API for _mm_storeu_si128( (__m128i*)&arr[i], vec); requires you to create potentially-unaligned pointers which would fault if you dereferenced them. And _mm_storeu_ps to a location that isn't 4-byte aligned requires creating an under-aligned float*.

Just creating unaligned pointers, or pointers outside an object, is UB in ISO C++, even if you don't dereference them. I guess this allows implementations on exotic hardware which do some kinds of checks on pointers when creating them (possibly instead of when dereferencing), or maybe which can't store the low bits of pointers. (I have no idea if any specific hardware exists where more efficient code is possible because of this UB.)

But implementations which support Intel's intrinsics must define the behaviour, at least for the __m* types and float*/double*. This is trivial for compilers targeting any normal modern CPU, including x86 with a flat memory model (no segmentation); pointers in asm are just integers kept in the same registers as data. (m68k has address vs. data registers, but it never faults from keeping bit-patterns that aren't valid addresses in A registers, as long as you don't deref them.)
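The pattern discussed in this section looks roughly like the following sketch: the cast creates a `__m128i*` that is usually misaligned, and only the unaligned-store intrinsic ever uses it. The function name `fill16_unaligned` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <immintrin.h>

// Write 16 copies of `value` starting at dst, which may have any alignment.
inline void fill16_unaligned(std::uint8_t* dst, std::uint8_t value) {
    assert(dst != nullptr);
    __m128i v = _mm_set1_epi8(static_cast<char>(value));
    // The cast may produce a misaligned __m128i*; it is never dereferenced
    // directly, only handed to the unaligned-store intrinsic.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
}
```

For example, `fill16_unaligned(buf + 3, 0x7f)` on a 32-byte buffer should touch exactly bytes 3 through 18.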

Going the other way: element access of a vector.

Note that may_alias, like the char* aliasing rule, only goes one way: it is not guaranteed to be safe to use int32_t* to read a __m256. It might not even be safe to use float* to read a __m256. Just like it's not safe to do char buf[1024]; int *p = (int*)buf;.

Reading/writing through a char* can alias anything, but when you have a char object, strict-aliasing does make it UB to read it through other types. (I'm not sure if the major implementations on x86 do define that behaviour, but you don't need to rely on it because they optimize away memcpy of 4 bytes into an int32_t. You can and should use memcpy to express an unaligned load from a char[] buffer, because auto-vectorization with a wider type is allowed to assume 2-byte alignment for int16_t*, and make code that fails if it's not: Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?)
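A minimal sketch of the `memcpy` idiom mentioned above; the function name `load_i32` is made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read an int32_t from a byte buffer at any alignment, without violating
// strict aliasing. Mainstream compilers typically turn this into one load.
inline std::int32_t load_i32(const char* buf) {
    assert(buf != nullptr);
    std::int32_t value;
    std::memcpy(&value, buf, sizeof value);
    return value;
}
```

This works for any offset into the buffer, including odd ones, because `memcpy` makes no alignment assumption about its arguments.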

To insert/extract vector elements, use shuffle intrinsics, SSE2 _mm_insert_epi16 / _mm_extract_epi16 or SSE4.1 insert / _mm_extract_epi8/32/64. For float, there are no insert/extract intrinsics that you should use with scalar float.
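As a sketch of what that can look like with SSE/SSE2 only: the integer helpers use the insert/extract intrinsics, and the float helper uses a shuffle followed by `_mm_cvtss_f32`, which is one possible way to get a scalar out. All three helper names are hypothetical.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2

// Extract 16-bit element 3 (zero-extended to int).
inline int get_word3(__m128i v) {
    return _mm_extract_epi16(v, 3);
}

// Return a copy of v with 16-bit element 3 replaced by value.
inline __m128i set_word3(__m128i v, int value) {
    assert(value >= 0 && value <= 0xFFFF);
    return _mm_insert_epi16(v, value, 3);
}

// Float lane I: broadcast it with a shuffle, then take the low element.
// The index is a template parameter because shuffles need an immediate.
template <int I>
inline float get_float_lane(__m128 v) {
    static_assert(I >= 0 && I < 4, "lane index out of range");
    return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(I, I, I, I)));
}
```

The element index has to be a compile-time constant in all three cases, which is why a runtime index usually ends up going through memory instead.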

Or store to an array and read the array (see "print a __m128i variable"). This does actually optimize away to vector extract instructions.
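A minimal sketch of the store-then-read approach, using `__m128` and a hypothetical helper name `extract_lane`:

```cpp
#include <cassert>
#include <immintrin.h>

// Spill the vector to an aligned temporary array and index it normally.
// A runtime index is fine here, unlike with the extract intrinsics.
inline float extract_lane(__m128 v, unsigned i) {
    assert(i < 4);
    alignas(16) float tmp[4];
    _mm_store_ps(tmp, v);
    return tmp[i];
}
```

The temporary array is a real `float[4]` object, so reading it through `float` raises no aliasing question at all.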

GNU C vector syntax provides the [] operator for vectors, like __m256 v = ...; v[3] = 1.25;. MSVC defines vector types as a union with a .m128_f32[] member for per-element access.
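Both spellings side by side, as a sketch: the preprocessor test is an assumption about how one might select between them, and `set_lane` / `get_lane` are hypothetical names.

```cpp
#include <cassert>
#include <immintrin.h>

// Return a copy of v with lane i set to x.
inline __m128 set_lane(__m128 v, unsigned i, float x) {
    assert(i < 4);
#if defined(_MSC_VER) && !defined(__clang__)
    v.m128_f32[i] = x;  // MSVC: __m128 is a union with array members
#else
    v[i] = x;           // GNU C / clang native vector subscript
#endif
    return v;
}

// Read lane i of v.
inline float get_lane(__m128 v, unsigned i) {
    assert(i < 4);
#if defined(_MSC_VER) && !defined(__clang__)
    return v.m128_f32[i];
#else
    return v[i];
#endif
}
```

Neither form is portable on its own, which is part of the appeal of the wrapper libraries.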

There are wrapper libraries like Agner Fog's (GPL licensed) Vector Class Library which provide portable operator[] overloads for their vector types, and operator + / - / * / << and so on. It's quite nice, especially for integer types where having different types for different element widths make v1 + v2 work with the right size. (GNU C native vector syntax does that for float/double vectors, and defines __m128i as a vector of signed int64_t, but MSVC doesn't provide operators on the base __m128 types.)

You can also use union type-punning between a vector and an array of some type, which is safe in ISO C99, and in GNU C++, but not in ISO C++. I think it's officially safe in MSVC, too, because I think they define __m128 as a normal union.
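A sketch of the union approach, with the caveat just stated: this relies on the GNU C++ (and, by the reasoning above, probably MSVC) guarantee, not on ISO C++. The names `FloatVec4` and `lane_via_union` are hypothetical.

```cpp
#include <cassert>
#include <immintrin.h>

// Assumption: the compiler documents union type-punning as well-defined
// (GNU C++ does). ISO C++ does not guarantee reading the inactive member.
union FloatVec4 {
    __m128 v;
    float  f[4];
};

// Write the vector member, then read one element through the array member.
inline float lane_via_union(__m128 v, unsigned i) {
    assert(i < 4);
    FloatVec4 u;
    u.v = v;
    return u.f[i];
}
```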

There's no guarantee you'll get efficient code from any of these element-access methods, though. Do not use inside inner loops, and have a look at the resulting asm if performance matters.

Curiously, while icc (unlike gcc and clang) is generally sophisticated enough to recognize that, when a pointer is converted from `T*` to `U*` and used to access the storage before that storage is next accessed via other means, such an operation might actually affect the value of the `T` in question (i.e. it can handle type punning in cases *that don't actually involve aliasing*), my testing suggests that it does not handle such cases when they involve types `__m256*` and `uint32_t*`, even when the `uint32_t*` is derived from the same pointer object used to access the `__m256`. — supercat, Aug 31 '18 at 21:17
You think this is a close enough of a dupe to: https://stackoverflow.com/questions/24787268/how-to-implement-mm-storeu-epi64-without-aliasing-problems ? My vote is binding, so I'm hesitant to pull the trigger. — Mysticial, Aug 31 '18 at 21:45
@Mysticial: hrm, yeah our answers could each almost answer both questions, even though the questions are slightly different (the other one seems to assume that `_mm_storeu_pd` will have the same aliasing semantics as a dereference, but it's an intrinsic so it could do anything.) I like my answer better, because instead of saying there is (apparent) UB but it happens to work, I'm saying that compilers which support intrinsics *do* define the behaviour in this case. That's my only hesitation in duphammering. Maybe I should repost mine there? — Peter Cordes, Aug 31 '18 at 22:03
Or close that as a dup of this? But your answer is good, too. — Peter Cordes, Aug 31 '18 at 22:03
@Mysticial I like your answer too, especially the trailing general guidelines. — sandthorn, Sep 01 '18 at 05:03
I wonder whether declaring `implementation defined behavior` for something that is `undefined behavior` by the standard (e.g. `__m256` being allowed to alias) is always okay? To me, even AVX512's `void*` looks like something of an "aliasing loophole", much like the privileged `memcpy`. — sandthorn, Sep 01 '18 at 05:19

MSalters · Answer 2 · 2018-08-31T10:05:45.620

-2

[edit: for the downvoter, see https://stackoverflow.com/questions/tagged/language-lawyer. This answer is valid for any ISO C++ standard from C++98 to the current draft. It's generally assumed that basic concepts such as Undefined Behavior do not need detailed explanation, but see http://eel.is/c++draft/defns.undefined and various questions on SO]

It already starts out being Undefined Behavior on account of __m256 not being a Standard type, nor a valid name for user-defined types.

Implementations can of course add specific additional guarantees, but Undefined Behavior means in relation to ISO C++.

edited Aug 31 '18 at 10:05

answered Aug 31 '18 at 09:43

MSalters


Will you say `hwvec1` and `arr1` are implementation defined? – sandthorn Aug 31 '18 at 09:46
@sandthorn: That's another ISO C++ term, and no, ISO C++ says it's Undefined Behavior, not Implementation-Defined Behavior. – MSalters Aug 31 '18 at 09:48
__m256 is provided by the implementation. It's an extension. – n. 'pronouns' m. Aug 31 '18 at 09:51
@MSalters Do `hwvec1` and `arr1` violate current strict aliasing rules? – sandthorn Aug 31 '18 at 09:52
@sandthorn: In an ISO C++ sense, the answer would be "Undefined Behavior already happened before (due to `__m256`), so further rules are meaningless." (You tagged it language-lawyer, so that's why I am lawyering. n.m. has a point that it's an extension, but one of the key reasons why Undefined Behavior exists in the standard is so that extensions are completely unrestricted in what they can achieve.) – MSalters Aug 31 '18 at 09:58
Do you also imply that extensions are unrestricted even on strict aliasing rules? – sandthorn Aug 31 '18 at 10:04
@sandthorn: The problem from a Standard perspective is that it wants to allow extensions such as `extern "FORTRAN"` or `extern "JAVA"` or `__strict_aliasing(off)` or a million other things somebody may find useful. For those extensions to be useful, you expect a subset of C++ rules to hold, but the Standard simply cannot say in advance which rules still apply to which extension. That's why the language-lawyer answer is "`Undefined Behavior`, stop applying further rules" – MSalters Aug 31 '18 at 10:10
@MSalters am downvoter. If you can show me the rule that says that using `__m256` type is *undefined* behaviour rather than *unspecified*, I'll remove the vote. – eerorika Aug 31 '18 at 10:11
@user2079303 As you are here, Will you give some opinions about my question as another answer? – sandthorn Aug 31 '18 at 10:14
@user2079303: can you show the rule which says that it is *unspecified*? – geza Aug 31 '18 at 10:21
@user2079303: See `[lex.name]` and also (by reference) C99 7.1.3 Reserved identifiers, which explicitly calls it Undefined Behavior. – MSalters Aug 31 '18 at 10:21
@geza C++ standard doesn't specify `__m256`, thus there is no rule. – eerorika Aug 31 '18 at 10:22
@MSalters being defined by the implementation, `__m256` is not only allowed, but in fact required to use a reserved name. – eerorika Aug 31 '18 at 10:26
@user2079303: then why would it be unspecified behavior? Usually, the standard calls something else as "unspecified behavior". Like evaluation order for function call parameters, etc. Unspecified behavior is something, for which an implementation is not required to document the behavior. for `__m256`, an implementation surely can document the expected behavior. – geza Aug 31 '18 at 10:32
@geza an implementation *may* document any unspecified behaviour. Implementation is *required* to document all implementation defined behaviour. Standard doesn't require implementations to specify the behaviour of `__m256`. – eerorika Aug 31 '18 at 10:33
@user2079303: If an implementation gives a non-standard type, it definitely will document it. Otherwise it would be useless. For unspecified behaviors, compilers usually won't specify behavior, because it will depend on a lot of factors (optimization, etc.). So I don't think that `__m256` is unspecified behavior. Things that the standard doesn't specify is Undefined behavior: "behavior for which this document imposes no requirements". Just like `__m256`. The standard doesn't impose any requirement for this type. – geza Aug 31 '18 at 10:43
@geza @user2079303 I never took any law courses, so please correct me if I misunderstand anything. "`Unspecified behavior`" == "behavior not required by the implementation's documentation or by the standard", "`Undefined behavior`" == "behavior not required by the standard". If those two statements are true, then `undefined behavior` is a ***superset*** of `unspecified behavior`. – sandthorn Aug 31 '18 at 13:28
@sandthorn: not exactly. As I understand, unspecified means that the compiler must choose from several possible behaviors, and it is unspecified, which is chosen. Undefined means **anything** can happen, the standard doesn't say what could happen. Unspecified and undefined behaviors are mutually exclusive (But maybe I'm wrong, I'm not a language lawyer either). – geza Aug 31 '18 at 13:44
@geza: The Standard does not require that implementations be useful for any particular purpose, or any purpose whatsoever. The question of what a quality implementation must do to be suitable for any purpose is largely orthogonal to the question of what an implementation must do to conform to the C Standard. – supercat Aug 31 '18 at 17:02
@geza: If an action invokes Undefined Behavior, that means that a compiler may behave in a fashion that would make it unsuitable for some purposes and yet still be conforming. Some compiler writers seem to think that programmers have no right to expect anything of a compiler beyond the fact that it be "conforming" (e.g. expect that it be suitable for the purposes their programs are supposed to serve), and that any code which relies upon things beyond that is "broken". Such a view is IMHO absurd, but seems to be steering current compiler philosophy. – supercat Aug 31 '18 at 17:09
@MSalters: I think the answer you posted is correct but not useful. We want to know the semantics of `__m256*` *on implementations that do define it in the first place*, and which aim for compat with Intel's implementation / documentation. Of course the ISO C++ standard has nothing to say about it. I posted an answer that addresses it from that angle. – Peter Cordes Aug 31 '18 at 21:28
@supercat: I mostly agree with you. But what is the conclusion? :) I mean, is there something in my previous comments that you disagree with? – geza Aug 31 '18 at 22:07
@geza: You seem to share the common presumption that compiler writers will try to make their compilers maximally suitable for the purposes to which their users would be likely to put them. While I would say that would be true of competent people seeking to write *quality* compilers, that is not true of all compiler writers/maintenance teams. – supercat Aug 31 '18 at 22:12
