I'm having trouble finding the NEON intrinsic I need. I have a 128-bit value as an int64x2_t, and I need to copy the low 64 bits to the high 64 bits. On occasion I also need to copy the high 64 bits to the low 64 bits.
NEON has a lane dup, but it takes an int64x1_t and returns an int64x1_t:
int64x1_t vdup_lane_s64(int64x1_t vec, __constrange(0,0) int lane);
The range also seems off, since I would expect to be able to select either of two lanes. (Maybe this is a misunderstanding on my part.)
How do I copy the low 64 bits to the high 64 bits in an int64x2_t?
I'm not using the (high >> x) | (low << x) pattern as suggested below. First, it's undefined behavior in C/C++ when x is 0. Second, the value should already be in a NEON SIMD register, so I don't want to accidentally round-trip it. Third, GCC is not generating the code I hoped for, so I don't want to give GCC the opportunity to produce something slower.