I have seen a few very informative posts here and here discussing the merits of different approaches for copying / setting memory.
All of the posts go into detail about the pros and cons of REP MOVSD in a variety of settings, though I found they discussed the topic in too generic a setting to actually come to any definitive answers.
So for x86_64 Skylake I am curious about the following scenarios.
Assume for all scenarios that both src and dst are page aligned.
Copying 16384 bytes of data
- Comparing REP MOVSD, ymm blocks, zmm blocks
Copying 2 Gigabytes of data
- Comparing REP MOVSD, ymm blocks w/ VMOVNTDQ, zmm blocks w/ VMOVNTDQ
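To make the copy variants concrete, here is a minimal sketch of what I mean by each one. It assumes GCC or Clang on x86_64 (inline asm in AT&T syntax plus AVX intrinsics); the function names are made up for illustration and this is not how glibc implements them. The zmm variants would be the same loops with 64-byte blocks and the 512-bit intrinsics; I left them out so the sketch runs on hardware without AVX-512.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* REP MOVSD: copies n/4 dwords. Assumes n % 4 == 0 and DF clear (the ABI default). */
static void copy_rep_movsd(void *dst, const void *src, size_t n)
{
    assert(n % 4 == 0);
    size_t count = n / 4;
    __asm__ volatile("rep movsl" /* AT&T mnemonic for REP MOVSD */
                     : "+D"(dst), "+S"(src), "+c"(count)
                     :
                     : "memory");
}

/* ymm blocks: one aligned 32-byte load and one aligned store per iteration. */
__attribute__((target("avx")))
static void copy_ymm(void *dst, const void *src, size_t n)
{
    assert(n % 32 == 0);
    const __m256i *s = (const __m256i *)src;
    __m256i *d = (__m256i *)dst;
    for (size_t i = 0; i < n / 32; i++)
        _mm256_store_si256(d + i, _mm256_load_si256(s + i));
}

/* ymm blocks w/ VMOVNTDQ: same loop, but with non-temporal stores. */
__attribute__((target("avx")))
static void copy_ymm_nt(void *dst, const void *src, size_t n)
{
    assert(n % 32 == 0);
    const __m256i *s = (const __m256i *)src;
    __m256i *d = (__m256i *)dst;
    for (size_t i = 0; i < n / 32; i++)
        _mm256_stream_si256(d + i, _mm256_load_si256(s + i));
    _mm_sfence(); /* order the NT stores before any later stores */
}

/* Hypothetical self-check: returns 1 if all three variants reproduce src. */
int copy_variants_agree(size_t n)
{
    uint8_t *src = aligned_alloc(4096, n); /* page aligned, n multiple of 4096 */
    uint8_t *out = aligned_alloc(4096, n);
    if (!src || !out) { free(src); free(out); return 0; }
    for (size_t i = 0; i < n; i++)
        src[i] = (uint8_t)(i * 31u + 7u);

    int ok = 1;
    memset(out, 0, n); copy_rep_movsd(out, src, n); ok &= memcmp(out, src, n) == 0;
    memset(out, 0, n); copy_ymm(out, src, n);       ok &= memcmp(out, src, n) == 0;
    memset(out, 0, n); copy_ymm_nt(out, src, n);    ok &= memcmp(out, src, n) == 0;

    free(src);
    free(out);
    return ok;
}
```

Calling `copy_variants_agree(16384)` only checks that the three variants produce the same result; it says nothing about which one is faster, which is the actual question.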
Setting 16384 bytes of data
- Comparing REP STOSD, ymm blocks, zmm blocks
Setting 2 Gigabytes of data
- Comparing REP STOSD, ymm blocks w/ VMOVNTDQ, zmm blocks w/ VMOVNTDQ
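For the set scenarios, the two ends of the comparison could look roughly like the sketch below. As before this assumes GCC or Clang on x86_64, the function names are hypothetical, and the zmm version (64-byte non-temporal stores) is omitted so that it runs without AVX-512. The self-check uses a small buffer rather than 2 Gigabytes.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* REP STOSD: stores n/4 copies of value. Assumes n % 4 == 0 and DF clear. */
static void set_rep_stosd(void *dst, uint32_t value, size_t n)
{
    assert(n % 4 == 0);
    size_t count = n / 4;
    __asm__ volatile("rep stosl" /* AT&T mnemonic for REP STOSD */
                     : "+D"(dst), "+c"(count)
                     : "a"(value)
                     : "memory");
}

/* ymm blocks w/ VMOVNTDQ: broadcast value, then 32-byte non-temporal stores. */
__attribute__((target("avx")))
static void set_ymm_nt(void *dst, uint32_t value, size_t n)
{
    assert(n % 32 == 0);
    assert((uintptr_t)dst % 32 == 0); /* VMOVNTDQ requires an aligned address */
    __m256i v = _mm256_set1_epi32((int)value);
    __m256i *d = (__m256i *)dst;
    for (size_t i = 0; i < n / 32; i++)
        _mm256_stream_si256(d + i, v);
    _mm_sfence(); /* order the NT stores before any later stores */
}

/* Hypothetical self-check: returns 1 if both variants fill the buffer identically. */
int set_variants_agree(size_t n, uint32_t value)
{
    uint32_t *a = aligned_alloc(4096, n); /* page aligned, n multiple of 4096 */
    uint32_t *b = aligned_alloc(4096, n);
    if (!a || !b) { free(a); free(b); return 0; }
    memset(a, 0, n);
    memset(b, 0, n);

    set_rep_stosd(a, value, n);
    set_ymm_nt(b, value, n);

    int ok = memcmp(a, b, n) == 0
          && a[0] == value
          && a[n / 4 - 1] == value;

    free(a);
    free(b);
    return ok;
}
```

For example, `set_variants_agree(1u << 20, 0xDEADBEEFu)` fills two 1 MiB buffers and compares them. Again, this only pins down what is being compared, not which variant wins.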
Based on the glibc implementation of memcpy and memset, it seems the choice is SIMD registers for memcpy and REP STOSD for both of the memset cases.
I am curious whether there are any hard recommendations for these scenarios (and would like to better understand why they are recommended).
Thank you.
Edit: One of the reasons I am making this post is that the other posts didn't seem to discuss AVX512.