I have a 256-bit data structure (an array of four int64 values) that I need to load and perform integer addition on in x64 assembly. Instead of doing four MOVs, I'd like to load all four values with a single instruction. Is this possible? I thought I could do it with MOVDQA, but that apparently only loads into the XMM registers, and as far as I can tell my only option from there is a floating-point add, which is not what I need.
Edit: The current routine is:
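mov rax, [rcx]        ; load the four 64-bit limbs of the first operand (lowest first)
mov r8, 8[rcx]
mov r9, 16[rcx]
mov r10, 24[rcx]
add rax, [rdx]        ; add the lowest limbs, setting CF on overflow
adc r8, 8[rdx]        ; add each higher limb plus the carry from the previous one
adc r9, 16[rdx]
adc r10, 24[rdx]
jc adjust_modular     ; carry out of the top limb: the sum does not fit in 256 bits
mov [rcx], rax        ; no carry: store the sum back over the first operand
mov 8[rcx], r8
mov 16[rcx], r9
mov 24[rcx], r10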
adjust_modular: (....)
ret
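
For reference, here is a rough C model of what the add/adc chain above computes. This is only a sketch: the name add256 and the separate output array are made up for illustration, and it assumes the limbs are stored least significant first, as the routine does. It returns the carry out of the top limb, which is the condition the routine tests with jc. What happens in adjust_modular is not modeled.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the original routine): adds two 256-bit
   values held as four 64-bit limbs, least significant limb first.
   Writes the low 256 bits of the sum to out and returns the carry out
   of the top limb (0 or 1). */
static unsigned add256(uint64_t out[4], const uint64_t a[4], const uint64_t b[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = a[i] + b[i];   /* wraps modulo 2^64 */
        unsigned c1 = s < a[i];     /* carry produced by a[i] + b[i] */
        uint64_t t = s + carry;     /* add the carry from the previous limb */
        unsigned c2 = t < s;        /* carry produced by adding that carry */
        out[i] = t;
        carry = c1 | c2;            /* at most one of c1 and c2 can be set */
    }
    return carry;
}
```

The point of the model is that each limb depends on the carry from the limb below it, which is why the routine uses one add followed by three adc instructions.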