
Why does the definition of FXMVECTOR differ between 32-bit and 64-bit? Why shouldn't we pass XMVECTOR by reference on 32-bit to take advantage of SIMD?

Thank you in advance!

roalz
NMD

1 Answer


The various calling-convention macros and types are covered in detail on Microsoft Docs.

In short, the library is attempting to support reasonably 'optimal' calling conventions across a number of platforms:

  • 32-bit __fastcall where the first three SIMD values are passed in register. The rest must be passed by reference because the stack only guarantees 4-byte alignment.

  • 32-bit __vectorcall (requires VS 2013 or later) where up to the first six SIMD values are passed in register, as well as HVAs (i.e. matrices of SIMD values)

  • 64-bit __fastcall, which never passes SIMD values in register, although the stack is 16-byte aligned.

  • 64-bit __vectorcall (requires VS 2013 or later) where up to the first six SIMD values are passed in register, as well as HVAs (i.e. matrices of SIMD values)

  • ARM/ARM64 which will pass up to the first four SIMD values in register and supports HVAs.

So to ensure things can be passed in register, they are passed 'by value'. To minimize copies, things that aren't likely to end up in a register should be passed 'by reference'.
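The by-value versus by-reference choice can be sketched with a few conditional typedefs. This is only an illustration under stated assumptions, not the library's actual header: Vec4, FVec, GVec, HVec, CVec, Sum7 and the SKETCH_VECTORCALL switch are hypothetical names made up for this example, and the register counts simply follow the list above.

```cpp
#include <type_traits>

// Hypothetical stand-in for XMVECTOR: four floats with 16-byte alignment.
struct alignas(16) Vec4 { float x, y, z, w; };

// Assumption: SKETCH_VECTORCALL is a made-up switch meaning
// "this build uses __vectorcall". It is not a real compiler or library macro.
#ifndef SKETCH_VECTORCALL
#define SKETCH_VECTORCALL 0
#endif

// 1st-3rd vector parameters: by value on 32-bit x86, ARM/ARM64 and
// vectorcall builds (they can travel in registers), by reference otherwise.
#if defined(_M_IX86) || defined(__i386__) || defined(_M_ARM) || defined(__arm__) || \
    defined(_M_ARM64) || defined(__aarch64__) || SKETCH_VECTORCALL
typedef const Vec4 FVec;
#else
typedef const Vec4& FVec;
#endif

// 4th vector parameter: by value on ARM/ARM64 and vectorcall builds only.
#if defined(_M_ARM) || defined(__arm__) || defined(_M_ARM64) || defined(__aarch64__) || \
    SKETCH_VECTORCALL
typedef const Vec4 GVec;
#else
typedef const Vec4& GVec;
#endif

// 5th and 6th vector parameters: by value on vectorcall builds only.
#if SKETCH_VECTORCALL
typedef const Vec4 HVec;
#else
typedef const Vec4& HVec;
#endif

// 7th and later vector parameters: always by reference.
typedef const Vec4& CVec;

static_assert(std::is_reference<CVec>::value, "CVec is always a reference");

// The signature picks the alias by parameter position; the body is
// written the same way whichever form each alias resolves to.
inline Vec4 Sum7(FVec a, FVec b, FVec c, GVec d, HVec e, HVec f, CVec g)
{
    return Vec4{ a.x + b.x + c.x + d.x + e.x + f.x + g.x,
                 a.y + b.y + c.y + d.y + e.y + f.y + g.y,
                 a.z + b.z + c.z + d.z + e.z + f.z + g.z,
                 a.w + b.w + c.w + d.w + e.w + f.w + g.w };
}
```

Callers never see the difference: Sum7(v, v, v, v, v, v, v) compiles and behaves identically on every target, and only the way the arguments travel changes.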

Of course, the real hope is that inlining removes the calling-convention usage in final optimized code, but you can't guarantee that.

Chuck Walbourn