
I'm trying to craft some inline assembly to test the performance of rotates on ARM. The code is part of a C++ code base, so the rotates are template specializations. The code is below, but it's producing messages that don't make a lot of sense to me.

According to *ARM Assembly Language*, the instructions look roughly like this:

```
# rotate - rotate instruction
# dst - output operand
# lhs - value to be rotated
# rhs - rotate amount (immediate or register)
<rotate> <dst>, <lhs>, <rhs>
```
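As a plain C++ reference for what the three-operand `ror <dst>, <lhs>, <rhs>` form computes on 32-bit values, here is a small model. The name `ror_model` is hypothetical, and the model assumes the rotate amount is reduced modulo 32, which is how a register-specified ROR amount behaves. It is only a sketch for checking results, not a replacement for the instruction.

```cpp
#include <cstdint>

// Hypothetical reference model of "ror dst, lhs, rhs" for 32-bit registers.
// Assumption: the rotate amount is reduced modulo 32, so rotating by 0 or 32
// leaves the value unchanged.
constexpr std::uint32_t ror_model(std::uint32_t lhs, unsigned rhs)
{
    const unsigned n = rhs & 31u;            // effective rotate amount
    if (n == 0)
        return lhs;                          // avoids the undefined shift by 32
    return (lhs >> n) | (lhs << (32u - n));  // bits shifted out on the right re-enter on the left
}
```

For example, `ror_model(0x12345678u, 8)` yields `0x78123456u`.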

The messages don't make a lot of sense to me because, for example, I use `g` to constrain the output register, and that's just a general purpose register per Simple Constraints. ARM is supposed to have a lot of them, and Machine Specific Constraints does not appear to change the behavior of that constraint.

I'm not sure of the best way to approach this, so I'm going to ask three questions:

  1. How do I encode the rotate when using a constant or immediate value?
  2. How do I encode the rotate when using a value passed through a register?
  3. How would Thumb mode change the inline assembly?

```
arm-linux-androideabi-g++ -DNDEBUG -g2 -Os -pipe -fPIC -mfloat-abi=softfp
-mfpu=vfpv3-d16 -mthumb --sysroot=/opt/android-ndk-r10e/platforms/android-21/arch-arm
-I/opt/android-ndk-r10e/sources/cxx-stl/stlport/stlport/ -c camellia.cpp
In file included from seckey.h:9:0,
             from camellia.h:9,
             from camellia.cpp:14:
misc.h: In function 'T CryptoPP::rotlFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1121:71: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1121:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1129:71: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1129:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotlVariable(T, unsigned int) [with T = unsigned int]':
misc.h:1137:72: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
                                                                        ^
misc.h:1137:72: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrVariable(T, unsigned int) [with T = unsigned int]':
misc.h:1145:72: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
                                                                        ^
misc.h:1145:72: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotrFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1129:71: error: matching constraint not valid in output operand
  __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1129:71: error: invalid lvalue in asm output 0
misc.h:1129:71: error: matching constraint references invalid operand number
misc.h: In function 'T CryptoPP::rotlFixed(T, unsigned int) [with T = unsigned int]':
misc.h:1121:71: error: matching constraint not valid in output operand
  __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
                                                                       ^
misc.h:1121:71: error: invalid lvalue in asm output 0
misc.h:1121:71: error: matching constraint references invalid operand number
```

```cpp
// ROL #n Rotate left immediate
template<> inline word32 rotlFixed<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("rol %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
    return static_cast<word32>(z);
}

// ROR #n Rotate right immediate
template<> inline word32 rotrFixed<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("ror %2, %0, %1" : "=g2" (z) : "g0" (x), "M1" ((int)(y%32)));
    return static_cast<word32>(z);
}

// ROL rn Rotate left by a register
template<> inline word32 rotlVariable<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("rol %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
    return static_cast<word32>(z);
}

// ROR rn Rotate right by a register
template<> inline word32 rotrVariable<word32>(word32 x, unsigned int y)
{
    int z;
    __asm__ ("ror %2, %0, %1"  : "=g2" (z) : "g0" (x), "g1" ((int)(y%32)));
    return static_cast<word32>(z);
}

template<> inline word32 rotlMod<word32>(word32 x, unsigned int y)
{
    return rotlVariable<word32>(x, y);
}

template<> inline word32 rotrMod<word32>(word32 x, unsigned int y)
{
    return rotrVariable<word32>(x, y);
}
```
jww
  • What did you want to achieve with `g2` and `M1`? The `2` and the `1` are the matching constraints that don't seem to make sense, and the compiler doesn't like them either. – Jester Jul 20 '15 at 10:31
  • @Jester - `2` is the output operand number. It needs to be in a register, hence the `g2`. For `1`, that's the `rhs` or `shift amount`. For immediate, it needs to be constrained to immediate values, hence the `M1`. – jww Jul 20 '15 at 10:33
  • Note that GCC is clever enough to pick up the `x << y | x >> (32 - y)` idiom and emit a single `ror` instruction, provided the arguments are unsigned. – Notlikethat Jul 20 '15 at 10:42
  • Yes, but why did you add the `2` and the `1`? Those mean, put in the same place as the given other operand and you don't need that here. – Jester Jul 20 '15 at 10:50
  • @Notlikethat - *`x << y | x >> (32 - y)`* - that's undefined behavior when `y=0`. That code should *not* show up anywhere in production. And GCC does not offer a rotate intrinsic that would lay waste to these questions I have. If they provided it, then I would have been done a long time ago. Related: [Near constant time rotate that does not violate the standards](http://stackoverflow.com/q/31387778). – jww Jul 20 '15 at 10:56
  • @Jester - *"Those mean, put in the same place as the given other operand..."* - I'm not sure what you mean. Do you have a blog explaining the three operand rotate on ARM that I can read? Or can you provide an answer so I can see what it's supposed to look like? – jww Jul 20 '15 at 10:59
  • It's not the instruction, it's the gcc inline assembly. Digits mean, put in the same place as the given other operand. `"=g2" (z) : "g0" (x), "M1" (y)` means, put `z` where `y` is but put `y` where `x` is but put `x` where `z` is and that's nonsense. Just drop the numbers to let the compiler pick any register since you don't care where each operand ends up. See [matching constraints](https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#index-digits-in-constraint-3490) in the manual. – Jester Jul 20 '15 at 11:13
  • @Jester - without the numbers, I'm back to *"asm operand 2 probably doesn't match constraints"* and *"error: impossible constraint in 'asm'"*. This is so god damn frustrating. I even bought a CRC Press book on the subject of [ARM Assembly Language](http://www.amazon.com/dp/1439806101) (that does not cover it). I could scream because GCC does not provide the intrinsic to solve the problem like nearly every other major compiler provider... – jww Jul 20 '15 at 11:22
  • @jww Good point, kernel hackers have a habit of overlooking "works as expected" UB ;) Still, `y = y % 32; z = y ? (x >> y) | (x << (32 - y)) : x;` isn't much worse for non-constant rotate (compiles to `ands; rorne`). – Notlikethat Jul 20 '15 at 11:26
  • @Notlikethat - yeah, I brought it up on the kernel mailing list a few years ago. They were pretty indignant about it. They told me the kernel does not attempt to be compliant with standards or observe the standards their tools use, and they did not care who they taught the wrong way. – jww Jul 20 '15 at 11:31
  • `(x << (n % 32)) | (x >> (-n % 32))` is the way to go. No UB, no branches. And it should compile to ROR instructions with gcc 8 for ARM and ARM64. – Trass3r Nov 11 '18 at 04:45
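The branch-free idiom from the last comment can be written as portable C++ like this. The names `rotl32` and `rotr32` are hypothetical, and whether a particular GCC version turns them into a single `ror` is something to confirm in the generated assembly; this sketch only shows that the expression avoids undefined behavior.

```cpp
#include <cstdint>

// Hypothetical helpers using the masked-shift rotate idiom.
// Both shift counts stay in [0, 31], so no shift is undefined, even when
// n is 0 or a multiple of 32. n must be unsigned so that -n wraps around
// (modulo 2^32) instead of becoming negative.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned n)
{
    return (x << (n % 32)) | (x >> (-n % 32));
}

constexpr std::uint32_t rotr32(std::uint32_t x, unsigned n)
{
    return (x >> (n % 32)) | (x << (-n % 32));
}
```

When `n % 32` is 0, both halves are `x` itself, so the OR simply returns `x`; that is why no `y ? ... : x` branch is needed.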

1 Answer


First, ARM does not have a rotate left (ROL) instruction; you need to emulate it with ROR, since rotating left by `y` is the same as rotating right by `32 - y`.
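The identity behind emulating ROL with ROR can be checked exhaustively in plain C++. All function names here are hypothetical; `ror32` stands in for what ARM's ROR computes, and the sketch reduces amounts modulo 32 so that a rotate by 0 maps to a rotate by 0 rather than by 32.

```cpp
#include <cstdint>

// Hypothetical portable rotate right (the operation ARM's ROR performs).
constexpr std::uint32_t ror32(std::uint32_t x, unsigned n)
{
    return (x >> (n & 31u)) | (x << ((32u - n) & 31u));
}

// Rotate left expressed only through rotate right:
// rotl(x, y) == ror(x, (32 - y) mod 32).
constexpr std::uint32_t rol_via_ror(std::uint32_t x, unsigned y)
{
    return ror32(x, (32u - (y & 31u)) & 31u);
}

// Direct rotate left, used only to check the identity above.
constexpr std::uint32_t rol_direct(std::uint32_t x, unsigned y)
{
    return (x << (y & 31u)) | (x >> ((32u - y) & 31u));
}

// Returns true if the identity holds for every rotate amount 0..31.
constexpr bool rol_identity_holds(std::uint32_t x)
{
    for (unsigned y = 0; y < 32; ++y)
        if (rol_via_ror(x, y) != rol_direct(x, y))
            return false;
    return true;
}
```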

Second, the `M` constraint for some reason accepts 0 to 32, but ROR only accepts 1 to 31 as an immediate, so a rotate by 0 has to be handled separately.
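Here is a small sketch of the arithmetic for a constant left rotate. It assumes, per the ARM reference for `ROR (immediate)`, that a valid ROR immediate is 1 to 31. It shows why the naive `32 - y` is out of range when `y` is 0, which is what the `y != 0` guard in the answer's code avoids. Both helper names are hypothetical.

```cpp
// Hypothetical helper: the ROR immediate that implements a constant
// rotate-left by y. A result of 0 means "no rotate needed": the value
// should be returned unchanged instead of emitting an instruction.
constexpr unsigned ror_imm_for_rotl(unsigned y)
{
    return (32u - (y & 31u)) & 31u;
}

// Assumption: a ROR immediate must lie in 1..31 to be encodable.
constexpr bool is_valid_ror_imm(unsigned imm)
{
    return imm >= 1u && imm <= 31u;
}
```

With `y = 0`, the unmasked `32 - y` gives 32, which `M` accepts but ROR cannot encode. The masked form gives 0, which signals that no instruction is needed.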

Third, the `g` constraint is too generic: it also allows memory operands, which ROR does not accept. Use `r` instead.

This is what I came up with:

```cpp
// Rotate right
inline word32 rotr(word32 x, unsigned int y)
{
    word32 z = x; // returned unchanged when the rotate amount is 0
    if (__builtin_constant_p(y))
    {
        y &= 31;
        if (y != 0) // this should be optimized away by the compiler
        {
            // "M" needs a compile-time constant; this relies on y being folded after inlining
            __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "M" (y));
        }
    } else {
        // ROR by register effectively rotates by the amount modulo 32, so no masking is needed
        __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "r" (y));
    }
    return z;
}

// Rotate left (ARM has no ROL, so rotate right by 32 - y)
inline word32 rotl(word32 x, unsigned int y)
{
    word32 z = x; // returned unchanged when the rotate amount is 0
    if (__builtin_constant_p(y))
    {
        y &= 31;
        if (y != 0) // this should be optimized away by the compiler
        {
            __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "M" (32 - y));
        }
    } else {
        __asm__ ("ror %0, %1, %2" : "=r" (z) : "r" (x), "r" (32 - y));
    }
    return z;
}
```
Jester
  • *"Second, the M constraint..."* - yeah, I really wanted the ***`I`*** simple constraint, but I could not get that to work either. – jww Jul 20 '15 at 12:52