I would like to perform ROR and ROL operations on variables in an Objective-C program. However, I can't manage it – I am not an assembly expert.

Here is what I have done so far:

uint8_t v1 = ....;
uint8_t v2 = ....; // v2 is either 1, 2, 3, 4 or 5

asm("ROR v1, v2"); 

The error I get is:

Unknown use of instruction mnemonic with unknown size suffix

How can I fix this?

Edit: The code does not need to use inline assembly. However, I haven't found a way to do this using Objective-C / C++ / C instructions.

  • Are you sure you need assembly? That should only be needed if this is a performance bottleneck. For normal use, using `var = (var << shift) | (var >> (sizeof(var)*8-shift))` would be fine. – Dave May 05 '13 at 18:40
  • For more details, http://en.wikipedia.org/wiki/Circular_shift#Implementing_circular_shifts – Dave May 05 '13 at 18:41
  • nice. Thank you for your reply. Since this answers the question, if you make it an answer, I will accept it. –  May 05 '13 at 18:43

2 Answers

To do this in standard C, you can do:

var = (var << shift) | (var >> (sizeof(var)*CHAR_BIT-shift))

Most compilers will recognise that pattern and optimise it to a single instruction (if the target supports it) anyway.

You can read more here: http://en.wikipedia.org/wiki/Circular_shift#Implementing_circular_shifts
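
For the 8-bit case in the question, the pattern could be wrapped as in the sketch below. The names rotl8_c and rotr8_c are made up for illustration, and the masking of the count is my addition so that a count of 0 (or anything above 7) stays well defined. Note that a uint8_t is promoted to int before shifting, so the result has to be cast back down to 8 bits.

```c
#include <stdint.h>

/* Hypothetical helpers, not from the original answer.
 * The count is reduced modulo 8 so every shift amount is in [0,7]. */
static inline uint8_t rotl8_c(uint8_t v, unsigned int n)
{
    n &= 7;
    /* v is promoted to int here; the cast discards the bits above bit 7. */
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}

static inline uint8_t rotr8_c(uint8_t v, unsigned int n)
{
    n &= 7;
    return (uint8_t)((v >> n) | (v << ((8 - n) & 7)));
}
```

With v1 and v2 from the question, `v1 = rotr8_c(v1, v2);` would stand in for the ROR, and `v1 = rotl8_c(v1, v2);` for the ROL.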

Dave
  • `++posting++`; especially notice the _references_ on the wikipedia article which quote the optimization applied by `gcc` / `clang` with respect to converting this type of code into hardware-rotate where available. Since Objective C/C++ always means `gcc` / `clang`, it's safe to assume the optimization is applied. In addition, for ARM, _not_ coding the rotate as a _separate instruction_ is better, because the barrel shifter can "integrate" rotation into any arithmetic; `var = rotate(var, xxx) + 1` is a _single instruction_ in ARM, but only detectable if _not_ split as function call / inline asm. – FrankH. May 07 '13 at 11:47
  • Thank you for your reply. It is most helpful. Do you happen to know how to do it using inline assembly? I ask because I am curious to see what the code looks like and how to reference local variables in that assembly code. –  May 08 '13 at 10:42
  • How to code it in assembly depends on your target architecture. I can't write assembly myself (meaning to learn), but by the looks of the error message I'd guess you need to tell it if you want a 32-bit or 64-bit operation, or something like that. Maybe this page will help: http://sourceware.org/binutils/docs/as/i386_002dMnemonics.html – Dave May 08 '13 at 11:11
var = (var << shift) | (var >> (sizeof(var)*CHAR_BIT-shift))

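Don't use this code. When shift is 0, the right-shift count becomes sizeof(var)*CHAR_BIT, and shifting by a count that is not less than the width of the promoted operand is undefined behavior, so this bites for any var at least as wide as int. Intel's ICC removes the statements with the undefined behavior. I know that first hand.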

Plus, the code won't pass Clang's or GCC's Undefined Behavior sanitizer. For reading, see Clang's Controlling Code Generation or GCC's Undefined Behavior Sanitizer – ubsan.


the error I get is:
Unknown use of instruction mnemonic with unknown size suffix

You are using one of two tools: either GCC or Clang. I think Apple cut over to Clang by default around Xcode 4, so you are probably using Clang.

GCC will delegate to GNU AS (GAS), while Clang will use its Integrated Assembler. In both cases, you should use AT&T inline assembly because Clang support for Intel assembly is spotty. For example, Clang can't generate a negate instruction (a.k.a. LLVM Bug 24232) at the moment.

When using Clang, you need to specify the operand size. So you will use rolb, rolw, roll, rolq, and friends (and likewise rorb, rorw, rorl, rorq). This is documented at Clang's Language Compatibility | Inline Assembly page.

Here's what the 8-bit rotate looks like:

// Immediate
inline word8 rotlImmediate8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolb %1, %0" : "+mq" (x) : "I" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word8 rotl8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolb %1, %0" : "+mq" (x) : "cI" ((unsigned char)y));
    return x;
}

// Immediate
inline word8 rotrImmediate8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorb %1, %0" : "+mq" (x) : "I" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word8 rotr8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorb %1, %0" : "+mq" (x) : "cI" ((unsigned char)y));
    return x;
}

The 8-bit word needs special handling on constraints. You can't use +g; rather you need +mq.

Here's the 16-bit word version:

// Immediate
inline word16 rotlImmediate16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolw %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word16 rotl16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolw %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}

// Immediate
inline word16 rotrImmediate16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorw %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word16 rotr16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorw %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}

And here's the 32-bit version:

// Immediate
inline word32 rotlImmediate32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("roll %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word32 rotl32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("roll %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}

// Immediate
inline word32 rotrImmediate32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorl %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word32 rotr32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorl %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}

Finally, here's the 64-bit version. You should guard it with something like __amd64 or __x86_64__. Because the rotate amount can be [0,63], you use the J constraint.

// Immediate
inline word64 rotlImmediate64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolq %1, %0" : "+g" (x) : "J" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word64 rotl64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolq %1, %0" : "+g" (x) : "cJ" ((unsigned char)y));
    return x;
}

// Immediate
inline word64 rotrImmediate64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorq %1, %0" : "+g" (x) : "J" ((unsigned char)y));
    return x;
}

// Immediate or register
inline word64 rotr64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorq %1, %0" : "+g" (x) : "cJ" ((unsigned char)y));
    return x;
}
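
As a rough sketch of what that guard could look like: the function names below are made up, uint64_t stands in for the word64 typedef (which isn't shown here), and the plain C fallback for other targets is my own addition rather than part of the code above.

```c
#include <stdint.h>

/* Hypothetical wrappers. The asm path is only compiled on x86-64 with a
 * GCC-compatible compiler; everything else gets a portable C fallback. */
static inline uint64_t rotl64_guarded(uint64_t x, unsigned int y)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__amd64__))
    /* Same constraints as above: count in CL or an immediate in [0,63]. */
    __asm__ ("rolq %1, %0" : "+g" (x) : "cJ" ((unsigned char)y));
    return x;
#else
    y &= 63;                              /* keep the count in [0,63] */
    return (x << y) | (x >> (-y & 63));   /* no shift by 64 when y == 0 */
#endif
}

static inline uint64_t rotr64_guarded(uint64_t x, unsigned int y)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__amd64__))
    __asm__ ("rorq %1, %0" : "+g" (x) : "cJ" ((unsigned char)y));
    return x;
#else
    y &= 63;
    return (x >> y) | (x << (-y & 63));
#endif
}
```

Clang also defines __GNUC__, so the one check should cover both compilers.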

Clang does not propagate constants the way GCC does, so you might have trouble with the Immediate-8 version of the rotates. Also see Force Clang to “perform math early” on constant values on Stack Overflow and LLVM Bug 24226.


You should take the time and visit John Regehr's Safe, Efficient, and Portable Rotate in C/C++. It's kind of anti-climactic. It says once you write the rotate properly in C/C++ (i.e., no undefined behavior), it will no longer be recognized as a rotate, and the rotate instruction won't be generated.
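
For reference, a rotate without the undefined behavior is usually written with a masked count, roughly as below. This is a sketch from memory of that general idiom, not a quote from the article; the function names are mine, and whether a particular compiler and version turns it into a single rotate instruction is something you would have to check in the generated assembly.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical names. Both shift counts are masked to [0,31], so neither
 * shift can ever be by the full width of the type. */
static inline uint32_t rotl32_safe(uint32_t value, unsigned int count)
{
    const unsigned int mask = CHAR_BIT * sizeof(value) - 1;  /* 31 */
    count &= mask;
    return (value << count) | (value >> (-count & mask));
}

static inline uint32_t rotr32_safe(uint32_t value, unsigned int count)
{
    const unsigned int mask = CHAR_BIT * sizeof(value) - 1;
    count &= mask;
    return (value >> count) | (value << (-count & mask));
}
```

When count is 0, `-count & mask` is also 0, so the expression reduces to `value | value`, and the code should pass the Undefined Behavior sanitizer.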

Finally, also see Near constant time rotate that does not violate the standards on Stack Overflow.

jww