ROL / ROR on variable using inline assembly only in Objective-C

Question

A few days ago, I asked the question below. Because I was in need of a quick answer, I added:

The code does not need to use inline assembly. However, I haven't found a way to do this using Objective-C / C++ / C instructions.

Today, I would like to learn something. So I ask the question again, looking for an answer using inline assembly.

I would like to perform ROR and ROL operations on variables in an Objective-C program. However, I can't manage it – I am not an assembly expert.

Here is what I have done so far:

uint8_t v1 = ....;
uint8_t v2 = ....; // v2 is either 1, 2, 3, 4 or 5

asm("ROR v1, v2");

the error I get is:

Unknown use of instruction mnemonic with unknown size suffix

How can I fix this?

This is a really appropriate comment. Thank you for writing it. The code is x86_64. — , May 10 '13 at 12:55
This question is not a duplicate, and I believe I don't need to edit the question to explain how it is different **since I already did that** when writing the question. **FrankH.**, **Monolo**, **NT3RP**, **ldav1s**, and **Thor** **just didn't bother to read the question**. The explanation of why it is not a duplicate is kind of hard to miss, since it is the first topic in the question. — , May 10 '13 at 13:30

CRD · Accepted Answer · 2013-05-10T19:28:41.993

A rotate is just two shifts - some bits go left, the others right - once you see this rotating is easy without assembly. The pattern is recognised by some compilers and compiled using the rotate instructions. See wikipedia for the code.
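As a minimal sketch of the "two shifts" idea for the 8-bit operands in the question (the names rotl8 and rotr8 are hypothetical, not taken from any library): the count is masked so that a rotate by 0 never turns into a shift by the full width, and the result is cast back to 8 bits because uint8_t operands are promoted to int before shifting.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n bits; n is reduced modulo 8. */
static inline uint8_t rotl8(uint8_t value, unsigned n)
{
    n &= 7;  /* keep the count in 0..7 */
    /* Bits shifted out on the left re-enter on the right.
       The cast drops the bits that ended up above bit 7 after promotion to int. */
    return (uint8_t)((value << n) | (value >> ((8 - n) & 7)));
}

/* Rotate an 8-bit value right by n bits; n is reduced modulo 8. */
static inline uint8_t rotr8(uint8_t value, unsigned n)
{
    n &= 7;
    return (uint8_t)((value >> n) | (value << ((8 - n) & 7)));
}
```

With these, v1 = rotl8(v1, v2) covers the case from the question where v2 is between 1 and 5.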

Update: Xcode 4.6.2 (others not tested) on x86-64 compiles the double shift + or to a rotate for 32-bit and 64-bit operands; for 8-bit and 16-bit operands the double shift + or is kept. Why? Maybe the compiler understands something about the performance of these instructions, maybe they just didn't optimise it. In general, though, if you can avoid assembler, do so: the compiler invariably knows best! The operations can also be inlined by declaring the functions static inline, or by using macros defined in the same way as the standard macro MAX (a macro has the advantage of adapting to the type of its operands).
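A type-adapting macro along those lines might look like the sketch below. ROTL, ROTR and ROT_MASK are hypothetical names, __typeof__ is a GCC/clang extension rather than standard C, and the macros assume an unsigned operand whose width is a power of two. Like MAX, they evaluate their arguments more than once, so arguments with side effects should be avoided.

```c
#include <stdint.h>

/* Highest valid shift count for the type of v, e.g. 7 for uint8_t, 31 for uint32_t. */
#define ROT_MASK(v) ((unsigned)(sizeof(v) * 8 - 1))

/* Rotate left: the count is reduced modulo the operand width, and the
   result is cast back to the operand's own type to undo integer promotion. */
#define ROTL(v, s) \
    ((__typeof__(v))(((v) << ((unsigned)(s) & ROT_MASK(v))) | \
                     ((v) >> ((0u - (unsigned)(s)) & ROT_MASK(v)))))

/* Rotate right: same idea with the shift directions swapped. */
#define ROTR(v, s) \
    ((__typeof__(v))(((v) >> ((unsigned)(s) & ROT_MASK(v))) | \
                     ((v) << ((0u - (unsigned)(s)) & ROT_MASK(v)))))
```

The same macro then serves uint8_t, uint16_t, uint32_t and uint64_t operands, for example ROTL(v1, v2) with the variables from the question.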

Addendum after OP comment

Here is the x86-64 assembler as an example. For full details of how to use the asm construct, start here.

First the non-assembler version:

static inline uint32 rotl32_i64(uint32 value, unsigned shift)
{
   // assume shift is in range 0..31 or subtraction would be wrong
   // however we know the compiler will spot the pattern and replace
   // the expression with a single roll and there will be no subtraction
   // so if the compiler changes this may break without:
   //    shift &= 0x1f;
   return (value << shift) | (value >> (32 - shift));
}

void test_rotl32(uint32 value, unsigned shift)
{
   uint32 shifted = rotl32_i64(value, shift);

   NSLog(@"%8x <<< %u -> %8x", value & 0xFFFFFFFF, shift, shifted & 0xFFFFFFFF);
}

If you look at the assembler output for profiling (so the optimiser kicks in) in Xcode (Product > Generate Output > Assembly File, then select Profiling in the pop-up menu at the bottom of the window) you will see that rotl32_i64 is inlined into test_rotl32 and compiles down to a rotate (roll) instruction.

Now, producing the assembler directly yourself is a bit more involved than for the ARM code FrankH showed. This is because, to take a variable shift value, a specific register, cl, must be used, so we need to give the compiler enough information to do that. Here goes:

static inline uint32 rotl32_i64_asm(uint32 value, unsigned shift)
{
   // i64 - shift must be in register cl so create a register local assigned to cl
   // no need to mask as i64 will do that
   register uint8 cl asm ( "cl" ) = shift;
   uint32 shifted;
   // emit the rotate left long
   // %n values are replaced by args:
   //    0: "=r" (shifted) - any register (r), result(=), store in var (shifted)
   //    1: "0" (value) - *same* register as %0 (0), load from var (value)
   //    2: "r" (cl) - any register (r), load from var (cl - which is the cl register so this one is used)
   __asm__ ("roll %2,%0" : "=r" (shifted) : "0" (value), "r" (cl));
   return shifted;
}

Change test_rotl32 to call rotl32_i64_asm and check the assembly output again - it should be the same, i.e. the compiler did as well as we did.
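For the 8-bit operands in the question, the same technique can be sketched with the "c" constraint, which asks the compiler to place the count in the cl register directly instead of going through a register variable. The helper names rotl8_asm and rotr8_asm are hypothetical, the asm path assumes GCC or clang targeting x86-64, and a portable fallback is included for any other target.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by shift bits. */
static inline uint8_t rotl8_asm(uint8_t value, uint8_t shift)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* %0: "+r" (value) - any register, both read and written
       %1: "c"  (shift) - the count register; printed as %cl because shift is 8 bits wide
       "cc": the rotate changes the condition flags */
    __asm__ ("rolb %1, %0" : "+r" (value) : "c" (shift) : "cc");
    return value;
#else
    /* Portable fallback for other targets: the usual double shift + or. */
    shift &= 7;
    return (uint8_t)((value << shift) | (value >> ((8 - shift) & 7)));
#endif
}

/* Rotate an 8-bit value right by shift bits. */
static inline uint8_t rotr8_asm(uint8_t value, uint8_t shift)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ ("rorb %1, %0" : "+r" (value) : "c" (shift) : "cc");
    return value;
#else
    shift &= 7;
    return (uint8_t)((value >> shift) | (value << ((8 - shift) & 7)));
#endif
}
```

Using "+r" for value plays the same role as the "=r" / "0" pair above: the input and the result share one register, which is what the two-operand rotate instruction requires.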

Further note that if the commented-out masking line in rotl32_i64 is included, it essentially becomes rotl32: the compiler will do the right thing for any architecture, all for the cost of a single `and` instruction in the i64 version.

So asm is there if you need it, using it can be somewhat involved, and the compiler will invariably do as well or better by itself...

HTH

The ARM barrel shifter only operates on registers full-width, so you can tell it to rotate a 32bit quantity but not an 8/16bit one. That's why you get the double-shift/or for non-32bit. You can shift 8bit immediates to any position within a 32bit word, but you cannot rotate an arbitrary byte within a 32bit word. — FrankH., May 09 '13 at 13:24
@FrankH. - Oops I missed specifying x86-64 in my answer - which does have 8 & 16 bit rotates - edited, thanks. Combining your ARM and the above x86-64 I think emphasises the point we both made: *use the standard C code and let the compiler figure it out*. — CRD, May 09 '13 at 18:49
+1. CRD, thank you for your reply. ROR and ROL were more examples. I am actually more interested in how to use local variables in inline assembly code. Many times I have seen code like `: "=r"(out) : "r"(in), "M"(N);` without understanding it. And clang's page on inline assembly is more like a few sentences... — , May 10 '13 at 13:24

score 0 · Answer 2 · answered May 09 '13 at 13:34

The 32bit rotate in ARM would be:

__asm__("MOV %0, %1, ROR %2\n" : "=r"(out) : "r"(in), "M"(N));

where N is required to be a compile-time constant.

But the output of the barrel shifter, whether used on a register or an immediate operand, is always a full-register-width; you can shift a constant 8-bit quantity to any position within a 32bit word, or - as here - shift/rotate the value in a 32bit register any which way.
But you cannot rotate 16bit or 8bit values within a register using a single ARM instruction. None such exists.

That's why, on ARM targets, when you use the "normal" (portable [Objective-]C/C++) code (in << xx) | (in >> (w - xx)), the compiler will generate a single assembler instruction for a 32bit rotate, but at least two (a normal shift followed by a shifted or) for 8/16bit ones.


FrankH.


On one hand, you took the time to answer me and I appreciate it. On the other hand, you voted to close the question, ensuring that no one would add any answer, and I do not appreciate it. Not only is it not fair play, it is also totally inappropriate given that it is not a duplicate of the said question. You should take the time to read both questions. The first one allowed non-ASM code as an answer, and this option was faithfully chosen by the author of the accepted answer. Now, I am interested in an ASM answer. No question can have two accepted answers. – May 10 '13 at 13:17
I still assert it is a duplicate; you _can change / clarify_ your original question, you know ? Posting two as-good-as-identical questions is only ... fishing for reputation. – FrankH. May 12 '13 at 09:37
Have you considered that both questions are mutually exclusive? One is asking for an assembly answer, the other is not. Plus, my knowledge of all things related to programming / developing applications is so small that I really can't even dream of getting any reputation. I regularly look at the questions. The 1% easy questions are answered immediately, the other 99% I can't help with. – May 13 '13 at 13:35
