var = (var << shift) | (var >> (sizeof(var)*CHAR_BIT-shift))
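Don't use this code. It has undefined behavior when shift is 0: the right-hand shift count then equals the full bit width of var, and shifting by an amount equal to or greater than the width of the promoted operand is undefined in C and C++. Intel's ICC removes the statements with the undefined behavior. I know that first hand.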
Plus, the code won't pass Clang's or GCC's Undefined Behavior sanitizer. For reading, see Clang's Controlling Code Generation or GCC's Undefined Behavior Sanitizer – ubsan.
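For comparison, here is a minimal sketch of a rotate that avoids the undefined shift by masking both counts. The names rotl32_safe and rotr32_safe are made up for illustration, and it assumes a fixed-width uint32_t from <stdint.h>. Whether a given compiler turns it into a single rotate instruction is a separate question.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The masking trick below only works when the bit width is a power of two. */
static_assert(((sizeof(uint32_t) * CHAR_BIT) & (sizeof(uint32_t) * CHAR_BIT - 1)) == 0,
              "bit width must be a power of two");

/* Hypothetical helper: rotate left with both shift counts kept in [0, 31]. */
static inline uint32_t rotl32_safe(uint32_t x, unsigned int shift)
{
    const unsigned int mask = sizeof(x) * CHAR_BIT - 1;  /* 31 for uint32_t */

    /* (0u - shift) & mask equals (32 - shift) % 32, so shift == 0 yields x >> 0. */
    return (x << (shift & mask)) | (x >> ((0u - shift) & mask));
}

/* Hypothetical helper: rotate right, same masking idea. */
static inline uint32_t rotr32_safe(uint32_t x, unsigned int shift)
{
    const unsigned int mask = sizeof(x) * CHAR_BIT - 1;

    return (x >> (shift & mask)) | (x << ((0u - shift) & mask));
}
```

With this form, rotl32_safe(x, 0) and rotl32_safe(x, 32) both return x unchanged, and no shift count ever reaches 32.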
the error I get is:
Unknown use of instruction mnemonic with unknown size suffix
You are using one of two tools: either GCC or Clang. I think Apple cut over to Clang by default around Xcode 4, so you are probably using Clang.
GCC will delegate to GNU AS (GAS), while Clang will use its Integrated Assembler. In both cases, you should use AT&T inline assembly because Clang's support for Intel assembly is spotty. For example, Clang can't generate a negate instruction (a.k.a. LLVM Bug 24232) at the moment.
When using Clang, you need to specify the operand size, so you will use rolb, rolw, roll, rolq, and friends. This is documented on Clang's Language Compatibility | Inline Assembly page.
Here's what the 8-bit rotate looks like:
// Immediate
inline word8 rotlImmediate8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolb %1, %0" : "+mq" (x) : "I" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word8 rotl8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolb %1, %0" : "+mq" (x) : "cI" ((unsigned char)y));
    return x;
}
// Immediate
inline word8 rotrImmediate8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorb %1, %0" : "+mq" (x) : "I" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word8 rotr8 (word8 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorb %1, %0" : "+mq" (x) : "cI" ((unsigned char)y));
    return x;
}
The 8-bit word needs special handling on constraints. You can't use +g; rather, you need +mq.
Here's the 16-bit word version:
// Immediate
inline word16 rotlImmediate16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolw %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word16 rotl16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolw %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}
// Immediate
inline word16 rotrImmediate16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorw %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word16 rotr16 (word16 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorw %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}
And here's the 32-bit version:
// Immediate
inline word32 rotlImmediate32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("roll %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word32 rotl32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("roll %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}
// Immediate
inline word32 rotrImmediate32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorl %1, %0" : "+g" (x) : "I" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word32 rotr32 (word32 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorl %1, %0" : "+g" (x) : "cI" ((unsigned char)y));
    return x;
}
Finally, here's the 64-bit version. You should guard it with something like __amd64 or __x86_64__. Because the rotate amount can be [0,63], you use the J constraint.
// Immediate
inline word64 rotlImmediate64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolq %1, %0" : "+g" (x) : "J" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word64 rotl64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rolq %1, %0" : "+g" (x) : "cJ" ((unsigned char)y));
    return x;
}
// Immediate
inline word64 rotrImmediate64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorq %1, %0" : "+g" (x) : "J" ((unsigned char)y));
    return x;
}
// Immediate or register
inline word64 rotr64 (word64 x /*value*/, unsigned int y /*rotate*/)
{
    __asm__ ("rorq %1, %0" : "+g" (x) : "cJ" ((unsigned char)y));
    return x;
}
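The guard mentioned for the 64-bit version could be sketched roughly as follows. The macro name HAVE_ROTATE64_ASM and the function name rotl64_guarded are hypothetical, uint64_t stands in for word64, and the fallback branch is an assumption about what you might want on non-x86-64 targets rather than anything the compilers require.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical feature macro: enable the asm path only for GCC/Clang on x86-64. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__amd64__) || defined(__amd64))
# define HAVE_ROTATE64_ASM 1
#endif

/* Truncating the count to unsigned char must not change it modulo 64. */
static_assert((UCHAR_MAX + 1) % 64 == 0,
              "unsigned char truncation must preserve the count modulo 64");

/* Hypothetical helper: rotate left by y, immediate or register count. */
static inline uint64_t rotl64_guarded(uint64_t x, unsigned int y)
{
#if defined(HAVE_ROTATE64_ASM)
    /* "c" puts the count in CL, "J" allows an immediate in [0, 63]. */
    __asm__ ("rolq %1, %0" : "+g" (x) : "cJ" ((unsigned char)y));
    return x;
#else
    /* Portable fallback: both shift counts are masked into [0, 63]. */
    return (x << (y & 63u)) | (x >> ((0u - y) & 63u));
#endif
}
```

Both branches agree for any y because the processor masks a CL count to 6 bits for a 64-bit operand, which matches the explicit & 63 in the fallback.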
Clang does not propagate constants like GCC, so you might have trouble with the Immediate-8 version of the rotates. Also see Force Clang to “perform math early” on constant values on Stack Overflow and LLVM Bug 24226.
You should take the time to read John Regehr's Safe, Efficient, and Portable Rotate in C/C++. It's kind of anti-climactic. It says once you write the rotate properly in C/C++ (i.e., no undefined behavior), it will no longer be recognized as a rotate, and the rotate instruction won't be generated.
Finally, also see Near constant time rotate that does not violate the standards on Stack Overflow.