
This is related to How to force const propagation through an inline function? Clang has an integrated assembler and does not use the system's assembler (which is often GNU AS (GAS)). Compilers other than Clang performed the math early, and everything "just worked".

I say "early" because @n.m. objected to describing it as "math performed by the preprocessor." But the idea is the value is known at compile time, and it should be evaluated early, like when the preprocessor evaluates a #if (X % 32 == 0).

Below, Clang 3.6 is complaining about violating a constraint. It appears the constant is not being propagated throughout:

$ export CXX=/usr/local/bin/clang++
$ $CXX --version
clang version 3.6.0 (tags/RELEASE_360/final)
Target: x86_64-apple-darwin12.6.0
...
$ make
/usr/local/bin/clang++ -DNDEBUG -g2 -O3 -Wall -fPIC -arch i386 -arch x86_64 -pipe -Wno-tautological-compare -c integer.cpp
In file included from integer.cpp:8:
In file included from ./integer.h:7:
In file included from ./secblock.h:7:
./misc.h:941:44: error: constraint 'I' expects an integer constant expression
        __asm__ ("rolb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
                                                  ^~~~~~~~~~~~~~~~~~~~
./misc.h:951:44: error: constraint 'I' expects an integer constant expression
...

The functions above are inlined template specializations:

template<> inline byte rotrFixed<byte>(byte x, unsigned int y)
{
    // The I constraint ensures we use the immediate-8 variant of the
    // shift amount y. However, y must be in [0, 31] inclusive. We
    // rely on the preprocessor to propagate the constant and perform
    // the modular reduction so the assembler generates the instruction.
    __asm__ ("rorb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
    return x;
}

They are being invoked with a const value, so the rotate amount is known at compile time. A typical caller might look like:

unsigned int x1 =  rotrFixed<byte>(1, 4);
unsigned int x2 =  rotrFixed<byte>(1, 32);
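
One way to see why a compiler may treat `y % 8` differently from a `#if` expression: inside the function body `y` is an ordinary parameter, so `y % 8` is not an integer constant expression in the language sense, even when every caller passes a literal. A minimal sketch of that distinction follows; `reduce8` and `reduceParam` are hypothetical names used only for illustration and are not part of the original code.

```cpp
#include <cassert>

// A literal expression is an integer constant expression, so the
// front end evaluates it and static_assert accepts it.
static_assert(32 % 8 == 0, "evaluated by the front end");

// Hypothetical helper: a C++11 constexpr function can be used in a
// constant expression when its argument is itself constant.
constexpr unsigned int reduce8(unsigned int y) { return y % 8; }
static_assert(reduce8(36) == 4, "evaluated at compile time");

// Hypothetical helper: here y is a plain function parameter.
inline unsigned int reduceParam(unsigned int y)
{
    // static_assert(y % 8 < 8, "");  // would not compile: y is not a
    //                                // constant expression in this scope
    return y % 8;  // an optimizer may still fold this after inlining
}
```

Whether the folded value reaches an inline assembly operand in time depends on when the compiler checks the constraint, which is what the error message above is about.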

None of these [questionable] tricks would be required if GCC or Clang provided an intrinsic to perform the rotate in near constant time. I'd even settle for "perform the rotate" since they don't even have that.

What is the trick needed to get Clang to resume performing the preprocessing of the const value?


Astute readers will recognize rotrFixed<byte>(1, 32) could be undefined behavior if using a traditional C/C++ rotate. So we drop into assembly to avoid the C/C++ limitations and enjoy the 1 instruction speedup.

Curious readers may wonder why we would do this. The cryptographers call out the specs, and sometimes those specs are not sympathetic to the underlying hardware or standards bodies. Rather than changing the cryptographer's specification, we attempt to provide it verbatim to make audits easier.


A bug has been opened for this issue: LLVM Bug 24226 - Constant not propagated into inline assembly, results in "constraint 'I' expects an integer constant expression".

I don't know what guarantees Clang makes, but I know the compiler and integrated assembler claim to be compatible with GCC and GNU's assembler. And GCC and GAS provide the propagation of the constant value.

jww
  • You keep talking about the preprocessor arithmetic, but there are no #define'd constants anywhere in your code. – n. 'pronouns' m. Jul 23 '15 at 04:27
  • You can use the system assembler via `-no-integrated-as`. – Thomas Jul 23 '15 at 05:41
  • If `y` is known at compile time, why not make it a template parameter? – n. 'pronouns' m. Jul 23 '15 at 06:08
  • @n.m. - two reasons. First, that changes the ABI. (We are paying for past sins). Second, it does not necessarily work for Clang/LLVM. For (2), see Sean's comment at [LLVM Bug 24226](https://llvm.org/bugs/show_bug.cgi?id=24226#c1). (Sean is one of the LLVM maintainers). – jww Jul 23 '15 at 06:27
  • It is inline and a template anyway. No ABI concerns here. API maybe. – n. 'pronouns' m. Jul 23 '15 at 06:35
  • I tend to agree with the mailing list reply - I don't think this can be considered a bug. The compiler is free to fold the constant, but it's not strictly a compile-time constant expression. – Brett Hale Jul 23 '15 at 18:19
  • @Brett - this is where my lack of experience or lack of the finer details breaks down.... When I type the number one in the source code - `1` - it never changes. I don't think things can get any more constant than that. (It might be turned on its head in the quantum world, but I'll be dead long before quantum computers become commodity items). – jww Jul 23 '15 at 18:41
  • I see you used just `4` and `32` for your sample `y` values. Is it true you just have a small set of such values? You could make a template function, and then also have a non-template function with a `switch` that selects the correct template. If, for API (ABI?) reasons, you *really* need `y` to be a non-template parameter to the function, then this should work. – Aaron McDaid Jul 24 '15 at 09:48
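
The template-plus-`switch` idea from the last comment could be sketched roughly as follows. This is an illustration only: `rotrImm` and `rotrFixedByte` are hypothetical names, and the body uses plain C++ shifts instead of the original inline assembly so that it builds anywhere. Whether Clang accepts a template parameter as an `"I"` operand is disputed in the comments above, so the sketch makes no claim about that.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint8_t byte;

// Hypothetical helper: the rotate amount R is a template parameter, so
// R % 8 is an integer constant expression inside the function body.
template <unsigned int R>
inline byte rotrImm(byte x)
{
    // x is promoted to int before shifting, so counts of 0..7 are well
    // defined; the cast discards the bits above bit 7.
    return static_cast<byte>((x >> (R % 8)) | (x << ((8 - R % 8) % 8)));
}

// Hypothetical dispatcher: keeps y as a run-time parameter, as in the
// original signature, and selects the matching instantiation.
inline byte rotrFixedByte(byte x, unsigned int y)
{
    switch (y % 8) {
    case 0:  return rotrImm<0>(x);
    case 1:  return rotrImm<1>(x);
    case 2:  return rotrImm<2>(x);
    case 3:  return rotrImm<3>(x);
    case 4:  return rotrImm<4>(x);
    case 5:  return rotrImm<5>(x);
    case 6:  return rotrImm<6>(x);
    default: return rotrImm<7>(x);
    }
}
```

When `y` is a literal at the call site and the dispatcher is inlined, an optimizer can usually reduce the `switch` to the single selected case, though that is not guaranteed.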

2 Answers


Since you seem to be out of luck trying to force a constant evaluation due to design decisions, the ror r/m8, cl form might be a good compromise:

__asm__ ("rorb %b1, %b0" : "+q,m" (x) : "c,c" (y) : "cc");

The multiple alternative constraint syntax is to 'promote' register use over memory use due to an issue with clang, covered here. I don't know if this issue has been resolved in later versions. gcc tends to be better at constraint matching and avoiding spills.

This does require loading (y) into the rcx/ecx/cl register, but the compiler can probably hide it behind another latency. Furthermore, there are no range issues for (y). rorb effectively uses (%cl % 8). The "cc" clobber isn't required.


If an expression is constant, both gcc and clang can use __builtin_constant_p:

if (__builtin_constant_p(y))
    __asm__("rorb %1, %b0" : "+q,m" (x) : "N,N" ((unsigned char) y) : "cc");
else
    ... non-constant (y) ...

or as alluded to in the mailing list:

if (__builtin_constant_p(y))
{
    if ((y &= 0x7) != 0)
        x = (x >> y) | (x << (8 - y)); /* gcc generates rotate. */
}
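
The shift-and-or form is not limited to constant `y`. A self-contained sketch that handles any rotate amount without undefined behavior might look like this; `rotr8` and `checkRotr8` are hypothetical names, not part of the original code, and whether a given compiler turns the expression into a single rotate instruction is an assumption to verify in the generated assembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate an 8-bit value right by y bit positions,
// for any y, without inline assembly.
inline std::uint8_t rotr8(std::uint8_t x, unsigned int y)
{
    const unsigned int r = y & 7u;         // right-shift count, 0..7
    const unsigned int l = (0u - y) & 7u;  // matching left-shift count, 0..7
    // x is promoted to int before shifting, so a count of at most 7 is
    // well defined; the cast discards the bits above bit 7.
    return static_cast<std::uint8_t>((x >> r) | (x << l));
}

// Small self-check using the two calls from the question.
inline void checkRotr8()
{
    assert(rotr8(1, 4) == 0x10);     // 0000'0001 rotated right by 4
    assert(rotr8(1, 32) == 1);       // 32 % 8 == 0, so x is unchanged
    assert(rotr8(0x81, 1) == 0xC0);  // low bit wraps around to bit 7
}
```

Masking both counts avoids a special case for `y % 8 == 0`, because both shifts are then by zero.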
Brett Hale
  • *"`rorb effectively uses (%cl % 8)`"* - I knew this could be taken advantage of, but I did not know how to express it... Thanks. – jww Jul 23 '15 at 18:38
  • @jww - BTW, for an 8-bit unsigned immediate, you could use the `"N"` constraint, rather than `"I"`, and let `ror` handle the range modulo 8. – Brett Hale Jul 23 '15 at 18:44

If the 'N' constraint is for 8 bits, then how about 16/32/64?

raven02