118

For the Intel architectures, is there a way to instruct the GCC compiler to generate code that always forces branch prediction a particular way in my code? Does the Intel hardware even support this? What about other compilers or hardware?

I would use this in C++ code where I know which case I want to run fast and do not care about the slowdown when the other branch needs to be taken, even if that branch has been taken recently.

for (;;) {
  if (normal) { // How do I tell the compiler to always predict this branch as true?
    doSomethingNormal();
  } else {
    exceptionalCase();
  }
}

As a follow-on question for Evdzhan Mustafa, can the hint apply only to the first time the processor encounters the instruction, with all subsequent branch prediction functioning normally?

Bergi
WilliamKF
  • could also throw an exception if anything becomes abnormal (which is compiler independent) – Shep May 08 '15 at 19:04
  • 2
    Closely related: [likely()/unlikely() macros in the Linux kernel - how do they work? What's their benefit?](http://stackoverflow.com/q/109710/1068283) – Michael Hampton May 09 '15 at 16:54

8 Answers

88

GCC supports the function __builtin_expect(long exp, long c) to provide this kind of feature. You can check the documentation here.

Here exp is the condition used and c is the expected value. For example, in your case you would want:

if (__builtin_expect(normal, 1))

Because of the awkward syntax, this is usually wrapped in two custom macros like

#define likely(x)    __builtin_expect (!!(x), 1)
#define unlikely(x)  __builtin_expect (!!(x), 0)

just to ease the task.
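
For example, applied to the loop from the question, a minimal sketch might look like this (assuming normal, doSomethingNormal() and exceptionalCase() are defined as in the question):

for (;;) {
  if (likely(normal)) {   // laid out as the fall-through, hot path
    doSomethingNormal();
  } else {
    exceptionalCase();    // cold path, typically moved out of the main code sequence
  }
}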

Keep in mind that:

  1. this is non-standard
  2. the compiler and the CPU's branch predictor are likely more skilled than you at deciding such things, so this could be a premature micro-optimization
Jack
  • 3
    Is there a reason that you show a macro and not a `constexpr` function? – Columbo May 08 '15 at 19:16
  • 22
    @Columbo: I don't think a `constexpr` function _can_ replace this macro. It has to be in the `if` statement directly I believe. Same reason `assert` could never be a `constexpr` function. – Mooing Duck May 08 '15 at 20:30
  • 1
    @MooingDuck I agree, although there are [more reasons for assert](http://stackoverflow.com/q/25285557/1708801). – Shafik Yaghmour May 08 '15 at 23:41
  • 1
    @MooingDuck: Have you tried it? Because the last time I did it worked fine as a static inline function, affecting the generated assembly in the expected way (no pun intended). – Nemo May 09 '15 at 05:10
  • For opaque predicates or values the compiler has no clue how the branch is going to turn out. – usr May 09 '15 at 10:18
  • 8
    @Columbo one reason to use a macro would be because this is one of the few places in C or C++ where a macro is *more semantically correct* than a function. The function only appears to work because of optimization (it *is* an optimization: `constexpr` only talks about value semantics, not the inlining of implementation-specific assembly); the straightforward interpretation (no inline) of the code is meaningless. There's no reason at all to use a function for this. – Leushenko May 09 '15 at 12:40
  • @MooingDuck It appears that compiling with merely `-O1` or greater, GCC produces equivalent assembly for both the [`constexpr` function](http://goo.gl/ilMvES) and [the intrinsic](http://goo.gl/Ev0qiX). Same goes with [Clang](http://goo.gl/nqlK6m). And, really, there is no point in providing branching information when your optimization level is under 1. – Columbo May 09 '15 at 13:03
  • 2
    @Leushenko Consider that `__builtin_expect` itself is an optimization hint, so arguing that a method simplifying its use depends on optimization is... not convincing. Also, I didn't add the `constexpr` specifier to make it work in the first place, but to make it work in constant expressions. And yes, there are reasons to use a function. For example, I wouldn't want to pollute my entire namespace with a cute little name such as `likely`. I'd have to use e.g. `LIKELY`, to emphasize that it is a macro and avoid collisions, but that's simply ugly. – Columbo May 09 '15 at 13:10
  • 1
    @Leushenko: As Columbo says (indirectly), putting this gadget in a utility namespace is an excellent reason for making it a function. – Nemo May 09 '15 at 13:57
  • You see these macros [all over the Linux kernel](http://stackoverflow.com/q/109710/1068283), where presumably the kernel hackers _do_ know better than the compiler what is likely or unlikely. – Michael Hampton May 09 '15 at 16:56
  • Absent PGO, the compiler pretty much has very little information at all about the likelihood of a branch, since it has almost no contextual information. There are various heuristics used, such as a "branch that returns a constant is not likely to be taken because this is a common error-handling pattern", but their use is limited and can be dead wrong. On the other hand, the dynamic branch predictor in the CPU _is_ much more likely to get things right, but that's more or less irrelevant since the code has been generated at that point. The source hints don't interfere with the predictor. – BeeOnRope Jul 29 '17 at 21:28
48

gcc has long __builtin_expect (long exp, long c) (emphasis mine):

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

if (__builtin_expect (x, 0))
   foo ();

indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))
   foo (*ptr);

when testing pointer or floating-point values.

As the documentation notes, you should prefer to use actual profile feedback, and this article shows a practical example of this and how, in their case at least, it ends up being an improvement over using __builtin_expect. Also see How to use profile guided optimizations in g++?.

We can also find a Linux kernel newbies article on the kernel macros likely() and unlikely() which use this feature:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

Note the !! used in the macro; we can find the explanation for this in Why use !!(condition) instead of (condition)?.
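
In short, the built-in hints that exp == c, so a "truthy" value other than 1 would not match a hint of 1. A minimal sketch of the difference (my own, not from the linked post):

void example(long x)                     // x is some hypothetical flag-like value, e.g. 4
{
    if (__builtin_expect(x, 1)) { }      // hint claims x == 1, which is wrong when x is 4
    if (__builtin_expect(!!x, 1)) { }    // !!x collapses any non-zero value to 1, so the hint matches
}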

Just because this technique is used in the Linux kernel does not mean it always makes sense to use it. We can see from this question I recently answered, difference between the function performance when passing parameter as compile time constant or variable, that many hand-rolled optimization techniques don't work in the general case. We need to profile code carefully to understand whether a technique is effective. Many old techniques may not even be relevant with modern compiler optimizations.

Note that although builtins are not portable, clang also supports __builtin_expect.

Also on some architectures it may not make a difference.

Shafik Yaghmour
  • What is good enough for Linux kernel does not suffice for C++11. – Maxim Egorushkin May 09 '17 at 14:33
  • @MaximEgorushkin note, I do not actually recommend its use, in fact the gcc documentation I quote which is my first citation does not even use that technique. I would say the main thrust of my answer is to consider alternatives carefully before going down this route. – Shafik Yaghmour May 11 '17 at 19:23
45

No, there is not. (At least on modern x86 processors.)

__builtin_expect mentioned in other answers influences the way gcc arranges the assembly code. It does not directly influence the CPU's branch predictor. Of course, there will be indirect effects on branch prediction caused by reordering the code. But on modern x86 processors there is no instruction that tells the CPU "assume this branch is/isn't taken".

See this question for more detail: Intel x86 0x2E/0x3E Prefix Branch Prediction actually used?

To be clear, __builtin_expect and/or the use of -fprofile-arcs can improve the performance of your code, both by giving hints to the branch predictor through code layout (see Performance optimisations of x86-64 assembly - Alignment and branch prediction), and also improving cache behaviour by keeping "unlikely" code away from "likely" code.

Artelius
  • 10
    This is incorrect. On all modern versions of x86, the default prediction algorithm is to predict that forward branches are not taken and that backward branches are (see https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts). So by rearranging your code you *can* effectively give a hint to the CPU. This is exactly what GCC does when you use `__builtin_expect`. – Nemo May 09 '15 at 05:08
  • 7
    @Nemo, did you read past the first sentence of my answer? Everything you have said is covered by my answer or in the links given. The question asked if you can "force branch prediction to always go a certain way", to which the answer is "no", and I did not feel other answers were clear enough about this. – Artelius May 09 '15 at 05:38
  • 5
    OK, I should have read more carefully. It seems to me this answer is technically correct, but useless, since the questioner is obviously looking for `__builtin_expect`. So this should be just a comment. But it is not false, so I have removed my downvote. – Nemo May 09 '15 at 15:12
  • 1
    IMO it's not useless; it's a useful clarification of how CPUs and compilers actually work, which might be relevant to performance analysis with/without these options. e.g. you usually can't use `__builtin_expect` to trivially create a test-case that you can measure with `perf stat` that will have a very high branch mispredict rate. It just affects branch *layout*. And BTW, Intel since Sandybridge or at least Haswell does *not* use static prediction much / at all; there's always some prediction in the BHT, whether it's a stale alias or not. https://xania.org/201602/bpu-part-two – Peter Cordes Jan 16 '20 at 02:50
  • More detail on modern Intel CPUs (lack of) static prediction: [Why did Intel change the static branch prediction mechanism over these years?](https://stackoverflow.com/q/51822731) – Peter Cordes Mar 18 '21 at 06:12
24

The correct way to define likely/unlikely macros in C++11 is the following:

#define LIKELY(condition) __builtin_expect(static_cast<bool>(condition), 1)
#define UNLIKELY(condition) __builtin_expect(static_cast<bool>(condition), 0)

This method is compatible with all C++ versions, unlike [[likely]], but relies on the non-standard extension __builtin_expect.


When these macros are defined this way:

#define LIKELY(condition) __builtin_expect(!!(condition), 1)

That may change the meaning of if statements and break the code. Consider the following code:

#include <iostream>

struct A
{
    explicit operator bool() const { return true; }
    operator int() const { return 0; }
};

#define LIKELY(condition) __builtin_expect((condition), 1)

int main() {
    A a;
    if(a)
        std::cout << "if(a) is true\n";
    if(LIKELY(a))
        std::cout << "if(LIKELY(a)) is true\n";
    else
        std::cout << "if(LIKELY(a)) is false\n";
}

And its output:

if(a) is true
if(LIKELY(a)) is false

As you can see, the definition of LIKELY using !! as a cast to bool breaks the semantics of if.

The point here is not that operator int() and operator bool() should be related (which would be good practice).

Rather, it is that using !!(x) instead of static_cast<bool>(x) loses the context for C++11 contextual conversions.
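
For completeness, a sketch (my addition) of how the static_cast<bool> definition from the top of this answer behaves with the same struct A; the macro name LIKELY_BOOL is mine, chosen to avoid redefining the LIKELY from the example:

#define LIKELY_BOOL(condition) __builtin_expect(static_cast<bool>(condition), 1)

// Added to main() in the example above:
if(LIKELY_BOOL(a))                                // static_cast<bool> selects explicit operator bool()
    std::cout << "if(LIKELY_BOOL(a)) is true\n";  // this line is printed, matching if(a)
else
    std::cout << "if(LIKELY_BOOL(a)) is false\n";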

Maxim Egorushkin
  • Note [contextual conversions came in via a defect in 2012](http://stackoverflow.com/a/25048219/1708801) and even in late 2014 there was still implementation divergence. Actually it looks like the case I linked to still does not work for gcc. – Shafik Yaghmour May 11 '17 at 19:30
  • @ShafikYaghmour That is an interesting observation with regards to the contextual conversion involved in `switch`, thanks. The contextual conversion involved here is [particular to type `bool` and the five specific contexts listed there](http://en.cppreference.com/w/cpp/language/implicit_conversion#Contextual_conversions), which do not include `switch` context. – Maxim Egorushkin May 12 '17 at 09:35
  • This only affects C++, right? So there's no reason to go and change existing C projects to use `(_Bool)(condition)`, because C doesn't have operator overloading. – Peter Cordes Jul 29 '17 at 14:51
  • 2
    In your example, you used just `(condition)`, not `!!(condition)`. Both are `true` after changing that (tested with g++ 7.1). Can you construct an example that actually demonstrates the problem you're talking about when you use `!!` to booleanize? – Peter Cordes Jul 29 '17 at 14:56
  • @PeterCordes It applies to C++, you are right. One can provide both `operator!()` and `explicit operator bool()` which both return the same value, so that `!!` would return the opposite of the explicit conversion to bool. Such code would probably not pass a code review though, because it doesn't follow the principle of least surprise. – Maxim Egorushkin Jul 29 '17 at 15:02
  • Ah yes, overloading `operator!` would do it :P – Peter Cordes Jul 29 '17 at 15:03
  • 3
    As Peter Cordes pointed out, you say "When these macros [are] defined this way:" and then show a macro using '!!', "may change the meaning of if statements and break the code. Consider the following code:" ... and then you show code that doesn't use '!!' at all - which was known to be broken even before C++11. Please change the answer to show an example where the given macro (using !!) goes wrong. – Carlo Wood Aug 27 '18 at 21:25
19

As the other answers have all adequately suggested, you can use __builtin_expect to give the compiler a hint about how to arrange the assembly code. As the official docs point out, in most cases, the assembler built into your brain will not be as good as the one crafted by the GCC team. It's always best to use actual profile data to optimize your code, rather than guessing.

Along similar lines, but not yet mentioned, is a GCC-specific way to force the compiler to generate code on a "cold" path. This involves the use of the noinline and cold attributes, which do exactly what they sound like they do. These attributes can only be applied to functions, but with C++11, you can declare inline lambda functions and these two attributes can also be applied to lambda functions.

Although this still falls into the general category of a micro-optimization, and thus the standard advice applies—test don't guess—I feel like it is more generally useful than __builtin_expect. Hardly any generations of the x86 processor use branch prediction hints (reference), so the only thing you're going to be able to affect anyway is the order of the assembly code. Since you know what is error-handling or "edge case" code, you can use this annotation to ensure that the compiler won't ever predict a branch to it and will link it away from the "hot" code when optimizing for size.

Sample usage:

void FooTheBar(void* pFoo)
{
    if (pFoo == nullptr)
    {
        // Oh no! A null pointer is an error, but maybe this is a public-facing
        // function, so we have to be prepared for anything. Yet, we don't want
        // the error-handling code to fill up the instruction cache, so we will
        // force it out-of-line and onto a "cold" path.
        [&]() __attribute__((noinline,cold)) {
            HandleError(...);
        }();
    }

    // Do normal stuff
    ⋮
}
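
If the attribute-on-lambda syntax feels too exotic, an alternative sketch (mine, not from the answer) is to hang the same attributes on an out-of-line helper function instead:

__attribute__((noinline, cold))
void HandleNullFoo()
{
    HandleError(...);   // same hypothetical error handler as above
}

void FooTheBar(void* pFoo)
{
    if (pFoo == nullptr)
    {
        HandleNullFoo();   // kept off the hot path by noinline + cold
    }

    // Do normal stuff
}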

Even better, GCC will automatically ignore this in favor of profile feedback when it is available (e.g., when compiling with -fprofile-use).

See the official documentation here: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

Cody Gray
  • 3
    The branch prediction hint prefixes are ignored because they are not needed; you can achieve the exact same effect just by reordering your code. (The default branch prediction algorithm is to guess that backward branches are taken and forward branches are not.) So you can, in effect, give the CPU a hint, and this is what `__builtin_expect` does. It is not at all useless. You are right that the `cold` attribute is also useful, but you underestimate the utility of `__builtin_expect` I think. – Nemo May 10 '15 at 00:08
  • Modern Intel CPUs don't use static branch prediction. The algorithm you describe, @Nemo, where backwards branches are predicted taken and forward branches are predicted as not-taken was used in earlier processors, and up through the Pentium M or so, but modern designs just basically guess randomly, indexing into their branch tables at where it would *expect* to find information on that branch and using whatever information is there (even though it may be essentially garbage). So branch prediction hints would theoretically be useful, but perhaps not in practice, which is why Intel removed them. – Cody Gray Jun 29 '17 at 11:23
  • To be clear, the implementation of branch prediction is *extremely* complicated, and space constraints in comments forced me to vastly over-simplify. This would really be an entire answer in and of itself. There may still be vestiges of static branch prediction in modern microarchitectures, like Haswell, but it isn't nearly as simple as it used to be. – Cody Gray Jun 29 '17 at 11:26
  • Do you have a reference for "modern Intel CPUs don't use static branch prediction"? Intel's own article (https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts) says otherwise... But that is from 2011 – Nemo Jun 29 '17 at 18:22
  • Don't really have an official reference, @Nemo. Intel is extremely tight-lipped about branch prediction algorithms used in its chips, treating them as trade secrets. Most of what is known has been figured out by empirical testing. As ever, [Agner Fog's materials](http://www.agner.org/optimize/) are the best resources, but even he says: "The branch predictor appears to have been redesigned in the Haswell, but very little is known about its construction." I can't recall where I first saw the benchmarks demonstrating static BP wasn't used anymore, unfortunately. – Cody Gray Jun 30 '17 at 10:19
  • [Fog's microarchitecture document](http://www.agner.org/optimize/microarchitecture.pdf) *does* speak a bit on static prediction, but only for P5, P6, Netburst, and PM/Core2. And it appears I misremembered about PM being the last to use static prediction. Fog says that PM/Core2 "do not use static prediction. The predictor simply makes a random prediction the first time a branch is seen, depending on what happens to be in the BTB entry that is assigned to the new branch. There is simply a 50% chance of making the right prediction of jump or no jump, but the predicted target is correct." – Cody Gray Jun 30 '17 at 10:21
  • No update on later microarchitectures available, but no real reason to assume that this has changed since Core 2. Modern microarchitectures are just incremental revisions of Core 2, just as Core 2 itself was an incremental revision of the original P6 design. [This paper also suggests that PM doesn't statically predict](http://www.ece.uah.edu/~milenka/docs/VladimirUzelac.thesis.pdf), but there's very little I've seen published recently. Might be an interesting question, especially if you have some rep to give away as a bounty to encourage someone to test it. – Cody Gray Jun 30 '17 at 10:22
  • Matt Godbolt has a [pretty interesting series on this topic](https://xania.org/201602/bpu-part-one) – Nemo Jul 05 '17 at 05:07
10

As of C++20, the likely and unlikely attributes are standardized and are already supported in g++ 9. So, as discussed here, you can write

if (a > b) {
  /* code you expect to run often */
  [[likely]] /* last statement here */
}

For example, in the following code the else block gets inlined thanks to the [[unlikely]] in the if block:

int oftendone( int a, int b );
int rarelydone( int a, int b );
int finaltrafo( int );

int divides( int number, int prime ) {
  int almostreturnvalue;
  if ( ( number % prime ) == 0 ) {
    auto k                         = rarelydone( number, prime );
    auto l                         = rarelydone( number, k );
    [[unlikely]] almostreturnvalue = rarelydone( k, l );
  } else {
    auto a            = oftendone( number, prime );
    almostreturnvalue = oftendone( a, a );
  }
  return finaltrafo( almostreturnvalue );
}

godbolt link comparing the presence/absence of the attribute
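
The attribute can also be placed directly on the branch statement itself, which is the placement many examples use; a minimal sketch equivalent in intent to the first snippet above:

if (a > b) [[likely]] {
  /* code you expect to run often */
} else [[unlikely]] {
  /* rare path */
}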

ksyx
pseyfert
  • Why use `[[unlikely]]` in `if` vs `[[likely]]` in the `else`? – WilliamKF Jan 16 '20 at 16:37
  • no reason, just ended up in this constellation after trying around where the attribute needs to go. – pseyfert Jan 17 '20 at 13:10
  • Pretty cool. Too bad the method is not applicable to older C++ versions. – Maxim Egorushkin Jan 18 '20 at 18:29
  • Fantastic godbolt link – Lewis Kelsey Apr 10 '20 at 23:32
  • Note that these don't hint *run-time* branch *prediction* (at least not for most ISAs, because there's literally no mechanism for that, especially on modern x86 where [there is no fallback to static predict-not-taken for forward branches](//stackoverflow.com/q/51822731/224132), and see other answers), so this isn't truly answering the title question. But it is what you actually want: It can be useful to hint the compiler which path is hot, so it can lay out that path to involve fewer taken branches (superscalar front-ends have an easier time with wide contiguous instruction fetches.) – Peter Cordes Mar 18 '21 at 05:21
5

__builtin_expect can be used to tell the compiler which way you expect a branch to go. This can influence how the code is generated. Typical processors run code faster sequentially. So if you write

if (__builtin_expect (x == 0, 0)) ++count;
if (__builtin_expect (y == 0, 0)) ++count;
if (__builtin_expect (z == 0, 0)) ++count;

the compiler will generate code like

if (x == 0) goto if1;
back1: if (y == 0) goto if2;
back2: if (z == 0) goto if3;
back3: ;
...
if1: ++count; goto back1;
if2: ++count; goto back2;
if3: ++count; goto back3;

If your hint is correct, this will execute the code without any branches actually taken. It will run faster than the normal sequence, where each if statement would branch around the conditional code and three branches would actually be taken.

x86 has instruction prefixes to mark branches as expected to be taken or expected not to be taken (the 0x2E/0x3E prefixes discussed in another answer), but modern processors appear to ignore them. They are not very useful anyway, because branch prediction will handle this just fine. So I don't think you can actually influence the branch prediction.

gnasher729
2

With regard to the OP's question: no, there is no way in GCC to tell the processor to always assume that a branch is or isn't taken. What you have is __builtin_expect, which does what others say it does. Furthermore, I think you do not want to tell the processor whether a branch is always taken or not. Today's processors, such as those implementing the Intel architecture, can recognize fairly complex patterns and adapt effectively.

However, there are times when you want to assume control of whether, by default, a branch is predicted taken or not: when you know the code will be called "cold" with respect to branch statistics.

One concrete example: exception-management code. By definition the management code will run only exceptionally, but perhaps when it does occur maximum performance is desired (there may be a critical error to take care of as soon as possible), hence you may want to control the default prediction.

Another example: you may classify your input and jump into the code that handles the result of your classification. If there are many classifications, the processor may collect statistics but lose them because the same classification does not recur soon enough and the prediction resources are devoted to recently called code. I wish there were a primitive to tell the processor "please do not devote prediction resources to this code", the way you can sometimes say "do not cache this".

TheCppZoo