3

I have recently heard multiple people say that JIT compilation produces really fast code, faster even than any static compiler can produce. I find this hard to believe when it comes to C++ STL-style templated code, but these people (typically from a C#/Java background) insist that this is indeed the case.

My question is thus: what are the types of optimizations that you can make at runtime but not at compile time?

Edit: clarification: I'm more interested in the kind of things that are impossible to do statically rather than the typical case in any one industry.

masaers
  • 597
  • 6
  • 17
  • Lots of variables at play. Always ask for benchmarks and the assumptions being made. Code statically generated to run on a wide variety of related processors may well run slower than code generated using the information available to a JIT. E.g.: compile code to run on anything back to a 386 and compare it to JIT code that can take a brand-new i7's cache, branch prediction and other whiz-bang modern features into account. – user4581301 Jan 30 '18 at 01:30
  • Possible duplicate of https://stackoverflow.com/questions/4516778/when-is-java-faster-than-c-or-when-is-jit-faster-then-precompiled and https://stackoverflow.com/questions/538056/jit-compiler-vs-offline-compilers – Jerry Jeremiah Jan 30 '18 at 02:46
  • Not to mention these: https://stackoverflow.com/questions/18760256/is-it-possible-to-get-a-java-program-faster-than-the-same-program-optimized-in https://stackoverflow.com/questions/1984856/java-runtime-performance-vs-native-c-c-code https://stackoverflow.com/questions/5641356/why-is-it-that-bytecode-might-run-faster-than-native-code – Jerry Jeremiah Jan 30 '18 at 02:51
  • Java and C++ have different semantics; even for the same code gen you could find cases where Java's strict semantics WRT arrays guarantee less possible aliasing (excluding `restrict` in C). – curiousguy Jan 22 '19 at 19:25

3 Answers

3

JIT compilers can measure the likelihood of a conditional jump being taken, and adjust the emitted code accordingly. A static compiler can do this as well, but not automatically; it requires a hint from the programmer.

Obviously this is just one factor among many, but it does indicate that it's possible for JIT to be faster under the right conditions.

Mark Ransom
  • 271,357
  • 39
  • 345
  • 578
2

things you can do at runtime

  • check to see what exotic instructions exist (AMD vs intel,....)
  • detect cache topology
  • detect memory size
  • number of cores

and other things I missed from the list

Does this always make things 10x faster? No. But it certainly offers opportunities for optimization that are not available at compile time (for widely distributed code; obviously if you know it's only going to run on 3 different hardware configs, you can do custom builds, etc.)

pm100
  • 32,399
  • 19
  • 69
  • 124
  • 1
    You can do these things in static compilation. – Cpp plus 1 Jan 30 '18 at 02:25
  • These are more like things that static compilers typically refrain from doing in order to be portable, rather than things that are impossible to do statically, isn't it? – masaers Jan 31 '18 at 01:39
  • @masaers a static compiler cannot possibly know the cache layout on every machine things will run on. How does it know if I am running on 4 cores or 8 (I mean my actual machine, you are compiling on yours)? The only thing it can do is either not care, or generate all possible optimizations and choose at runtime, which will have overhead – pm100 Jan 31 '18 at 01:51
  • You're assuming that compiling from source is unavailable to the user, which is true in many cases, but far from all. – masaers Jan 31 '18 at 01:58
  • @pm100 What about choosing at install time? – Cpp plus 1 Feb 01 '18 at 13:13
0

Contrary to what the answer above claims:

  1. Architecture-specific extensions can easily be used by a static compiler. Visual Studio, for example, has a SIMD-extension option that can be toggled on and off.

  2. Cache size is usually the same for processors of a given architecture. Intel, for example, usually has an L1 cache size of 4kB, an L2 cache size of 32kB, and an L3 cache size of 4MB.

  3. Optimizing for memory size would only be necessary if you are, for some reason, writing a massive program that can use over 4GB of memory.

  4. This may actually be a case where a JIT compiler is useful. However, you can create more threads than there are cores; those threads will run on separate cores in CPUs with more cores, and simply share cores in CPUs with fewer. I also think it's quite safe to assume that a CPU has 4 cores.

Still, even multi-core optimizations don't make a JIT compiler necessary, because a program's installer can check the number of cores available and install the version of the program best optimized for that computer's number of cores.

I do not think that JIT compilation results in better performance than static compilation. You can always create multiple versions of your code, each optimized for a specific device. The only type of optimization I can think of that can make JIT code faster is when you receive input, and the code that processes it can be reorganized to be faster for the most common case (which the JIT compiler might be able to discover) but slower for the rarer case. Even then, you can perform that optimization by hand once you know the common case; the static compiler, however, would not be able to discover it on its own.

For example, let's say that you can perform an optimization on a mathematical algorithm that results in an error for values 1-100, but all higher numbers work with this optimization. You notice that values 1-100 can easily be pre-calculated, so you do this:

switch(num) {
    case 0: {
        //code
    }
    //...cases 1 through 100, each returning a precomputed result
}

//main calculation code

However, this is inefficient (assuming the switch statement is not compiled to a jump table), since cases 0-100 are rarely entered; those values can be computed mentally, without the help of a computer. A JIT might be able to discover (upon seeing that values in the range 0-100 are rarely entered) that this is more efficient:

if(num < 101) {
    switch(num) {
        //...same cases as above
    }
}

//main calculation code

In this version of the code, only one if is executed in the most common case, instead of an average of 50 ifs in the extremely rare case (if the switch statement is implemented as a series of ifs).

Cpp plus 1
  • 930
  • 5
  • 22
  • 1
    Sure. If properly tuned it might be like that - but most people don't produce a dozen individually tuned versions. So "JIT can be faster" doesn't mean it has to be faster. Have a look at https://stackoverflow.com/a/5641664/2193968 – Jerry Jeremiah Jan 30 '18 at 03:07
  • 1
    But in many cases it is practical to compile from source on and for the exact machine the code will run on. – masaers Feb 01 '18 at 01:43
  • "4kb" kilobits or kilobytes? – curiousguy Jan 22 '19 at 22:04