Questions tagged [compiler-optimization]

Compiler optimization involves adapting a compiler to reduce run-time, object size, or both. This can be accomplished using compiler arguments (e.g. CFLAGS, LDFLAGS), compiler plugins (Dehydra, for instance), or direct modifications to the compiler (such as modifying its source code).

2626 questions
2323
votes
10 answers

Why are elementwise additions much faster in separate loops than in a combined loop?

Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } This loop is executed 10,000 times via another outer…
Johannes Gerer
  • 24,320
  • 5
  • 24
  • 33
2194
votes
12 answers

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow,…
xis
  • 22,592
  • 8
  • 39
  • 55
1499
votes
11 answers

Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs

I was looking for the fastest way to popcount large arrays of data. I encountered a very weird effect: Changing the loop variable from unsigned to uint64_t made the performance drop by 50% on my PC. The Benchmark #include #include…
gexicide
  • 35,369
  • 19
  • 80
  • 136
956
votes
9 answers

Swift Beta performance: sorting arrays

I was implementing an algorithm in Swift Beta and noticed that the performance was very poor. After digging deeper I realized that one of the bottlenecks was something as simple as sorting arrays. The relevant part is here: let n = 1000000 var x = …
Jukka Suomela
  • 11,423
  • 4
  • 32
  • 45
468
votes
6 answers

Why does GCC generate 15-20% faster code if I optimize for size instead of speed?

I first noticed in 2009 that GCC (at least on my projects and on my machines) has a tendency to generate noticeably faster code if I optimize for size (-Os) instead of speed (-O2 or -O3), and I have been wondering ever since why. I have managed…
Ali
  • 51,545
  • 25
  • 157
  • 246
430
votes
2 answers

Why do we use volatile keyword?

Possible Duplicate: Why does volatile exist? I have never used it, but I wonder why people use it. What exactly does it do? I searched the forum, but I found only C# or Java topics.
Nawaz
  • 327,095
  • 105
  • 629
  • 812
342
votes
1 answer

Why does the Rust compiler not optimize code assuming that two mutable references cannot alias?

As far as I know, reference/pointer aliasing can hinder the compiler's ability to generate optimized code, since they must ensure the generated binary behaves correctly in the case where the two references/pointers indeed alias. For instance, in the…
Zhiyao
  • 3,078
  • 2
  • 7
  • 16
300
votes
12 answers

How to compile Tensorflow with SSE4.2 and AVX instructions?

This is the message received from running a script to check if Tensorflow is working: I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally I tensorflow/stream_executor/dso_loader.cc:125]…
GabrielChu
  • 5,418
  • 8
  • 23
  • 35
197
votes
2 answers

What is &&& operation in C

#include <stdio.h> volatile int i; int main() { int c; for (i = 0; i < 3; i++) { c = i &&& i; printf("%d\n", c); } return 0; } The output of the above program compiled using gcc is 0 1 1 With the -Wall or…
manav m-n
  • 10,236
  • 21
  • 66
  • 95
185
votes
3 answers

Why does GCC generate such radically different assembly for nearly the same C code?

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences): fast_trunc_one, C: int fast_trunc_one(int i) { int mantissa, exponent, sign, r; …
orlp
  • 98,226
  • 29
  • 187
  • 285
182
votes
3 answers

Why can lambdas be better optimized by the compiler than plain functions?

In his book The C++ Standard Library (Second Edition) Nicolai Josuttis states that lambdas can be better optimized by the compiler than plain functions. In addition, C++ compilers optimize lambdas better than they do ordinary functions. (Page…
Stephan Dollberg
  • 27,667
  • 11
  • 72
  • 104
178
votes
5 answers

How to see which flags -march=native will activate?

I'm compiling my C++ app using GCC 4.3. Instead of manually selecting the optimization flags I'm using -march=native, which in theory should add all optimization flags applicable to the hardware I'm compiling on. But how can I check which flags is…
vartec
  • 118,560
  • 34
  • 206
  • 238
175
votes
4 answers

Can I hint the optimizer by giving the range of an integer?

I am using an int type to store a value. By the semantics of the program, the value always varies in a very small range (0 - 36), and int (not a char) is used only because of the CPU efficiency. It seems like many special arithmetical optimizations…
rolevax
  • 1,560
  • 1
  • 10
  • 21
151
votes
2 answers

Limits of Nat type in Shapeless

In shapeless, the Nat type represents a way to encode natural numbers at a type level. This is used for example for fixed size lists. You can even do calculations on type level, e.g. append a list of N elements to a list of K elements and get back a…
Rüdiger Klaehn
  • 11,945
  • 2
  • 36
  • 55
148
votes
5 answers

Why does the enhanced GCC 6 optimizer break practical C++ code?

GCC 6 has a new optimizer feature: It assumes that this is always not null and optimizes based on that. Value range propagation now assumes that the this pointer of C++ member functions is non-null. This eliminates common null pointer checks but…
boot4life
  • 4,292
  • 5
  • 20
  • 40