Consider the following code:
int g(std::vector<int>&, size_t);
int f(std::vector<int>& v) {
int res = 0;
for (size_t i = 0; i < v.size(); i++)
res += g(v, i);
return res;
}
The compiler cannot optimize away the evaluation of v.size() within the loop, since it cannot prove that the size won't change inside g. The assembly generated with GCC 9.2, -O3, and x64 is:
.L3:
mov rsi, rbx
mov rdi, rbp
add rbx, 1
call g(std::vector<int, std::allocator<int> >&, unsigned long)
add r12d, eax
mov rax, QWORD PTR [rbp+8] // load a pointer
sub rax, QWORD PTR [rbp+0] // subtract another pointer
sar rax, 2 // result / sizeof(int) => size()
cmp rbx, rax
jb .L3
If we know that g does not alter v.size(), we can rewrite the loop as follows:
for (size_t i = 0, e = v.size(); i < e; i++)
res += g(v, i);
This generates simpler (and thus faster) assembly without those mov, sub, and sar instructions. The value of size() is simply kept in a register.
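For completeness, here is a self-contained sketch of the hoisted version that compiles on its own. The body of g is a hypothetical stand-in (the code above only declares it), chosen so that it never changes the vector's size:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for g: the original code only declares it.
// It reads one element and never changes the size of the vector.
int g(std::vector<int>& v, std::size_t i) {
    return v[i];
}

int f(std::vector<int>& v) {
    int res = 0;
    // size() is evaluated once before the loop; the cached value e is
    // only correct as long as g never resizes v.
    for (std::size_t i = 0, e = v.size(); i < e; i++)
        res += g(v, i);
    return res;
}
```

The cached bound is only safe under the stated assumption; if g could shrink the vector, the loop would index past the end.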
I would expect that the same effect could be achieved by making the vector const (I know this changes the semantics of the program, since g now cannot alter the vector's elements, but this should be irrelevant to the question):
int g(const std::vector<int>&, size_t);
int f(const std::vector<int>& v) {
int res = 0;
for (size_t i = 0; i < v.size(); i++)
res += g(v, i);
return res;
}
Now, the compiler should know that those pointers loaded in each loop iteration cannot change and, therefore, that the result of
mov rax, QWORD PTR [rbp+8]
sub rax, QWORD PTR [rbp+0]
sar rax, 2
is always the same. Despite that, these instructions are present in the generated assembly; live demo is here.
I have also tried Intel 19, Clang 9, and MSVC 19 with the very same results. Since all the mainstream compilers behave in such a uniform way, I wonder whether there is some rule that disallows this kind of optimization, i.e., moving the evaluation of size() for a constant vector out of the loop.
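One scenario that might be relevant here, as a minimal sketch only: the global storage and this particular body of g are hypothetical and not part of my real code. A const reference only forbids modification through that reference, so g could still change the same vector through another, non-const path:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical global, introduced only for this illustration.
std::vector<int> storage{1, 2, 3};

// Hypothetical body for g: it never writes through its const parameter,
// but it modifies the same object through the non-const global.
int g(const std::vector<int>& v, std::size_t i) {
    if (i == 0)
        storage.pop_back();  // also shrinks v when v refers to storage
    return v[i];
}

int f(const std::vector<int>& v) {
    int res = 0;
    // size() must be re-read on every iteration here: after the first
    // call to g, the vector behind v may have one element fewer.
    for (std::size_t i = 0; i < v.size(); i++)
        res += g(v, i);
    return res;
}
```

Calling f(storage) with this g visits only two elements, because the size drops from 3 to 2 during the first iteration. If size() were hoisted out of the loop, the third iteration would read past the end of the vector.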