I would like to have a general understanding of when I can expect a compiler to vectorize a loop, and when it is worth it for me to unroll the loop manually to help it decide to use vectorization.
I understand that the details matter a lot (which compiler, which compilation options, which architecture, how the code in the loop is written, etc.), but I wonder whether there are some general guidelines for modern compilers.
To be more specific, here is an example with a simple loop (the code is not supposed to compute anything useful):
double *A, *B;          // two arrays (Asize and Bsize are their lengths)
int delay = something;
[...]
double numer = 0, denomB = 0, denomA = 0;
for (int idxA = 0; idxA < Asize; idxA++)
{
    int idxB = idxA + (Bsize - Asize)/2 + delay;
    numer  += A[idxA] * B[idxB];
    denomA += A[idxA] * A[idxA];
    denomB += B[idxB] * B[idxB];
}
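For context, in my real code A and B never overlap. From what I have read, two things can block vectorization of a reduction like this: the compiler cannot prove the pointers do not alias, and it is not allowed to reorder floating-point additions unless something like -ffast-math (or an explicit reduction clause) permits it. Below is a minimal sketch of what I have been testing, assuming GCC or Clang, where __restrict__ is accepted as an extension (standard C99 spells it restrict) and where -fopt-info-vec (GCC) or -Rpass=loop-vectorize (Clang) prints the vectorizer's decisions:

double correlate(const double *__restrict__ A, const double *__restrict__ B,
                 int Asize, int Bsize, int delay,
                 double *denomA_out, double *denomB_out)
{
    // With restrict-qualified pointers the compiler may assume A and B
    // do not alias, which removes one obstacle to vectorization.
    double numer = 0, denomA = 0, denomB = 0;
    for (int idxA = 0; idxA < Asize; idxA++)
    {
        int idxB = idxA + (Bsize - Asize)/2 + delay;
        numer  += A[idxA] * B[idxB];
        denomA += A[idxA] * A[idxA];
        denomB += B[idxB] * B[idxB];
    }
    *denomA_out = denomA;
    *denomB_out = denomB;
    return numer;
}

Compiled with, say, g++ -O3 -ffast-math -fopt-info-vec, I would expect the report to say whether this loop was vectorized, but I do not know how far these observations generalize.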
Can I expect a compiler to vectorize the loop as written, or would it be useful to unroll it by hand like the following?
for (int idxA = 0; idxA < Asize; idxA += 4)   // assumes Asize is a multiple of 4
{
    int idxB = idxA + (Bsize - Asize)/2 + delay;
    numer  += A[idxA]   * B[idxB];
    denomA += A[idxA]   * A[idxA];
    denomB += B[idxB]   * B[idxB];
    numer  += A[idxA+1] * B[idxB+1];
    denomA += A[idxA+1] * A[idxA+1];
    denomB += B[idxB+1] * B[idxB+1];
    numer  += A[idxA+2] * B[idxB+2];
    denomA += A[idxA+2] * A[idxA+2];
    denomB += B[idxB+2] * B[idxB+2];
    numer  += A[idxA+3] * B[idxB+3];
    denomA += A[idxA+3] * A[idxA+3];
    denomB += B[idxB+3] * B[idxB+3];
}
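One thing I am unsure about with the unrolled version: all four sub-iterations still add into the same three variables, so each sum remains a single serial dependency chain. Would it instead be the variant below, with independent partial accumulators, that helps the compiler (or the CPU) overlap the multiply-adds? This is only a sketch: it assumes Asize is even (a real version would need a cleanup loop for the remainder), and splitting the accumulators reassociates the sums, which slightly changes the floating-point result, the same caveat that applies to -ffast-math.

double numer0 = 0, numer1 = 0;
double denomA0 = 0, denomA1 = 0;
double denomB0 = 0, denomB1 = 0;
for (int idxA = 0; idxA < Asize; idxA += 2)   // assumes Asize is even
{
    int idxB = idxA + (Bsize - Asize)/2 + delay;
    // even elements go into the *0 accumulators...
    numer0  += A[idxA]   * B[idxB];
    denomA0 += A[idxA]   * A[idxA];
    denomB0 += B[idxB]   * B[idxB];
    // ...odd elements into the *1 accumulators, so the two
    // halves of each reduction are independent.
    numer1  += A[idxA+1] * B[idxB+1];
    denomA1 += A[idxA+1] * A[idxA+1];
    denomB1 += B[idxB+1] * B[idxB+1];
}
double numer  = numer0  + numer1;   // combine the partial sums
double denomA = denomA0 + denomA1;
double denomB = denomB0 + denomB1;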