I am learning OpenMP, and the following code with nested for loops gives unexpected results when the number of threads is greater than 1. I expect only the outer loop to be parallelized, so I expect 4 lines of output. I am using gcc 4.8.4.
    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int ithread = omp_get_thread_num();
        #pragma omp for
        for (out = 0; out < 2; out++) {
            for (in = 0; in < 2; in++) {
                printf("id= %d of %d, out= %d, in= %d\n", ithread, nthreads, out, in);
            }
        }
    }
If I set OMP_NUM_THREADS=1, I get the expected output with 4 lines:
id= 0 of 1, out= 0, in= 0
id= 0 of 1, out= 0, in= 1
id= 0 of 1, out= 1, in= 0
id= 0 of 1, out= 1, in= 1
But with OMP_NUM_THREADS=2, one line of output is missing:
id= 0 of 2, out= 0, in= 0
id= 1 of 2, out= 1, in= 0
id= 0 of 2, out= 0, in= 1
Changing the inner loop to for (in = 0; in < 3; in++) gives only 4 lines of output instead of the expected 6:
id= 0 of 2, out= 0, in= 0
id= 1 of 2, out= 1, in= 0
id= 0 of 2, out= 0, in= 1
id= 1 of 2, out= 1, in= 2
Am I doing something terribly wrong here? Please help me troubleshoot this. Thanks.
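
For comparison, here is a minimal sketch of a variant in which every thread gets its own loop counters. It rests on an assumption the snippet above does not confirm: that `out` and `in` are declared before the `#pragma omp parallel` line. If so, `in` would be shared between threads (the `omp for` construct makes only the `out` loop variable private), which could explain the missing lines. The function name `run_nested`, its parameters, and the serial fallbacks are hypothetical and added for illustration only.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: runs the nested loop and returns the number of
 * inner iterations executed, i.e. the number of lines printed. */
int run_nested(int n_out, int n_in)
{
    int count = 0;

    /* Each thread counts privately; the counts are summed at the end. */
    #pragma omp parallel reduction(+:count)
    {
        const int nthreads = omp_get_num_threads();
        const int ithread = omp_get_thread_num();
        /* Declared inside the parallel region, so both are private
         * to each thread (assumed fix, not from the original post). */
        int out, in;

        #pragma omp for
        for (out = 0; out < n_out; out++) {
            for (in = 0; in < n_in; in++) {
                printf("id= %d of %d, out= %d, in= %d\n",
                       ithread, nthreads, out, in);
                count++;
            }
        }
    }
    return count;
}
```

Calling `run_nested(2, 2)` should print 4 lines and `run_nested(2, 3)` should print 6, regardless of OMP_NUM_THREADS; the order of the lines may still vary between runs. Keeping the declarations outside the region and adding `private(in)` to the `omp for` directive would be an alternative with the same effect.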