
I am learning OpenMP, and the following code with nested for loops gives unexpected results when the number of threads is greater than 1. I expect only the outer loop to be parallelized here, giving 4 lines of output. I am using gcc 4.8.4.

int out, in;

#pragma omp parallel
{
      const int nthreads = omp_get_num_threads();
      const int ithread = omp_get_thread_num();

      #pragma omp for
      for (out = 0; out < 2; out++) {
        for (in = 0; in < 2; in++) {
          printf("id= %d of %d, out= %d, in= %d\n", ithread, nthreads, out, in);
        }
      }
}

If I set OMP_NUM_THREADS=1, I get the expected output with 4 lines:

id= 0 of 1, out= 0, in= 0
id= 0 of 1, out= 0, in= 1
id= 0 of 1, out= 1, in= 0
id= 0 of 1, out= 1, in= 1

But OMP_NUM_THREADS=2 misses one line of output!

id= 0 of 2, out= 0, in= 0
id= 1 of 2, out= 1, in= 0
id= 0 of 2, out= 0, in= 1

Setting the inner loop to for (in = 0; in < 3; in++) gives only 4 lines of output instead of the expected 6!

id= 0 of 2, out= 0, in= 0
id= 1 of 2, out= 1, in= 0
id= 0 of 2, out= 0, in= 1
id= 1 of 2, out= 1, in= 2

Am I doing something terribly wrong here? Please help me in troubleshooting this. Thanks.

2 Answers


It looks as if your inner loop variable is declared before the parallel block begins. That means it is implicitly shared between all threads. Perhaps you are trying to parallelise the 2D loop, but what you have written will not achieve that; instead, each thread's inner loop conflicts with the other threads' inner loops in a non-deterministic way, because they all read and write the same variable.

If you want to parallelise the 2D loop itself, one way is to use a single loop over all elements and calculate the indices manually, e.g. for(int i=0;i<2*2;i++){ int out=i/2; int in=i%2; ... }. Now you can make the loop over i parallel with no problems. As High Performance Mark points out, there is also a dedicated construct for this (the collapse clause): OpenMP and nested loops?
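
A minimal sketch of that collapse approach, assuming a compiler with OpenMP 3.0 support and compiled as C99 (e.g. gcc -fopenmp -std=c99); the loop bounds just mirror the question:

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* collapse(2) fuses the two loops into one 4-iteration space,
       which is then divided among the threads */
    #pragma omp parallel for collapse(2)
    for (int out = 0; out < 2; out++) {
        for (int in = 0; in < 2; in++) {
            printf("id= %d of %d, out= %d, in= %d\n",
                   omp_get_thread_num(), omp_get_num_threads(), out, in);
        }
    }
    return 0;
}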

If you're ok with only parallelising a single dimension (which is usually enough), you either need to declare the inner variables within the parallel loop:

int out;
#pragma omp parallel for
for(out=0;out<2;out++){
    int in;
    for(in=0;in<2;in++){
        // action
    }
}

or explicitly mark the inner variables private:

int in, out;
#pragma omp parallel for private(in)
for(out=0;out<2;out++){
    for(in=0;in<2;in++){
        // action
    }
}

Both of these make the outer loop parallel and the inner loop variable private to each thread, preventing the state of one thread from changing the state of another. The second style is useful if you are writing pre-C99 C; otherwise I'd suggest moving the declaration inside the loop, as it shows the intent more clearly.

This question contains some useful notes on setting private variables: OpenMP: are local variables automatically private?


Finally, you should avoid printing directly from multiple threads to the same output; the thread safety of those functions is ambiguous (though on Linux specifically it seems to be better defined). It is better to write the output into a thread-safe buffer and have a single thread print it.
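
A minimal sketch of that buffered approach; the LINE_LEN, N_OUT and N_IN constants and the lines array are just illustrative. Each iteration writes to its own slot, so no locking is needed, and one thread prints everything after the parallel region:

#include <stdio.h>
#include <omp.h>

#define LINE_LEN 64   /* illustrative sizes */
#define N_OUT 2
#define N_IN  2

int main(void) {
    char lines[N_OUT * N_IN][LINE_LEN];   /* one slot per message */

    #pragma omp parallel for
    for (int out = 0; out < N_OUT; out++) {
        for (int in = 0; in < N_IN; in++) {
            /* each iteration owns exactly one slot, so threads never collide */
            snprintf(lines[out * N_IN + in], LINE_LEN,
                     "id= %d of %d, out= %d, in= %d",
                     omp_get_thread_num(), omp_get_num_threads(), out, in);
        }
    }

    /* a single thread writes the buffered output */
    for (int i = 0; i < N_OUT * N_IN; i++)
        puts(lines[i]);
    return 0;
}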

Dave
  • Thanks! private(in) solves the problem. I was under the impression that all the variables are private by default. I should have known better. Thanks for the suggestion about "thread safe buffer". But I don't know how to use this. Any guideline would be useful. – Suman Chakrabarty Jul 18 '15 at 11:36

I see two problems:

  • I do not see what the first omp parallel statement is for; it should be removed
  • in should be declared private (so that each thread has its own copy)

    /* with the enclosing parallel block removed, query the thread id and count directly */
    #pragma omp parallel for private(in) /* each thread needs its own copy of in */
    for (out = 0; out < 2; out++) {
      for (in = 0; in < 2; in++) {
        printf("id= %d of %d, out= %d, in= %d\n",
               omp_get_thread_num(), omp_get_num_threads(), out, in);
      }
    }

BrunoLevy