1

On Linux, with an 8-core AMD processor, using g++ 4.7.1.

This is - for me - a headbanger. The following code was working perfectly, and for some reason it stopped parallelizing. I added a call to omp_get_num_procs(), and it prints 8 processors. I checked the compilation, and -fopenmp is present as an option for both compiling and linking. There are no compilation or link error messages. I checked whether any OpenMP environment variables (OMP_xxx) were defined - there were none.

Are there other - external - factors that could influence this?

#pragma omp parallel
{
  lightray ray;
  rgba L;
  printf("Max nr processors: %d\n", omp_get_num_procs());

  #pragma omp for schedule(dynamic)
  for (int xy = 0; xy < xy_range; xy++) {
    int x = x_from + (xy % x_width);
    int y = y_from + (xy / x_width);
    ray = cam->get_ray_at(x, y);
    L = trace_ray(ray, 0, cam->inter);
    cam->set_pixel(x, y, L);
  }
}
dtime = omp_get_wtime() - dtime;
printf("time %f\n", dtime);

EDIT: I think I've found something here... The command line for g++ generated by Anjuta contains this:

-DPACKAGE_LOCALE_DIR=\""/usr/local/share/locale"\" -DPACKAGE_SRC_DIR=\"".. -fopenmp  . "\" 

The PACKAGE_SRC_DIR definition seems to 'include' the -fopenmp flag, which would hide it from g++. Haven't found the cause yet...

jcoppens
  • Where is `xy_range` defined? – Richard Nov 19 '13 at 15:19
  • `rgba L` and `lightray ray` create an interdependency between iterations of the loop that could be throwing the compiler off. Do they need to be defined outside the loop? – Mgetz Nov 19 '13 at 15:21
  • int xy_range; (the value is constant - no conflict). About the rgba and lightray variables, I've tried both inside and outside - no difference, and this was working before, just like it is now. Thanks though! – jcoppens Nov 19 '13 at 15:33
  • `and this was working before, just like it is now.` wait, so is it working or not? – stellarossa Nov 19 '13 at 15:39
  • No, it doesn't parallelize anymore. Before (a couple of months ago) it worked perfectly, occupying all 8 cores. – jcoppens Nov 19 '13 at 15:41

2 Answers

0

Try rewriting it this way:

lightray ray;
rgba L;
printf("Max nr processors: %d\n", omp_get_num_procs());

#pragma omp parallel for schedule(dynamic) private(ray,L)
for (int xy = 0; xy < xy_range; xy++) {
  int x = x_from + (xy % x_width);
  int y = y_from + (xy / x_width);
  ray = cam->get_ray_at(x, y);
  L = trace_ray(ray, 0, cam->inter);
  cam->set_pixel(x, y, L);
}
dtime = omp_get_wtime() - dtime;
printf("time %f\n", dtime);

That way you introduce ray and L as being variables specific to each of the threads tag-teaming the loop. Since variables defined outside of a parallel region are shared between threads by default, your current implementation is munging these two variables.

Also, omp_get_num_procs() "Returns the number of processors available to the program." according to the OpenMP API 3.1 C/C++ Syntax Quick Reference Card - it therefore does not necessarily tell you how many threads are actually being used in a region. For that you may want omp_get_num_threads() (the size of the current team, when called inside the region) or omp_get_thread_num() (the ID of the calling thread).
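To make the difference concrete, here is a minimal sketch that reports the processor count next to the team size seen outside and inside a parallel region. The `ThreadInfo` struct, the `query_thread_info()` function and the serial fallback definitions are hypothetical names added for illustration; they are not part of the question's code.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption for illustration: serial stand-ins so the sketch still
// compiles when OpenMP is not enabled.
static int omp_get_num_procs() { return 1; }
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper type: collects the three numbers of interest.
struct ThreadInfo {
  int procs;            // processors available to the program
  int threads_outside;  // team size outside any parallel region
  int threads_inside;   // team size inside a parallel region
};

ThreadInfo query_thread_info() {
  ThreadInfo info;
  info.procs = omp_get_num_procs();
  // Outside a parallel region the team consists of one thread.
  info.threads_outside = omp_get_num_threads();
  info.threads_inside = 1;
  #pragma omp parallel
  {
    // Only one thread of the team stores the value, avoiding a race.
    #pragma omp single
    info.threads_inside = omp_get_num_threads();
  }
  return info;
}
```

If `procs` reports 8 while `threads_inside` stays at 1, the parallel region is not creating a team, which points at the build or the runtime environment rather than at the loop itself.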

Richard
  • Running it now, but it doesn't seem to make any difference. It still occupies only 1 core. I really feel there is something external (to my program) which doesn't allow it to occupy all cores. Thanks though! – jcoppens Nov 19 '13 at 15:49
  • I added num_threads(8) to the #pragma omp parallel, and still get omp_get_num_threads: 1. So something _is_ limiting... – jcoppens Nov 19 '13 at 15:52
  • And you are compiling with `-fopenmp`? – Richard Nov 19 '13 at 17:00
  • I thought so. I detected a problem with the -fopenmp though (see edit in original message). – jcoppens Nov 19 '13 at 17:19
  • @Richard, what? Variables defined inside a parallel region are private by default, not shared. The OP's code and yours should do the same thing. – Z boson Dec 05 '13 at 07:56
  • @Zboson, I've clarified my statement. The variables in question were defined outside of the parallel region. – Richard Dec 05 '13 at 14:05
  • @Richard, the OP's variables `ray` and `L` are defined inside the parallel region and are by definition private. – Z boson Dec 05 '13 at 14:46
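The point discussed in these comments can be sketched with two small functions. One declares its scratch variable inside the parallel region, as the question does. The other declares it outside and lists it in a private() clause, as the answer does. The function names and the squaring workload are hypothetical stand-ins for the ray tracing calls. Both forms should produce the same result.

```cpp
#include <vector>

// Form used in the question: the scratch variable is declared inside the
// parallel region, so each thread automatically gets its own copy.
std::vector<int> squares_declared_inside(int n) {
  std::vector<int> out(n);
  #pragma omp parallel
  {
    int scratch;  // private, because it is declared inside the region
    #pragma omp for schedule(dynamic)
    for (int i = 0; i < n; i++) {
      scratch = i * i;
      out[i] = scratch;  // each iteration writes a distinct element
    }
  }
  return out;
}

// Form used in the answer: the variable is declared outside the region
// and made private explicitly with a clause.
std::vector<int> squares_private_clause(int n) {
  std::vector<int> out(n);
  int scratch = 0;
  #pragma omp parallel for schedule(dynamic) private(scratch)
  for (int i = 0; i < n; i++) {
    scratch = i * i;
    out[i] = scratch;
  }
  return out;
}
```

Without the private() clause, the second form would share `scratch` between threads and race. The first form has no such problem, which is why switching between them would not be expected to change whether the loop runs in parallel.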
0

It seems to have been a problem external to the program. I did change IDE versions (Anjuta). Anjuta is very dependent on pkg-config. OpenMP doesn't have pkg-config .pc files, so I made one for the libgomp library. I added -lgomp to Libs:, which went fine, and added -fopenmp to both Libs: and Cflags:, which didn't go well.

For some reason, -fopenmp was added into a command line parameter called -DPACKAGE_SRC_DIR (inside its quoted value - see edit in original message) and as such was ignored by the linker and compiler. I'll ask about this on the Anjuta forum.
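A cheap way to catch this kind of silently dropped flag is to test the `_OPENMP` macro, which a conforming compiler defines only when OpenMP support is enabled. The sketch below assumes nothing about the project; `openmp_version_macro()`, `compiled_with_openmp()` and `report_openmp_status()` are hypothetical helper names.

```cpp
#include <cstdio>

// Returns the value of _OPENMP (a date in yyyymm form) when OpenMP is
// enabled for this translation unit, or 0 when it is not.
long openmp_version_macro() {
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}

// True only if the OpenMP flag really reached the compiler.
bool compiled_with_openmp() {
  return openmp_version_macro() != 0;
}

// Prints a one-line diagnostic; call it once at program start.
void report_openmp_status() {
  if (compiled_with_openmp()) {
    std::printf("OpenMP enabled, _OPENMP = %ld\n", openmp_version_macro());
  } else {
    std::printf("OpenMP NOT enabled: #pragma omp lines are being ignored\n");
  }
}
```

When the flag is missing, g++ ignores the `#pragma omp` lines, so the loop runs serially without any error. For a stricter guard, an `#ifndef _OPENMP` block containing `#error` would make the build fail instead of quietly falling back to one thread.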

So, the solution was to remove it from the .pc file, and add it manually to the project parameters as 'CXXFLAGS=-fopenmp' 'LDFLAGS=-fopenmp' (I wanted to avoid this as surely next time I'll forget to do it :)

Anyway, it works like this. Thanks for the suggestions.

jcoppens