
I'm trying to profile a 64-bit OpenGL application using the MSVS 2013 profiler (CPU sampling). According to Sysinternals Process Explorer, my application uses only 60% of the GPU resources but 100% of one CPU core (it's single-threaded for the time being), so the CPU code seems to be the bottleneck. I'm now trying to figure out where the hotspots are, in order to optimize/parallelize my code.

However, the profiling results say that 98% of the time is spent in nvogl64v.dll; most notably, 75% of it is within gdi32.dll and 6% in KernelBase.dll.

I have no clue what to do with this information or which optimizations in my code could help. What conclusion can I draw from it? I'm using freeglut for windowing, and the profiler reports that a negligible 2% is spent in freeglut.dll, and thus in my idle and display functions, so I'm not sure whether any changes in my update and draw loops would have any effect.

Any hints?

EDIT: I have now figured out how to load the corresponding debugging symbols from the MS Symbol Servers, so I can go one step deeper into the call stack. It turns out that the gdi32.dll portion is spent mainly in NtGdiDdDDIEscape (55%) and NtGdiDdDDIGetDeviceState (17%), while the KernelBase.dll portion is due to SwitchToThread.

genpfault
iko79
  • As many people do, you are treating it as a process of measuring various things and puzzling out what's going on. There's a [*more direct way to do it.*](http://stackoverflow.com/a/378024/23771) – Mike Dunlavey Jul 10 '15 at 12:14
  • @iko79: Do the figures remain the same if you add a `::Sleep(1)` just after `…SwapBuffers`? Unfortunately, time spent waiting for V-Sync is reported quite unfairly: it's reported as consumed CPU time, while actually the thread can be preempted and other threads can use the freed-up cycles. – datenwolf Jul 10 '15 at 16:41
  • Thanks for the comments. @MikeDunlavey: As far as I understand, the MSVS sampling profiler is doing exactly that. – iko79 Jul 13 '15 at 08:44
  • @datenwolf: actually, no, this doesn't change much. However, it still seems like waiting for v-sync is the issue, since turning it off resolves the problem. Also, some manual timing measurements showed that by far most of the time is indeed spent in SwapBuffers. Thanks! – iko79 Jul 13 '15 at 08:50
  • @iko79: Sorry, but it isn't. Profiling has a front end, and a back end. The front end needs to sample the stack, on a wall-clock time interrupt (not CPU-time). The back end needs to allow you, the programmer, to actually see and concentrate on a small number of those samples, not summarize a large number. Speed problems in software are not like in baseball. You're looking for large wastage requiring insight, not small percent differences requiring lots of samples. Example, 30% of time doing something unnecessary: 7 samples expose it, on average. – Mike Dunlavey Jul 13 '15 at 10:39
  • @MikeDunlavey: Okay, thanks. I'm really not an expert in profiling -- I thought CPU sampling was exactly what you described in your other post. Anyway, recent findings hint at a fill-rate problem, which still confuses me, since the GPU is only at 60%, as I said. Maybe it's best to move over to the OpenGL forum for that, since it seems this will become more of a conversation than a specific question... – iko79 Jul 13 '15 at 11:37
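
For anyone who wants to reproduce the manual timing mentioned in the comments, here is a minimal sketch of how one might split a frame into update, draw and swap phases and accumulate wall-clock time for each. It is not the asker's actual code: `PhaseTotals`, `time_ms`, `profile_frames` and `simulated_vsync_wait` are hypothetical names made up for illustration, and in a real freeglut application the `swap` callable would wrap the buffer swap (e.g. `glutSwapBuffers()`) instead of the sleeping stand-in used here. Keep in mind that OpenGL calls are generally queued, so the draw phase mostly measures command submission, while a blocking swap can absorb both GPU work and the V-Sync wait.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>
#include <utility>

// Accumulated wall-clock time per frame phase, in milliseconds.
struct PhaseTotals {
    double update_ms = 0.0;
    double draw_ms = 0.0;
    double swap_ms = 0.0;
};

// Calls fn once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& fn) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::forward<F>(fn)();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count();
}

// Hypothetical frame loop: update, draw and swap are placeholders for the
// application's own code; swap would wrap the real buffer swap.
template <typename Update, typename Draw, typename Swap>
PhaseTotals profile_frames(int frames, Update update, Draw draw, Swap swap) {
    assert(frames > 0);
    PhaseTotals totals;
    for (int i = 0; i < frames; ++i) {
        totals.update_ms += time_ms(update);
        totals.draw_ms += time_ms(draw);
        totals.swap_ms += time_ms(swap);
    }
    return totals;
}

// Stand-in for a swap that blocks on V-Sync (assumption: about 5 ms per frame).
inline void simulated_vsync_wait() {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
}

// Prints each phase's total time and its share of the measured frame time.
inline void print_totals(const PhaseTotals& t) {
    const double sum = t.update_ms + t.draw_ms + t.swap_ms;
    if (sum <= 0.0) return;
    std::printf("update: %8.2f ms (%5.1f%%)\n", t.update_ms, 100.0 * t.update_ms / sum);
    std::printf("draw:   %8.2f ms (%5.1f%%)\n", t.draw_ms, 100.0 * t.draw_ms / sum);
    std::printf("swap:   %8.2f ms (%5.1f%%)\n", t.swap_ms, 100.0 * t.swap_ms / sum);
}
```

If the swap phase dominates with V-Sync on and shrinks once V-Sync is turned off, that would be consistent with what iko79 reports above: the thread is mostly waiting in the driver rather than doing work in the application's own update and draw code.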

0 Answers