5

I have this program that takes 2.34 seconds to run, and gprof says it only takes 1.18 seconds. I've read answers elsewhere suggesting that gprof can get it wrong if e.g the program is I/O bound, but this program clearly isn't.

This also happens for a useful program I'm trying to profile. It's not specific to this trivial test case.

(Also in this case gprof says that main() takes more than 100% of the program's running time, which is a pretty stupid bug but not really causing problems for me.)

$ cat test.c
int main() {
    int i;
    for (i=0;i<1000000000;i++);
}

$ gcc test.c -o test

$ time ./test

real    0m2.342s
user    0m2.340s
sys 0m0.000s

$ gcc test.c -o test -pg

$ time ./test

real    0m2.342s
user    0m2.340s
sys 0m0.000s

$ gprof test |head
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
101.33      1.18     1.18                             main

 %         the percentage of the total running time of the
time       program used by this function.
Angus
  • 1,342
  • 10
  • 15
  • I was not able to reproduce this. What are your OS, GCC and gprof versions ? Do you have a profiling-enabled libc version installed? – thkala Nov 30 '10 at 02:27
  • Ubuntu 9.10 (on a dual-core x86_64), gcc 4.4.1, gprof 2.20. I don't know if I have a profiling libc version installed. – Angus Nov 30 '10 at 02:33
  • I did some more investigation. Profiling relies on sampling each time SIGPROF is delivered. gprof wants SIGPROF to be delivered every 10ms. It looks like SIGPROF isn't being sent at the right frequency. If I set the timer for x ms, where 10<=x<20, it gets delivered every 20ms. If 20<=x<30, it gets delivered every 30ms, and so on. Hence the profiler gets signals every 20ms when it expects them every 10ms, and thinks the time is half of what it actually is. I might work out the cause later. I tried it on a different system and the issue didn't occur. – Angus Nov 30 '10 at 23:09

3 Answers3

3

I would recommend dropping gprof and switching to oprofile. Any profiling that inserts instrumentation into your program will inherently affect the performance in ways that might skew or invalidate the results. With oprofile you don't have to build your program with profiling support or get special profiling-enabled libraries; it works by statistical methods, basically sampling the instruction pointer (with kernel assistance) and using the sample counts to estimate how much time was spent in each function.

R.. GitHub STOP HELPING ICE
  • 195,354
  • 31
  • 331
  • 669
  • 1
    I'm told `oprofile` can sample the stack (not just the IP), on wall-clock time (not just CPU time), and can summarize inclusive percent (not just exclusive) at the line level (not just functions). IMO, that's what you need. – Mike Dunlavey Nov 30 '10 at 15:47
2

First of all, profiling a program that completes in 2.3 seconds is a little bit ludicrous. You really need a long-running program to get a good measurement of program hot spots, etc. But I digress...

To answer your first question, time (the command-line utility) times the execution of the whole process (including the profiling instrumentation itself). When you enable profiling in your build, the program writes a gmon.out file containing execution times etc. each time you run the program. That is, the hard work of profiling is done each time you run the program. The profiling instrumentation tries hard to separate out its own impact on the time accounting, and in this case, it seems that the profiling itself accounted for 2.34 - 1.18 = 1.16s of the runtime (as reported by time). The gprof program itself essentially just analyzes and reformats the runtime statistics stored in the gmon.out program. To be clear about this, the real profiling happens during your program's run, not during the gprof run.

Finally, the gprof output answers your second question directly. It samples your program's execution at 1/100 sec. intervals and gives credit for that whole 0.01 s to whatever happened to be executing during the sample. If your program doesn't take an exact multiple of 0.01 seconds to execute, you'll get numbers that don't add up to 100%. Again, it must be emphasized that profiling a program that runs this quickly is quite error prone, and this apparent bug would certainly be mitigated by a longer sampling interval (i.e. runtime). Also, gprof's accounting for its own overhead is imperfect, which might further contribute to the seemingly ludicrous 101.33% number.

It's a statistical process, and it's not perfect. You have to interpret the results with a grain of salt.

Good luck!

Drew Hall
  • 26,700
  • 10
  • 58
  • 79
  • Note that the first run of 'time' is for a build without profiling enabled, showing that the overhead due to profiling is negligible. – Angus Nov 30 '10 at 02:57
  • @Angus: Whoops--I missed that originally. I'm sure there's a good reason for it but it's not apparent to me right now... :) – Drew Hall Nov 30 '10 at 11:00
  • 1
    Yeah. Fortunately, for the purpose of finding code to optimize, perfection of timing is irrelevant. – Mike Dunlavey Nov 30 '10 at 16:25
2

First, to answer your question, gprof does not count blocked time, so if anything else is going on in the machine at "the same time" you will see that discrepancy. Also, if your program does any I/O, that will not be counted either.

gprof is really only useful for a very restricted class of programs. Here is a list of the issues.

Community
  • 1
  • 1
Mike Dunlavey
  • 38,662
  • 12
  • 86
  • 126
  • Most of your comment isn't relevant. It's obvious from the source code and the 'time' output I posted that the execution time is not due to blocking or load from other processes. – Angus Nov 30 '10 at 23:14
  • @Angus: There's the time to load your program and start running it. That adds to the outer total. Are there any other processes also running in your machine? Mine has lots of other junk like virus-scanners & whatnot. Whenever any process is running, the others are blocked, by definition. On Windows, the Task Manager shows them. – Mike Dunlavey Dec 01 '10 at 00:07
  • I'm using Linux. The 'time' command shows the actual CPU usage of the process, as well as the total running time. I posted results of the 'time' command showing that the CPU time used is very close to the total running time, and both are about twice what gprof reports. Also, the time to load and start running such a trivial program is extremely small. – Angus Dec 01 '10 at 00:16
  • @Angus: To get back to your larger question, you have a non-trivial program, and I assume you want to see how you can make it as fast as possible. (If not, sorry to bother you.) If so, then a tool such as Zoom (www.rotateright.com) will be far more useful. Or, as I commented to @R.. I'm told `oprofile` will do the right thing, if you know how to drive it. – Mike Dunlavey Dec 01 '10 at 01:06