Though I can intuitively get most of the results, I'm having hard time fully understanding the output of the perf report
command especially for what concerns the call graph, so I wrote a stupid test to solve this issue of mine once for all.
The stupid test
I compiled what follows with:
gcc -Wall -pedantic -lm perf-test.c -o perf-test
No aggressive optimizations to avoid inlining and such.
#include <math.h>
#define N 10000000UL
#define USELESSNESS(n) \
do { \
unsigned long i; \
double x = 42; \
for (i = 0; i < (n); i++) x = sin(x); \
} while (0)
void baz()
{
USELESSNESS(N);
}
void bar()
{
USELESSNESS(2 * N);
baz();
}
void foo()
{
USELESSNESS(3 * N);
bar();
baz();
}
int main()
{
foo();
return 0;
}
Flat profiling
perf record ./perf-test
perf report
With these I get:
94,44% perf-test libm-2.19.so [.] __sin_sse2
2,09% perf-test perf-test [.] sin@plt
1,24% perf-test perf-test [.] foo
0,85% perf-test perf-test [.] baz
0,83% perf-test perf-test [.] bar
Which sounds reasonable since the heavy work is actually performed by __sin_sse2
and sin@plt
is probably just a wrapper, while the overhead of my functions take into account just the loop, overall: 3*N
iterations for foo
, 2*N
for the other two.
Hierarchical profiling
perf record -g ./perf-test
perf report -G
perf report
Now the overhead columns that I get are two: Children
(the output is sorted by this one by default) and Self
(the same overhead of the flat profile).
Here is where I start feeling I miss something: regardless of the fact that I use -G
or not I'm unable to explain the hierarchy in terms of "x calls y" or "y is called by x", for example:
without
-G
("y is called by x"):- 94,34% 94,06% perf-test libm-2.19.so [.] __sin_sse2 - __sin_sse2 + 43,67% foo + 41,45% main + 14,88% bar - 37,73% 0,00% perf-test perf-test [.] main main __libc_start_main - 23,41% 1,35% perf-test perf-test [.] foo foo main __libc_start_main - 6,43% 0,83% perf-test perf-test [.] bar bar foo main __libc_start_main - 0,98% 0,98% perf-test perf-test [.] baz - baz + 54,71% foo + 45,29% bar
- Why
__sin_sse2
is called bymain
(indirectly?),foo
andbar
but not bybaz
? - Why functions sometimes have a percent and a hierarchy attached (e.g., the last instance of
baz
) and sometimes not (e.g., the last instance ofbar
)?
- Why
with
-G
("x calls y"):- 94,34% 94,06% perf-test libm-2.19.so [.] __sin_sse2 + __sin_sse2 + __libc_start_main + main - 37,73% 0,00% perf-test perf-test [.] main - main + 62,05% foo + 35,73% __sin_sse2 2,23% sin@plt - 23,41% 1,35% perf-test perf-test [.] foo - foo + 64,40% __sin_sse2 + 29,18% bar + 3,98% sin@plt 2,44% baz __libc_start_main main foo
- How should I interpret the first three entries under
__sin_sse2
? main
callsfoo
and that's ok, but why if it calls__sin_sse2
andsin@plt
(indirectly?) it does not also callbar
andbaz
?- Why do
__libc_start_main
andmain
appear underfoo
? And whyfoo
appears twice?
- How should I interpret the first three entries under
Suspect is that there are two levels of this hierarchy, in which the second actually represents the "x calls y"/"y is called by x" semantics, but I'm tired to guess so I'm asking here. And the documentation doesn't seem to help.
Sorry for the long post but I hope that all this context may help or act as a reference for someone else too.