28

I'm running "perf" in the following way:

perf record -a --call-graph -p some_pid

perf report --call-graph --stdio

Then, I see this:

 1.60%     my_binary  my_binary                [.] my_func
           |
           --- my_func
              |          
              |--71.10%-- (nil)
              |          (nil)
              |          
               --28.90%-- 0x17f310000000a

I can't see which functions call my_func(). I see "nil" and "0x17f310000000a" instead. Am I doing something wrong? It is probably not a debug info problem because some symbols are shown while others are not shown.

More info:

  • I'm running CentOS 6.2 (kernel 2.6.32-220.4.1).
  • perf rpm - perf-2.6.32-279.5.2.el6.x86_64.
ant
erezz

3 Answers

28

Make sure you compiled the code with the gcc option -fno-omit-frame-pointer.

udalmik
Andriy
  • or try `perf record --call-graph dwarf` (which works without frame-pointer) – maxy Jan 26 '15 at 15:50
  • install debuginfo packages for the code you profile, most of the time you'll need glibc, ie debuginfo-install glibc – Alec Istomin Dec 22 '16 at 21:41
  • So, `perf record --call-graph dwarf` does NOT work (for me), and actively PREVENTS a callgraph from being recorded, even if one compiled with `-fno-omit-frame-pointer` ... it took me many hours & hair-pulling before I realized that I MUST do `perf record --call-graph fp` to get a callgraph! – Linas Aug 18 '20 at 04:43
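The answer and comments above come down to two steps: build with frame pointers kept, then record with an explicit unwind method. A minimal sketch of that, which only prints the command lines so they can be reviewed before running anything. The helper names `build_cmd` and `record_cmd`, the file name `my_binary.c`, the `-O2 -g` flags and the PID 1234 are all assumptions made for illustration. It also assumes a perf recent enough that `--call-graph` takes a mode argument.

```shell
#!/bin/sh
# Sketch only: prints commands instead of running them, so it needs
# neither gcc nor perf. build_cmd and record_cmd are hypothetical helpers.

# Compile line that keeps frame pointers (the flag from the answer).
# $1 = source file, $2 = output binary
build_cmd() {
    printf 'gcc -O2 -g -fno-omit-frame-pointer -o %s %s\n' "$2" "$1"
}

# perf record line for a running process with an explicit unwind method.
# $1 = fp or dwarf, $2 = PID
record_cmd() {
    case "$1" in
        fp|dwarf) printf 'perf record --call-graph %s -p %s\n' "$1" "$2" ;;
        *) echo "unknown unwind method: $1" >&2; return 1 ;;
    esac
}

build_cmd my_binary.c my_binary
record_cmd fp 1234      # frame-pointer unwinding
record_cmd dwarf 1234   # DWARF unwinding, for binaries built without frame pointers
```

If `dwarf` yields no call graph, as one commenter found, the `fp` line is the one to try against a binary built with the compile line above. Older perf builds, such as the one in the question, may not accept a mode argument to `--call-graph` at all.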
16

You're almost there; you're just missing the -G option (you might need a more recent perf than the one installed on your system):

$ perf report --call-graph --stdio -G

From perf help report:

   -G, --inverted
       alias for inverted caller based call graph.
holygeek
  • `-G` just changes the order in which `perf report` prints the call stack; if the full call stack was not recorded to `perf.data` at `perf record` time, `-G` will not help. You need to enable frame pointers or dwarf unwinding (which may not have been ported to RH's 2.6.32) so that frames can be decoded at `record` time: http://www.brendangregg.com/perf.html#StackTraces "Omitting frame pointers is an evil compiler optimization that breaks debugger" – osgx Jul 09 '16 at 08:44
  • Note that `-G` on `perf record` (rather than `report`) selects by cgroup, in case anyone gets confusing errors about cgroups. – Craig Ringer Dec 04 '17 at 05:24
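Since reordering the report cannot repair stacks that were never recorded, it can help to check whether a new recording still contains unresolved callers. A minimal sketch, assuming the `--stdio` layout shown in the question: it counts call-graph entries whose frame is `(nil)` or a bare hex address. `count_unresolved` is a hypothetical helper written for this example, not a perf feature.

```shell
#!/bin/sh
# Count call-graph entries of the form "--NN.NN%-- (nil)" or
# "--NN.NN%-- 0x...", reading `perf report --stdio` output on stdin.
count_unresolved() {
    # grep -c exits non-zero when the count is 0; that is not an error here.
    grep -E -c -e '--[0-9.]+%-- *(\(nil\)|0x[0-9a-fA-F]+) *$' || true
}

# Demo on the output from the question: both callers are unresolved.
count_unresolved <<'EOF'
 1.60%     my_binary  my_binary                [.] my_func
            |
            --- my_func
               |
               |--71.10%-- (nil)
               |          (nil)
               |
                --28.90%-- 0x17f310000000a
EOF
# prints 2
```

In practice the input would come from something like `perf report --call-graph --stdio | count_unresolved`. A count that stays high after rebuilding with frame pointers suggests the stacks are still not being captured at record time.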
-1

Have you tried profiling with Zoom? It can use perf, a custom driver, or oprofile to collect samples. If you're just interested in looking at one process, try the "Thread Time" configuration.

I'd be interested to know if any of Zoom's options are better/different at getting the function information than stand-alone perf.

federal
  • Haven't tried Zoom. I was hoping to use perf which comes for free. Will Zoom solve this problem? Is it easy to use perf from Zoom? Is it explained anywhere? – erezz Sep 05 '12 at 11:04
  • Zoom will use perf by default for recent Linux distros (kernel 2.6.38 or later). If you're on something between 2.6.32 and 2.6.37, you might have to select the perf driver manually from the pref pages. I've created custom profiling configurations to access performance monitor events, but it doesn't sound like you'd need to do anything fancy. A regular Time Profile should give you the callstack and symbol data that you're looking for. – federal Sep 07 '12 at 15:15
  • Your kernel module doesn't unload properly. \[edit\] For those who try Zoom from this post and can't unload rrnotify, unmount /dev/rrnotify first. \[edit\] Also if you run on a non-English system, export LC_ALL=C before running Zoom; Zoom doesn't handle non-English number formats right. – FeepingCreature Jun 26 '16 at 14:03