
Linux's perf utility is famously used by Brendan Gregg to generate flame graphs for C/C++, JVM code, Node.js code, etc.

Does the Linux kernel natively understand stack traces? Where can I read more about how a tool is able to introspect into stack traces of processes, even if processes are written in completely different languages?

Shahbaz
  • Well, it's not a matter of the language the process's source is written in, but of the binary that is run. Ultimately the source code is converted to a binary (an executable file), which runs on the processor. Linux executables are most likely in [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) format. The stack trace is generated for this binary code. If the binary has symbol tables describing its procedures, you'll get readable names in the trace. (Brendan Gregg described the issue with missing symbols for Java and the corresponding empty parts of the flame graph.) – xealits Jul 09 '16 at 02:20
  • @xealits do you have the link to Brendan's note about java's missing symbols? That will probably fill in the missing pieces for me. – Shahbaz Jul 09 '16 at 04:36
  • xealits, ELF has no stack trace, and sometimes it is not easy for the Linux kernel to find the full call stack: some frame pointers may be omitted by compiler optimizations. Some Java VMs have magic in their stack frame format: http://www.brendangregg.com/perf.html "4.4 Stack Traces. Always compile with frame pointers. Omitting frame pointers is an evil compiler optimization that breaks debuggers, .. Since about the 3.9 kernel, perf_events has supported a workaround for missing frame pointers in user-level stacks: libunwind, which uses dwarf. ... Java may not show full stacks .. due to hotspot on x86" – osgx Jul 09 '16 at 08:40
  • "completely different languages?" - which languages? What is your CPU architecture? – osgx Jul 09 '16 at 09:22
  • Some links: ["Java in flames"](http://techblog.netflix.com/2015/07/java-in-flames.html), ["JIT symbols"](http://www.brendangregg.com/perf.html#JIT_Symbols). So for Linux, Java is that virtual processor (the Java Virtual Machine, JVM); Java programs run inside the JVM, so they are not Linux processes and are not seen as such natively. He somehow adds visibility into the insides of the JVM for Linux, so that the user's Java programs and their stacks become visible. @osgx I'm not very savvy in ELF and all these matters, thus I digress. – xealits Jul 09 '16 at 15:12

1 Answer


There is a short introduction to stack traces in perf by Gregg: http://www.brendangregg.com/perf.html

4.4 Stack Traces

Always compile with frame pointers. Omitting frame pointers is an evil compiler optimization that breaks debuggers, and sadly, is often the default. Without them, you may see incomplete stacks from perf_events ... There are two ways to fix this: either using dwarf data to unwind the stack, or returning the frame pointers.

Dwarf

Since about the 3.9 kernel, perf_events has supported a workaround for missing frame pointers in user-level stacks: libunwind, which uses dwarf. This can be enabled using "-g dwarf". ... compiler optimizations (-O2), which in this case has omitted the frame pointer. ... recompiling .. with -fno-omit-frame-pointer:

Languages that are not C-style may use a different stack frame format, or may omit frame pointers as well:

4.3. JIT Symbols (Java, Node.js)

Programs that have virtual machines (VMs), like Java's JVM and node's v8, execute their own virtual processor, which has its own way of executing functions and managing stacks. If you profile these using perf_events, you'll see symbols for the VM engine .. perf_events has JIT support to solve this, which requires the VM to maintain a /tmp/perf-PID.map file for symbol translation.

Note that Java may not show full stacks to begin with, due to hotspot on x86 omitting the frame pointer (just like gcc). On newer versions (JDK 8u60+), you can use the -XX:+PreserveFramePointer option to fix this behavior, ...
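The /tmp/perf-PID.map file mentioned in the quote is a plain-text file with one JIT-compiled symbol per line: the start address and the size in hexadecimal (no 0x prefix), followed by the symbol name. Below is a minimal sketch of how a runtime might emit such lines. The helper names format_perf_map_line and append_perf_map_entry are hypothetical and not taken from any real VM.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format one perf map entry: "<start-hex> <size-hex> <symbol name>\n".
 * Hypothetical helper for illustration.
 * Returns the number of characters written (as snprintf does).
 */
int format_perf_map_line(char *buf, size_t buflen,
                         uint64_t start, uint64_t size, const char *name)
{
    return snprintf(buf, buflen, "%llx %llx %s\n",
                    (unsigned long long)start,
                    (unsigned long long)size, name);
}

/*
 * Append one entry to /tmp/perf-<pid>.map for the given process.
 * Returns 0 on success, -1 if the line does not fit or the file
 * cannot be opened or written.
 */
int append_perf_map_entry(long pid, uint64_t start, uint64_t size,
                          const char *name)
{
    char path[64];
    char line[512];
    FILE *f;
    int n;

    snprintf(path, sizeof(path), "/tmp/perf-%ld.map", pid);
    n = format_perf_map_line(line, sizeof(line), start, size, name);
    if (n < 0 || (size_t)n >= sizeof(line))
        return -1;              /* name too long for this sketch */

    f = fopen(path, "a");
    if (!f)
        return -1;
    if (fputs(line, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A JIT would call something like append_perf_map_entry(getpid(), code_address, code_size, "method name") each time it compiles a method, so that perf can later map sampled addresses back to names.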

Gregg's blog post about Java and stack traces: http://techblog.netflix.com/2015/07/java-in-flames.html (see "Fixing Frame Pointers"; this was fixed in some JDK 8 versions and in JDK 9 by adding an option at program start)

Now, your questions:

How does linux's perf utility understand stack traces?

The perf utility basically (in early versions) just parses data returned by the Linux kernel subsystem "perf_events" (sometimes just "events"), which is accessed through the perf_event_open syscall. For call stack traces there are the sample_type options PERF_SAMPLE_CALLCHAIN and PERF_SAMPLE_STACK_USER:

    sample_type
           PERF_SAMPLE_CALLCHAIN
                  Records the callchain (stack backtrace).

           PERF_SAMPLE_STACK_USER (since Linux 3.7)
                  Records the user level stack, allowing stack unwinding.
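To make the two options concrete, here is a minimal sketch of how a profiler might fill in struct perf_event_attr before calling perf_event_open. The function names, the choice of a CPU-cycles event, and the parameters are assumptions for illustration. The first variant asks the kernel to walk the stack itself. The second asks for registers plus a raw copy of the user stack, which a tool can then unwind in user space (this is, as far as I know, what perf's dwarf mode relies on).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel-side unwinding: each sample carries a ready-made callchain. */
void init_callchain_attr(struct perf_event_attr *attr,
                         unsigned long long period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = period;
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                        PERF_SAMPLE_CALLCHAIN;
    attr->disabled = 1;         /* start stopped; enable later via ioctl */
    attr->exclude_kernel = 1;   /* user-space frames only */
}

/*
 * User-side unwinding: each sample carries user registers and a copy
 * of stack_bytes bytes of the user stack. regs_mask is architecture
 * specific and is left to the caller in this sketch; stack_bytes
 * should be a multiple of 8.
 */
void init_stack_user_attr(struct perf_event_attr *attr,
                          unsigned long long period,
                          unsigned long long regs_mask,
                          unsigned int stack_bytes)
{
    init_callchain_attr(attr, period);
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                        PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER;
    attr->sample_regs_user = regs_mask;
    attr->sample_stack_user = stack_bytes;
}

/* glibc provides no wrapper for perf_event_open, so use syscall(2). */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                         int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

Whether sys_perf_event_open succeeds depends on the machine (for example on /proc/sys/kernel/perf_event_paranoid and on hardware counter availability), so real code has to check the returned file descriptor.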

Does the Linux kernel natively understand stack traces?

It may or may not, depending on your CPU architecture and on whether support has been implemented for it. The callchain sampling function (which reads the call stack of a live process) is defined in the architecture-independent part of the kernel as __weak with an empty body:

http://lxr.free-electrons.com/source/kernel/events/callchain.c?v=4.4#L26

    __weak void perf_callchain_kernel(struct perf_callchain_entry *entry,
                                      struct pt_regs *regs)
    {
    }

    __weak void perf_callchain_user(struct perf_callchain_entry *entry,
                                    struct pt_regs *regs)
    {
    }
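The __weak marker expands to the compiler's weak attribute. The following user-space sketch shows the same pattern with simplified, hypothetical names (struct callchain_entry, callchain_user, sample_user_callchain are stand-ins, not kernel APIs): a generic empty fallback that the linker replaces if any other object file supplies a regular definition of the same function.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct perf_callchain_entry. */
struct callchain_entry {
    size_t nr;                  /* number of recorded addresses */
    unsigned long ip[16];       /* recorded return addresses */
};

/*
 * Generic fallback: a weak definition with an empty body. If another
 * object file linked into the program provides a normal (strong)
 * definition of callchain_user, the linker uses that one instead.
 */
__attribute__((weak))
void callchain_user(struct callchain_entry *entry, const void *regs)
{
    (void)entry;
    (void)regs;
}

/* The caller does not need to know which definition was linked in. */
size_t sample_user_callchain(const void *regs,
                             struct callchain_entry *entry)
{
    entry->nr = 0;
    callchain_user(entry, regs);
    return entry->nr;           /* stays 0 when only the weak stub exists */
}
```

On an architecture without its own implementation, the sample therefore simply contains no user callchain rather than failing.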

In the 4.4 kernel, the user-space callchain sampler is overridden in the architecture-dependent part of the kernel for x86/x86_64, ARC, SPARC, ARM/ARM64, Xtensa, Tilera TILE, PowerPC, and Imagination Meta:

http://lxr.free-electrons.com/ident?v=4.4;i=perf_callchain_user

arch/x86/kernel/cpu/perf_event.c, line 2279
arch/arc/kernel/perf_event.c, line 72
arch/sparc/kernel/perf_event.c, line 1829
arch/arm/kernel/perf_callchain.c, line 62
arch/xtensa/kernel/perf_event.c, line 339
arch/tile/kernel/perf_event.c, line 995
arch/arm64/kernel/perf_callchain.c, line 109
arch/powerpc/perf/callchain.c, line 490
arch/metag/kernel/perf_callchain.c, line 59

Reading the call chain from the user stack may be non-trivial for some architectures and/or for some modes.
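For the simple case (x86-style code compiled with frame pointers), the walk amounts to following a linked list: each frame stores the caller's frame pointer followed by the return address. The sketch below simulates that walk over ordinary memory. It is an illustration under stated assumptions, not the kernel's code: the names frame_record and walk_frame_pointers are hypothetical, and the real kernel has to read user memory with fault-safe copies rather than plain pointer dereferences.

```c
#include <stddef.h>
#include <stdint.h>

/* One frame record as laid out by code built with frame pointers. */
struct frame_record {
    const struct frame_record *next;    /* caller's frame record */
    uintptr_t return_address;           /* where this call returns to */
};

/*
 * Follow the frame-pointer chain starting at fp and store up to max
 * return addresses in out. The walk stops on a NULL pointer, on a
 * record that is misaligned or outside [stack_lo, stack_hi), or when
 * the chain does not move toward higher addresses (the stack grows
 * down, so callers live at higher addresses).
 * Returns the number of addresses stored.
 */
size_t walk_frame_pointers(const struct frame_record *fp,
                           uintptr_t stack_lo, uintptr_t stack_hi,
                           uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        uintptr_t addr = (uintptr_t)fp;

        if (addr < stack_lo || addr >= stack_hi ||
            stack_hi - addr < sizeof(*fp))
            break;                      /* record not inside the stack */
        if (addr % sizeof(void *) != 0)
            break;                      /* misaligned: not a real frame */

        out[n++] = fp->return_address;

        if ((uintptr_t)fp->next <= addr)
            break;                      /* end of chain or corrupt link */
        fp = fp->next;
    }
    return n;
}
```

The sanity checks matter because a function compiled without a frame pointer leaves an arbitrary value in that register, which is exactly why stacks come out truncated or wrong when frame pointers are omitted.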

Which CPU architecture do you use? Which languages and VMs are involved?

Where can I read more about how a tool is able to introspect into stack traces of processes, even if processes are written in completely different languages?

You may try gdb and/or the debuggers for your language, the backtrace function of libc, or the read-only unwinding support in libunwind (there is a local backtrace example in libunwind, show_backtrace()).
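As a quick way to check whether the stack of a C program can be unwound at all, here is a minimal sketch using glibc's backtrace() and backtrace_symbols() from <execinfo.h>. The wrapper names capture_backtrace and print_backtrace and the 64-frame limit are arbitrary choices for this example.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64

/* Capture the calling thread's return addresses; returns the count. */
int capture_backtrace(void **frames, int max_frames)
{
    return backtrace(frames, max_frames);
}

/* Print the current call stack to out, symbolised where possible. */
void print_backtrace(FILE *out)
{
    void *frames[MAX_FRAMES];
    int n = capture_backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, n);

    if (symbols == NULL) {
        fprintf(out, "backtrace_symbols failed\n");
        return;
    }
    for (int i = 0; i < n; i++)
        fprintf(out, "#%d %s\n", i, symbols[i]);
    free(symbols);              /* a single allocation holds all strings */
}
```

Function names typically appear only for symbols in the dynamic symbol table, so linking with -rdynamic gives more readable output; otherwise you mostly see raw addresses.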

They may have better support for frame parsing, or better integration with the language's virtual machine or with unwind info. If gdb (with the backtrace command) or other debuggers can't get stack traces from a running program, there may be no way of getting a stack trace at all.

If they can get a call trace but perf can't (even after recompiling with -fno-omit-frame-pointer for C/C++), it may be possible to add support for that combination of architecture and frame format to perf_events and perf.

There are several blogs with some info about generic backtracing problems and solutions:

Dwarf support for perf_events/perf:

osgx
  • And information about how perf programs the profiling interrupt: http://stackoverflow.com/questions/28661430/how-does-a-system-wide-profiler-e-g-perf-correlate-counters-with-instructions/ – osgx Jul 19 '16 at 16:31
  • Dwarf-based stack trace is limited by default and `--call-graph dwarf,81920` option of `perf record` with some larger value may help to get more detailed call stacks. – osgx Mar 05 '20 at 19:29