Questions tagged [perf]

Perf is a profiling tool for Linux 2.6+ based systems that uses the perf_events kernel interface to provide a command-line interface to underlying hardware, software and kernel performance counters. It abstracts away differences in performance counters implemented across different CPU architectures, allowing consistency across different hardware.

See also the perf wiki.

802 questions
18
votes
3 answers

Profiling sleep times with perf

I was looking for a way to find out where my program spends time. I read the perf tutorial and tried to profile sleep times as it is described there. I wrote the simplest possible program to profile: #include int main() { sleep(10); …
Pavel Davydov
  • 2,799
  • 1
  • 25
  • 39
18
votes
2 answers

Analyzing cause of performance regression with different kernel version

I have come across a strange performance regression from Linux kernel 3.11 to 3.12 on x86_64 systems. Running Mark Stock's Radiance benchmark on Fedora 20, 3.12 is noticeably slower. Nothing else is changed - identical binary, identical glibc - I…
Chris
  • 3,953
  • 29
  • 35
17
votes
0 answers

On Skylake (SKL) why are there L2 writebacks in a read-only workload that exceeds the L3 size?

Consider the following simple code: #include #include #include #include #include int cpu_ms() { return (int)(clock() * 1000 / CLOCKS_PER_SEC); } int main(int argc, char** argv) { if (argc <…
BeeOnRope
  • 51,419
  • 13
  • 149
  • 309
17
votes
2 answers

perf enable demangling of callgraph

How do I enable C++ demangling for the perf callgraph? It seems to demangle symbols when I go into annotate mode, but not in the main callgraph. Sample code (using Google Benchmark): #include #include static…
helloworld922
  • 10,197
  • 3
  • 43
  • 80
17
votes
2 answers

How to use linux `perf` tool to generate "Off-CPU" profile

Brendan D. Gregg (author of the DTrace book) has an interesting variant of profiling: "Off-CPU" profiling (and the Off-CPU Flame Graph; slides 2013, p112-137), which shows where a thread or application was blocked (not executing on a CPU, but waiting for…
osgx
  • 80,853
  • 42
  • 303
  • 470
16
votes
2 answers

Thread Utilization profiling on linux

Linux perf-tools are great for finding hotspots in CPU cycles and optimizing those hotspots. But once some parts are parallelized it becomes difficult to spot the sequential parts since they take up significant wall time but not necessarily many CPU…
the8472
  • 35,110
  • 4
  • 54
  • 107
16
votes
3 answers

perf.data file has no samples

I am using perf 3.0.4 on Ubuntu 11.10. Its record command works well and reports 256 samples collected on the terminal. But when I run perf report, it gives me the following error: perf.data file has no samples I searched a lot for the…
Xara
  • 7,568
  • 14
  • 47
  • 79
15
votes
2 answers

Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths

I was playing with the code in this answer, slightly modifying it: BITS 64 GLOBAL _start SECTION .text _start: mov ecx, 1000000 .loop: ;T is a symbol defined with the CLI (-DT=...) TIMES T imul eax, eax lfence TIMES T imul edx, edx dec…
Margaret Bloom
  • 33,863
  • 5
  • 53
  • 91
15
votes
1 answer

What is lockstep sampling?

I have seen this term in several posts about profiling applications but I don't understand what it actually means and how it affects profiling results. I have seen it here for dtrace: The rate is also increased to 199 Hertz, as capturing kernel…
ks1322
  • 29,461
  • 12
  • 91
  • 140
15
votes
2 answers

Use perf inside a docker container without --privileged

I am trying to use the perf tool inside a Docker container to record a given command. kernel.perf_event_paranoid is set to 1, but the container behaves just as if it were 2, when I don't put the --privileged flag. I could use --privileged, but the…
Fred Tingaud
  • 480
  • 3
  • 11
15
votes
1 answer

How does linux's perf utility understand stack traces?

Linux's perf utility is famously used by Brendan Gregg to generate flame graphs for C/C++, JVM code, Node.js code, etc. Does the Linux kernel natively understand stack traces? Where can I read more about how a tool is able to introspect into stack…
Shahbaz
  • 9,743
  • 18
  • 51
  • 71
15
votes
2 answers

Is there a way to set kptr_restrict to 0?

I am currently having trouble running Linux perf, mostly because /proc/sys/kernel/kptr_restrict is currently set to 1. However, if I try to change /proc/sys/kernel/kptr_restrict by echoing 0 to it as follows... echo 0 > /proc/sys/kernel/kptr_restrict I…
jab
  • 4,953
  • 8
  • 45
  • 75
15
votes
2 answers

Can't add perf probe for C++ methods

I'm trying to add a perf probe for a C++ method in my library, but I keep getting the following: $ perf probe --exec=/path/to/file --add='my::Own::Method' Semantic error :There is non-digit char in line number. I've listed the available functions…
Trevor Norris
  • 17,761
  • 3
  • 24
  • 26
14
votes
4 answers

How to profile time spent in memory access in C/C++ applications?

The total time spent by a function in an application can be broadly divided into two components: time spent on actual computation (Tcomp) and time spent on memory accesses (Tmem). Typically profilers provide an estimate of the total time spent by a…
Imran
  • 583
  • 3
  • 21
14
votes
1 answer

Why do perf and PAPI give different values for L3 cache references and misses?

I am working on a project where we have to implement an algorithm that is proven in theory to be cache friendly. In simple terms, if N is the input and B is the number of elements that get transferred between the cache and the RAM every time we have…
jsguy
  • 1,789
  • 1
  • 18
  • 34