Questions tagged [perf]

Perf is a profiling tool for Linux 2.6+ based systems.

Perf is a profiling tool for Linux 2.6+ based systems that uses the perf_events kernel interface to provide a command-line interface to hardware, software, and kernel performance counters. It abstracts away differences between the performance counters implemented on different CPU architectures, so the same commands work consistently across hardware.

See also the perf wiki.

802 questions
9
votes
2 answers

perf get time elapsed with field separator option

I have a program which parses the output of the Linux command perf. It requires the use of the -x option (the field separator option). I want to extract elapsed time (not task-time or cpu-clock) using perf. However, when I use the -x option, the elapsed…
knightrider
  • 1,971
  • 1
  • 14
  • 27
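For readers parsing `perf stat -x` output as described in the question above, a minimal Python sketch follows. It assumes the field order documented in the CSV FORMAT section of the perf-stat man page for a plain run (no `-I`, no per-CPU aggregation, no `-r`): counter value, unit, event name, then run time and further fields. The `;` separator, the `Counter` type, the `parse_perf_stat_csv` name, and the sample text are all illustrative assumptions, not real measurements; older perf versions may emit fewer fields.

```python
from typing import List, NamedTuple, Optional


class Counter(NamedTuple):
    value: Optional[float]  # None when perf printed e.g. "<not supported>"
    unit: str
    event: str


def parse_perf_stat_csv(text: str, sep: str = ";") -> List[Counter]:
    """Parse lines produced by something like `perf stat -x ';' ...` (hypothetical helper)."""
    counters = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(sep)
        if len(fields) < 3:
            continue  # not a counter line in the assumed layout
        raw, unit, event = fields[0], fields[1], fields[2]
        try:
            value = float(raw)
        except ValueError:
            value = None  # "<not supported>" or "<not counted>"
        counters.append(Counter(value, unit, event))
    return counters


# Made-up sample text in the assumed layout, for illustration only.
SAMPLE = """\
1503.25;msec;task-clock;1503250000;100.00;0.998;CPUs utilized
<not supported>;;cache-misses;0;100.00;;
4200000;;instructions;1503250000;100.00;1.40;insn per cycle
"""

if __name__ == "__main__":
    for counter in parse_perf_stat_csv(SAMPLE):
        print(counter)
```

Note that perf writes its statistics to stderr by default, so a wrapper script would need to capture that stream (or use perf's `-o` option to write to a file).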
9
votes
1 answer

perf_event_open Overflow Signal

I want to count the (more or less) exact number of instructions for some piece of code. Additionally, I want to receive a signal after a specific number of instructions has passed. For this purpose, I use the overflow signal behaviour provided by…
Dawodo
  • 423
  • 4
  • 15
9
votes
2 answers

Does perf lock profile user space mutexes?

Summary: Does perf lock profile pthread_mutex? Details: The tool perf has an option perf lock. The man page says: You can analyze various lock behaviours and statistics with this perf lock command. 'perf lock record ' records lock…
Gabriel Southern
  • 8,316
  • 11
  • 50
  • 91
9
votes
2 answers

How to come up with a high cache miss rate example?

I'm trying to come up with an example program which would have a high cache-miss rate. I thought I could try accessing a matrix column by column like so: #include int main(void) { int i, j, k; int w = 1000; int h = 1000; …
none
  • 10,753
  • 9
  • 46
  • 81
8
votes
1 answer

How to decrease the time spent on one instruction?

I am trying to optimize some C code, and it seems that one instruction is taking about 22% of the time. The code is compiled with gcc 8.2.0. Flags are -O3 -DNDEBUG -g, and -Wall -Wextra -Weffc++ -pthread -lrt. 509529.517218 task-clock…
unamourdeswann
  • 425
  • 3
  • 11
8
votes
1 answer

Confusing Caching Behaviour of a Simple C Program

I am experimenting with a program to see if its caching behaviour is consistent with my conceptual understanding. To do this I am using the Perf command: perf stat -e cache-misses ./a.out to record the cache-miss ratio of the following simple C…
DzedCPT
  • 83
  • 3
8
votes
2 answers

What will be the exact code to get count of last level cache misses on Intel Kaby Lake architecture

I read an interesting paper, entitled "A High-Resolution Side-Channel Attack on Last-Level Cache", and wanted to find out the index hash function for my own machine—i.e., Intel Core i7-7500U (Kaby Lake architecture)—following the leads from this…
8
votes
2 answers

Source line numbers in perf call graph?

I'm using perf record -a --call-graph dwarf -p XXX sleep 1 to record some function calls, then perf report to view that data. However, it would be very helpful if I could also see source line numbers to know exactly where each function call was made.…
Shocker
  • 1,404
  • 11
  • 22
8
votes
2 answers

Is it possible to use Linux Perf profiler inside C++ code?

I would like to measure the L1, L2 and L3 cache hit/miss ratios of some parts of my C++ code. I am not interested in using Perf for my entire application. Can Perf be used as a library inside C++? int main() { ... ... start_profiling() //…
narengi
  • 1,131
  • 2
  • 13
  • 34
8
votes
2 answers

Unknown events in nodejs/v8 flamegraph using perf_events

I am trying to do some Node.js profiling using Linux perf_events as described by Brendan Gregg here. The workflow is as follows: run node >0.11.13 with --perf-basic-prof, which creates a /tmp/perf-(PID).map file where JavaScript symbol mappings are…
Kamil Z
  • 643
  • 8
  • 19
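As background for the question above, the `/tmp/perf-(PID).map` file follows perf's JIT map convention: one symbol per line, written as `START SIZE symbolname`, with START and SIZE in hexadecimal and no `0x` prefix. The Python sketch below shows how an address could be resolved against such a file. The function names and the sample entries are hypothetical; addresses that fall outside every listed range are the kind that tend to show up as unknown frames.

```python
import bisect
from typing import List, Optional, Tuple

Entry = Tuple[int, int, str]  # (start address, size in bytes, symbol name)


def parse_perf_map(text: str) -> List[Entry]:
    """Parse the contents of a perf map file (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        # The symbol name may contain spaces, so split at most twice.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue
        start, size, name = int(parts[0], 16), int(parts[1], 16), parts[2]
        entries.append((start, size, name))
    entries.sort()  # sort by start address so lookups can bisect
    return entries


def lookup(entries: List[Entry], addr: int) -> Optional[str]:
    """Return the symbol covering addr, or None if no entry covers it."""
    starts = [entry[0] for entry in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None


# Made-up entries for illustration only.
SAMPLE_MAP = """\
3ef414c0 398 LazyCompile:~foo /tmp/app.js:10
3ef41860 50 Stub:CEntryStub
"""

if __name__ == "__main__":
    table = parse_perf_map(SAMPLE_MAP)
    print(lookup(table, 0x3EF414D0))
    print(lookup(table, 0x1000))
```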
8
votes
2 answers

Can perf-stat results be generated from a perf.data file?

When I want to generate performance reports using perf-stat and perf-report from the Linux tool suite perf, I run: $ perf record -o my.perf.data myCmd $ perf report -i my.perf.data And: $ perf stat myCmd But that means I run 'myCmd' a second time,…
garious
  • 81
  • 2
7
votes
1 answer

What do the perf record choices of LBR vs DWARF vs fp do?

When I use perf record on my code, I find three choices for the --call-graph option: lbr (last branch record), dwarf and fp. What is the difference between these?
The flash
  • 71
  • 2
7
votes
2 answers

perf power consumption measure: How does it work?

I noticed that perf list now has the option to measure power consumption. You can use it as follows: $ perf stat -e power/energy-cores/ ./a.out Performance counter stats for 'system wide': 8.55 Joules power/energy-cores/ …
user14717
  • 3,854
  • 2
  • 26
  • 61
7
votes
1 answer

Finding threading bottlenecks and optimizing for wall-time with perf

Sampling cpu-cycles with perf record is useful for finding optimization candidates if core-utilization is roughly constant. But for code that has multiple phases differing in parallelism, counting cpu-cycles will emphasize heavily parallel phases…
the8472
  • 35,110
  • 4
  • 54
  • 107
7
votes
1 answer

Why does _mm_mfence() produce counts for the ALL_LOADS perf event?

I am testing the behavior of some intrinsic operations. I was surprised to notice that _mm_mfence() issues a load instruction from user space, but it does not count as an L1 data cache miss, hit, or fill buffer hit. I am using papi's native events…
Ana Khorguani
  • 834
  • 2
  • 12