Questions tagged [intel-pmu]

Questions about using the Intel Performance Monitoring Unit (PMU), which provides hardware counters that track performance-related events for the currently executing code.

The Intel performance monitoring unit provides performance counters that track performance-related metrics for the currently executing code.

The counters are useful when profiling code and are supported by Intel VTune, the Linux perf command, and the Windows Performance Toolkit.

The available counters and the details of programming them vary by CPU microarchitecture; they are documented in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3.
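As a rough illustration of what programming a counter involves, the sketch below packs the fields of an IA32_PERFEVTSELx event-select register using the architectural performance monitoring layout (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, EN in bit 22, counter mask in bits 24-31). The helper name `encode_perfevtsel` and its parameters are hypothetical, chosen for this example. The sketch only computes the register value; writing it to the MSR needs kernel privileges and is not shown.

```python
def encode_perfevtsel(event, umask, count_user=True, count_kernel=True,
                      enable=True, cmask=0):
    """Pack an IA32_PERFEVTSELx value (hypothetical helper, illustration only)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")

    value = event | (umask << 8) | (cmask << 24)
    if count_user:
        value |= 1 << 16   # USR: count while running at CPL > 0
    if count_kernel:
        value |= 1 << 17   # OS: count while running at CPL 0
    if enable:
        value |= 1 << 22   # EN: enable the counter
    return value


if __name__ == "__main__":
    # Architectural events: (event select, unit mask)
    events = {
        "UnHalted Core Cycles": (0x3C, 0x00),
        "Instructions Retired": (0xC0, 0x00),
        "LLC Misses": (0x2E, 0x41),
    }
    for name, (event, umask) in events.items():
        print(f"{name}: {encode_perfevtsel(event, umask):#x}")
```

Tools such as perf build an equivalent encoding internally, so a helper like this is mainly useful for understanding raw event codes or when writing the MSRs directly from a driver.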

65 questions
4
votes
1 answer

How does one enable Intel Processor Tracing (IPT) in a virtualized environment?

I am attempting to run Alex Ionescu's WinIPT interface in a virtual machine, and having no success. (This is a Windows 10 Pro host running a Windows 10 VM, and both are on the 18363 update.) I have successfully built and run Intel's driver as well as…
echosys
  • 41
  • 3
4
votes
2 answers

Why do newer Intel CPUs not support a performance counter for stalled-cycles-backend?

I'm fighting memory latency using memory prefetching. Some (older) CPUs from Intel support performance counters for counting the cycles a CPU wastes waiting for memory (stalled-cycles-backend), e.g. Intel's E5-2690. On newer CPUs (Gold 6230 and…
jagemue
  • 351
  • 2
  • 14
4
votes
0 answers

Get the performance monitoring interrupt on QEMU-KVM

I have a problem catching the performance monitoring interrupt (PMI, especially for the instruction counter) on qemu-kvm. The code below works fine on a real machine (Intel Core(TM) i5-4300U) but on qemu-kvm (qemu-system-x86_64 -cpu host), I do not see…
Mahouk
  • 753
  • 7
  • 23
4
votes
1 answer

How to Configure and Sample Intel Performance Counters In-Process

In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system): results[] = ... for (iteration = 0; iteration < num_iterations; iteration++) { pctr_start = sample_pctr(); …
Edd Barrett
  • 2,751
  • 2
  • 23
  • 37
3
votes
1 answer

Performance Counter for DRAM Per-Rank Memory Access

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. I need to retrieve the number of accesses to each DRAM rank, over time, to estimate its power consumption. Based on page 261 of the chipset documentation (i.e., Datasheet,…
TheAhmad
  • 700
  • 1
  • 5
  • 17
3
votes
1 answer

Difference Between mem_load_uops_retired.l3_miss and offcore_response.demand_data_rd.l3_miss.local_dram Events

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. AFAIK, mem_load_uops_retired.l3_miss counts the number of DRAM demand (i.e., non-prefetch) data read accesses. offcore_response.demand_data_rd.l3_miss.local_dram, as its name…
TheAhmad
  • 700
  • 1
  • 5
  • 17
3
votes
1 answer

PMC to count if software prefetch hit L1 cache

I am trying to find a PMC (Performance Monitoring Counter) that will display the number of times that a prefetcht0 instruction hits (or misses) the L1 dcache. icelake-client: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz I am trying to make this fine grain…
Noah
  • 912
  • 1
  • 4
  • 9
3
votes
3 answers

Profiling the cache hit rate of a function in a C program

I want to get the cache hit rate for a specific function of a C/C++ program (foo) running on a Linux machine. I am using gcc with no compiler optimization. With perf I can get hit rates for the entire program using the following command. perf stat -e…
3
votes
1 answer

How to use the rdpmc instruction to count L1d cache misses?

I am wondering whether there is any single event that can capture L1D cache misses. I tried to capture L1d cache misses by measuring the latency of accessing specific memory with rdtsc at the beginning. In my setup, if an L1d cache miss happens, it should hit…
ruach
  • 1,109
  • 8
  • 17
3
votes
2 answers

Perf stat equivalent for Mac OS?

Is there a perf stat equivalent on Mac OS? I would like to do the same thing for a CLI command and googling is not yielding anything.
stk1234
  • 622
  • 3
  • 8
  • 20
3
votes
1 answer

How can I read performance counters from the kernel?

I have been using the Linux perf tool in the user space. I want to write code that reads performance counters for a thread every time it does a context switch. The steps required are: 1) Get a mechanism to read the performance counter registers. 2)…
3
votes
1 answer

How does perf use the offcore events?

Some built-in perf events are mapped to offcore events. For example, LLC-loads and LLC-load-misses are mapped to OFFCORE_RESPONSE. events. This can be easily determined as discussed here. However, these offcore events require writing certain…
Hadi Brais
  • 18,864
  • 3
  • 43
  • 78
3
votes
1 answer

Paradoxical VTune Amplifier microarchitecture exploration results

I am trying to optimize a sin/cos approximation function. At its core there is a simple Horner scheme consisting of a bunch of multiplies and adds. Compiler is MSVC from VS2017, processor is Intel Xeon E5-1650, hyperthreading is on (but observations…
Max Langhof
  • 22,398
  • 5
  • 38
  • 68
3
votes
0 answers

What causes the DTLB_LOAD_MISSES.WALK_* performance events to occur?

Consider the following loop: .loop: add rsi, STRIDE mov eax, dword [rsi] dec ebp jg .loop where STRIDE is some non-negative integer and rsi contains a pointer to a buffer defined in the bss section. This loop is the…
Hadi Brais
  • 18,864
  • 3
  • 43
  • 78
3
votes
2 answers

How to read PMC (Performance Monitoring Counter) of Intel processor?

I'm trying to read a PMC (Performance Monitoring Counter) by using the RDMSR and WRMSR instructions. On my Linux desktop, which has an Intel i7 6700 CPU (Skylake), I wrote a simple driver: static int my_init(void) { unsigned int msr; u64 low,…
nickeys
  • 117
  • 2
  • 9