Questions tagged [intel-pmu]

Questions related to the use of the Intel Performance Monitoring Unit (PMU), which provides hardware counters for measuring the performance of currently executing code.

The Intel performance monitoring unit provides performance counters that track performance-related metrics for the currently executing code.

They are useful while profiling code, and are supported by Intel's VTune, Linux's perf command and the Windows Performance Toolkit.

The counters and the details of how to program them vary by CPU microarchitecture; the details are available in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3.

65 questions

2 answers

Haswell memory access

I was experimenting with the AVX and AVX2 instruction sets to see the performance of streaming over consecutive arrays. So I have the example below, where I do a basic memory read and store. #include #include #include #include…

asked Oct 27 '13 at 18:08

edorado

1 answer

What restriction is perf_event_paranoid == 1 actually putting on x86 perf?

Newer Linux kernels have a sysctl tunable /proc/sys/kernel/perf_event_paranoid which allows the user to adjust the available functionality of perf_events for non-root users, with higher numbers being more secure (offering correspondingly less…
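
For readers who want a quick picture of the levels, here is a minimal sketch based on my reading of the kernel's sysctl documentation. The function names are made up for illustration, and the exact restrictions may differ across kernel versions and distribution patches, so treat it as an approximation rather than a reference.

```python
def paranoid_restrictions(level):
    """Restrictions for unprivileged users at a given perf_event_paranoid level.

    Assumption: wording paraphrased from the mainline kernel sysctl docs;
    patched distribution kernels may behave differently.
    """
    restrictions = []
    if level >= 0:
        restrictions.append("no raw tracepoint or ftrace function tracepoint access")
    if level >= 1:
        restrictions.append("no CPU event access")
    if level >= 2:
        restrictions.append("no kernel profiling")
    return restrictions


def read_paranoid_level(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current level from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    for level in (-1, 0, 1, 2):
        print(level, paranoid_restrictions(level))
```

The restrictions are cumulative, which is why the helper appends rather than switches on the level.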

linux-kernel x86 profiling perf intel-pmu

asked Aug 18 '18 at 18:08

BeeOnRope

0 answers

On Skylake (SKL) why are there L2 writebacks in a read-only workload that exceeds the L3 size?

Consider the following simple code: #include #include #include #include #include int cpu_ms() { return (int)(clock() * 1000 / CLOCKS_PER_SEC); } int main(int argc, char** argv) { if (argc <…

performance x86 cpu-cache perf intel-pmu

asked Sep 29 '18 at 05:09

BeeOnRope

5 answers

Can the Intel performance monitor counters be used to measure memory bandwidth?

Can the Intel PMU be used to measure per-core read/write memory bandwidth usage? Here "memory" means to DRAM (i.e., not hitting in any cache level).
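
Whichever counter ends up being used, the conversion to bandwidth is simple arithmetic. The sketch below assumes you already have a count of cache-line transfers to or from DRAM (the source of that count is hypothetical here) and a 64-byte line size, which holds on current Intel x86 cores.

```python
CACHE_LINE_BYTES = 64  # assumed line size; true for current Intel x86 cores


def bandwidth_gib_per_s(line_transfers, elapsed_seconds,
                        line_bytes=CACHE_LINE_BYTES):
    """Convert a count of cache-line transfers into GiB/s."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return line_transfers * line_bytes / elapsed_seconds / 2**30


if __name__ == "__main__":
    # Hypothetical sample: 2**24 line transfers observed over one second.
    print(bandwidth_gib_per_s(2**24, 1.0))
```

Read and write traffic would be converted separately if the counters distinguish them.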

performance x86 intel-pmu memory-bandwidth

asked Dec 02 '17 at 21:37

BeeOnRope

2 answers

Reliability of Xcode Instrument's disassembly time profiling

I've profiled my code using Instrument's time profiler and, zooming in to the disassembly, here's a snippet of its results: I wouldn't expect a mov instruction to take 23.3% of the time while a div instruction takes virtually nothing. This causes…

xcode x86 profiling instruments intel-pmu

asked Jan 21 '18 at 16:58

yairchu

1 answer

Can the LSD issue uOPs from the next iteration of the detected loop?

I was investigating the capabilities of the branch unit on port 0 of my Haswell starting with a very simple loop: BITS 64 GLOBAL _start SECTION .text _start: mov ecx, 10000000 .loop: dec ecx ;| jz .end ;| 1…

assembly x86 cpu-architecture intel-pmu

asked Aug 28 '18 at 09:32

Margaret Bloom

2 answers

Why does the number of uops per iteration increase with the stride of streaming loads?

Consider the following loop: .loop: add rsi, OFFSET mov eax, dword [rsi] dec ebp jg .loop where OFFSET is some non-negative integer and rsi contains a pointer to a buffer defined in the bss section. This loop is the…

assembly x86 cpu-architecture intel-pmu

asked Sep 26 '18 at 23:25

Hadi Brais

2 answers

rdpmc: surprising behavior

I'm trying to understand the rdpmc instruction. As such I have the following asm code: segment .text global _start _start: xor eax, eax mov ebx, 10 .loop: dec ebx jnz .loop mov ecx, 1<<30 ; calling rdpmc with ecx = (1<<30)…
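
As general background (not a reconstruction of the truncated code above), this is a small sketch of how the ECX selector for rdpmc is formed on Intel CPUs, as I understand the SDM: bit 30 selects the fixed-function counters and the low bits give the counter index. The helper name is hypothetical.

```python
FIXED_COUNTER_FLAG = 1 << 30  # ECX bit 30 selects fixed-function counters


def rdpmc_ecx(index, fixed=False):
    """Build the ECX value passed to rdpmc for a given counter index."""
    if index < 0:
        raise ValueError("counter index must be non-negative")
    return (FIXED_COUNTER_FLAG | index) if fixed else index


if __name__ == "__main__":
    # Fixed counters on Intel: 0 = instructions retired,
    # 1 = unhalted core cycles, 2 = unhalted reference cycles.
    for i in range(3):
        print(hex(rdpmc_ecx(i, fixed=True)))
    # A general-purpose (programmable) counter is selected by index alone.
    print(hex(rdpmc_ecx(2)))
```

Note that the selected counter still has to be enabled, and user-mode rdpmc has to be permitted, before the instruction returns anything meaningful.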

performance assembly x86 performancecounter intel-pmu

asked May 17 '19 at 19:43

user14717

0 answers

Why does Linux perf use event l1d.replacement for "L1 dcache misses" on x86?

On Intel x86, Linux uses the event l1d.replacement to implement its L1-dcache-load-misses event. This event is defined as follows: Counts L1D data line replacements including opportunistic replacements, and replacements that require…

linux x86 profiling perf intel-pmu

asked Sep 04 '18 at 20:20

BeeOnRope

1 answer

Hardware cache events and perf

When I run perf list I see a bunch of Hardware Cache Events, as follows: $ perf list | grep 'cache event' L1-dcache-load-misses [Hardware cache event] L1-dcache-loads [Hardware…
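
For context, these generic cache events correspond to PERF_TYPE_HW_CACHE in perf_event_open, where the config value packs a cache id, an operation, and a result. The sketch below shows that packing using constant values from the Linux perf_event.h header; the helper function itself is only illustrative.

```python
# Values from linux/perf_event.h (only the ones used here).
PERF_COUNT_HW_CACHE_L1D = 0
PERF_COUNT_HW_CACHE_OP_READ = 0
PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0
PERF_COUNT_HW_CACHE_RESULT_MISS = 1


def hw_cache_config(cache_id, op_id, result_id):
    """Pack a PERF_TYPE_HW_CACHE config: id | (op << 8) | (result << 16)."""
    return cache_id | (op_id << 8) | (result_id << 16)


if __name__ == "__main__":
    # L1-dcache-load-misses corresponds to (L1D, READ, MISS).
    print(hex(hw_cache_config(PERF_COUNT_HW_CACHE_L1D,
                              PERF_COUNT_HW_CACHE_OP_READ,
                              PERF_COUNT_HW_CACHE_RESULT_MISS)))
```

How each generic combination maps onto an actual hardware event is decided by the kernel per CPU model, which is why the same name can count different things on different machines.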

linux performance x86 perf intel-pmu

asked Sep 04 '18 at 16:58

BeeOnRope

1 answer

Why are the user-mode L1 store miss events only counted when there is a store initialization loop?

Summary Consider the following loop: loop: movl $0x1,(%rax) add $0x40,%rax cmp %rdx,%rax jne loop where rax is initialized to the address of a buffer that is larger than the L3 cache size. Every iteration performs a store operation to…

x86 intel performancecounter cpu-cache intel-pmu

asked Mar 05 '19 at 02:59

Hadi Brais

2 answers

Can we measure successful store-forwarding with Intel's performance counters?

Is it possible to measure the number of successful store-forwarding operations using the performance counters on recent Intel x86 chips? I see events for ld_blocks.store_forward which measure failed store-forwarding, but it's not clear to me if the…

performance x86 intel-pmu

asked Sep 09 '17 at 22:54

BeeOnRope

1 answer

PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE concurrent monitoring

I'm working on a custom implementation on top of the perf_event_open syscall. The implementation aims to support various PERF_TYPE_HARDWARE, PERF_TYPE_SOFTWARE and PERF_TYPE_HW_CACHE events for specific threads on any core. In Intel® 64 and IA-32…
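
Since the question is tagged multiplexing, here is a hedged sketch of the usual way a multiplexed count is extrapolated when the event was only scheduled for part of the time. It assumes the counter was read with the time_enabled and time_running fields requested (PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING); the function name is made up, and this may or may not match what the asker needs.

```python
def scaled_count(raw_count, time_enabled, time_running):
    """Estimate the full-interval count for a multiplexed event.

    Returns None when the event never got scheduled onto a counter.
    """
    if time_running == 0:
        return None
    return raw_count * time_enabled / time_running


if __name__ == "__main__":
    # Hypothetical sample: the event ran for half of the enabled time.
    print(scaled_count(500, 2_000, 1_000))
```

The result is an estimate, so it is only as good as the assumption that the event rate was uniform over the interval.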

perf multiplexing intel-pmu

asked May 18 '20 at 21:23

Orion Papadakis

2 answers

Is it possible for the RESOURCE_STALLS.RS event to occur even when the RS is not completely full?

The description of the RESOURCE_STALLS.RS hardware performance event for Intel Broadwell is the following: This event counts stall cycles caused by absence of eligible entries in the reservation station (RS). This may result from RS overflow, or …

performance x86 intel cpu-architecture intel-pmu

asked Oct 05 '18 at 00:15

Hadi Brais

4 answers

Hardware Performance counter on Intel Core Duo

I have read that there are AMD processors out there that allow you to measure the number of cache hits and misses. I am wondering if such a feature is also available on Intel Core Duo machines or if they do not support this yet.

performance x86 intel processor intel-pmu

asked Nov 09 '10 at 13:44

Alex12
