
I want to measure the cache miss rate of my code. We can use perf list to show the supported events. My desktop has an Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz processor, and its perf list output contains cache-references and cache-misses, like this:

  cpu-cycles OR cycles                               [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]

I think cache-misses is mapped to the hardware event LLC-misses according to the Intel architectures software developer's manual (I confirmed this by comparing perf stat -e r412e and perf stat -e cache-misses; they give almost identical results). But how is cache-references counted? I didn't find an event or any other way to get total cache references using existing hardware events. So I'm wondering whether cache-references is accurate on my computer.

Robert
  • I see cache-references just below cache-misses. What is your exact problem? – Milind Dumbare Jun 09 '14 at 20:08
  • I mean that although cache-references is available, I was wondering how it is counted and whether it is accurate. I didn't find an event counter for cache references in the Intel manual. – Robert Jun 10 '14 at 05:50

3 Answers


If you look at arch/x86/kernel/cpu/perf_event_intel.c in the kernel source, you will see that

"PERF_COUNT_HW_CACHE_REFERENCES = 0x4f2e"

whereas

"PERF_COUNT_HW_CACHE_MISSES = 0x412e"

The x86 architecture manual describes 0x4f2e as "This event counts requests originating from the core that reference a cache line in the last level cache". So I assume the count is correct.
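To relate these hex codes to the raw events used in the question (perf stat -e r412e), here is a minimal Python sketch that splits a raw code into its two fields. It assumes the usual x86 perf raw encoding, where the low byte is the event select and the next byte is the unit mask; the helper name decode_raw_event is made up for illustration.

```python
from typing import Tuple


def decode_raw_event(code: int) -> Tuple[int, int]:
    """Split a perf raw code (the hex digits after 'r') into (event_select, umask).

    Assumes the x86 layout: bits 0-7 are the event select, bits 8-15 the unit mask.
    """
    event_select = code & 0xFF
    umask = (code >> 8) & 0xFF
    return event_select, umask


# The two codes quoted from perf_event_intel.c above.
for name, code in [("cache-references", 0x4F2E), ("cache-misses", 0x412E)]:
    event_select, umask = decode_raw_event(code)
    print(f"{name}: event=0x{event_select:02x} umask=0x{umask:02x}")
```

Both codes share event select 0x2e and differ only in the unit mask (0x4f for references, 0x41 for misses), which matches the manual describing them as two views of the same last level cache event.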

Milind Dumbare
  • Yes. These two events are architectural performance events in Intel processors. 0x4f2e is LLC reference, which is described as *longest latency cache references*, and 0x412e is LLC misses, which is described as *longest latency cache misses*. I got this from the [Intel manual](http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html). However, I think LLC references counts only the memory accesses that reach the LLC and does not include cache hits in L1 and L2, so it is not an accurate count of total cache references. – Robert Jun 11 '14 at 07:33
  • So 0x4f2e counts only the cache references that reach the LLC; it does not include cache references that hit in L1 or L2. – Robert Jun 11 '14 at 07:44

On Intel, I don't think perf provides an event to measure total cache references, because no such event exists at the hardware level. You should be able to compute this information yourself using the hardware cache events reported by perf list:

L1-dcache-loads                                    [Hardware cache event]
L1-dcache-load-misses                              [Hardware cache event]
L1-dcache-stores                                   [Hardware cache event]
L1-dcache-store-misses                             [Hardware cache event]
L1-dcache-prefetches                               [Hardware cache event]
L1-dcache-prefetch-misses                          [Hardware cache event]
L1-icache-loads                                    [Hardware cache event]
L1-icache-load-misses                              [Hardware cache event]
L1-icache-prefetches                               [Hardware cache event]
L1-icache-prefetch-misses                          [Hardware cache event]
LLC-loads                                          [Hardware cache event]
LLC-load-misses                                    [Hardware cache event]
LLC-stores                                         [Hardware cache event]
LLC-store-misses                                   [Hardware cache event]
LLC-prefetches                                     [Hardware cache event]
LLC-prefetch-misses                                [Hardware cache event]

Events not tagged with -misses represent the number of references in the associated cache.
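As a rough illustration of combining these events, the Python sketch below computes a per-level miss ratio from counts you would copy out of a perf stat run on the event names listed above. The function name miss_ratio and the sample numbers are made up for illustration; they are not measurements.

```python
def miss_ratio(counts, cache, op="load"):
    """Return misses / references for one cache level and operation.

    counts maps perf event names (e.g. "L1-dcache-loads") to integer totals.
    Returns None when the reference count is zero, so nothing is divided by 0.
    """
    refs = counts[f"{cache}-{op}s"]
    misses = counts[f"{cache}-{op}-misses"]
    if refs == 0:
        return None
    return misses / refs


# Placeholder counts, NOT real measurements: substitute your own perf stat totals.
sample_counts = {
    "L1-dcache-loads": 1_000_000,
    "L1-dcache-load-misses": 25_000,
    "LLC-loads": 20_000,
    "LLC-load-misses": 5_000,
}

for cache in ("L1-dcache", "LLC"):
    print(f"{cache} load miss ratio: {miss_ratio(sample_counts, cache):.3f}")
```

Each ratio is relative to the references seen by that level only, so the LLC figure says nothing about accesses that already hit in L1 or L2.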

Note: this previous question and this man page about perf_event_open (used internally by perf) may help.

Manuel Selva

I tried a tool from Intel called VTune and got some clues about how to measure total cache references. It can count micro-operations and filter for those that are loads or stores, which gives the total number of cache references. But I'm not sure whether the perf tool also uses this method.

Robert
  • ocperf.py from pmu-tools https://github.com/andikleen/pmu-tools is able to use any event on Intel by symbolic name (the script just programs the `perf` tool); several useful scripts are included – osgx Mar 15 '15 at 22:49