
When I run perf list I see a bunch of Hardware Cache Events, as follows:

$ perf list | grep 'cache event'
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  node-load-misses                                   [Hardware cache event]
  node-loads                                         [Hardware cache event]
  node-store-misses                                  [Hardware cache event]
  node-stores                                        [Hardware cache event]

These events mostly seem to return reasonable values based on my tests, but I would like to know how to determine how these events map to hardware performance counter events on my system.

That is, these events are certainly implemented using one or more underlying x86 PMU counters on my Skylake CPU - but how do I know which ones?

You can look in /sys/devices/cpu/events for other hardware events, but the "Hardware cache events" are not listed there.
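For illustration, here is a minimal sketch of what I mean - a small C program (assuming the usual sysfs layout; the PMU directory name may differ on other machines) that just dumps the event encodings exposed under /sys/devices/cpu/events. The generalized cache events above are not among them:

/* Minimal sketch: list the event encodings perf exposes under sysfs.
 * Assumes the usual /sys/devices/cpu/events layout; adjust the path
 * if your PMU shows up under a different name. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *dir = "/sys/devices/cpu/events";
    DIR *d = opendir(dir);
    if (!d) { perror(dir); return 1; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        char path[512], buf[256];
        snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;
        if (fgets(buf, sizeof(buf), f)) {
            buf[strcspn(buf, "\n")] = '\0';
            /* e.g. cache-misses -> "event=0x2e,umask=0x41" on many Intel parts */
            printf("%-24s %s\n", e->d_name, buf);
        }
        fclose(f);
    }
    closedir(d);
    return 0;
}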

BeeOnRope
  • Does [this](https://github.com/torvalds/linux/blob/7796916146b8c34cbbef66470ab8b5b28cf47e83/arch/x86/events/intel/core.c#L384) help? – Margaret Bloom Sep 04 '18 at 18:19
  • @MargaretBloom, yes for someone with enough motivation for reading source :). I was trying not to make it an "answer your own question" type question but I guess it might be... – BeeOnRope Sep 04 '18 at 19:11
  • If you are patient I'll try to put an answer together as soon as I have some free time :) – Margaret Bloom Sep 04 '18 at 19:59
  • @MargaretBloom Today I guess I am not, it seems the pointer to the right file was enough to get me started, as I had already written an answer by the time you made your offer! Of course, better answers may be possible. I suppose you might have some [insight on this related question](https://stackoverflow.com/q/52173478/149138). – BeeOnRope Sep 04 '18 at 20:22

1 Answer


User @Margaret points towards a reasonable answer in the comments - read the kernel source to see the mapping to the underlying PMU events.

We can check arch/x86/events/intel/core.c for the event definitions. I don't actually know if "core" here refers to the Core architecture, or just that this is the core file with most of the definitions - but in any case it's the file you want to look at.

The key part is this section, which defines skl_hw_cache_event_ids:

static __initconst const u64 skl_hw_cache_event_ids
                [PERF_COUNT_HW_CACHE_MAX]
                [PERF_COUNT_HW_CACHE_OP_MAX]
                [PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
 [ C(L1D ) ] = {
    [ C(OP_READ) ] = {
        [ C(RESULT_ACCESS) ] = 0x81d0,  /* MEM_INST_RETIRED.ALL_LOADS */
        [ C(RESULT_MISS)   ] = 0x151,   /* L1D.REPLACEMENT */
    },
    [ C(OP_WRITE) ] = {
        [ C(RESULT_ACCESS) ] = 0x82d0,  /* MEM_INST_RETIRED.ALL_STORES */
        [ C(RESULT_MISS)   ] = 0x0,
    },
    [ C(OP_PREFETCH) ] = {
        [ C(RESULT_ACCESS) ] = 0x0,
        [ C(RESULT_MISS)   ] = 0x0,
    },
},
...

Decoding the nested initializers, you get that L1-dcache-loads corresponds to MEM_INST_RETIRED.ALL_LOADS and L1-dcache-load-misses to L1D.REPLACEMENT.
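If you want to relate those table values back to the raw counter programming, the encoding follows the usual x86/perf raw-event layout: the low byte is the event select and the next byte is the umask (higher bits, where present, correspond to the other PERFEVTSEL fields such as cmask). Here's a small sketch of that decoding - the decode helper is just for illustration, not anything from the kernel:

/* Sketch: split a value from skl_hw_cache_event_ids into the fields of
 * the x86 raw event encoding (the same layout perf uses for raw events):
 * bits 0-7 are the event select, bits 8-15 are the umask. */
#include <stdint.h>
#include <stdio.h>

static void decode(const char *name, uint64_t config) {
    unsigned event = config & 0xff;
    unsigned umask = (config >> 8) & 0xff;
    printf("%-24s config=0x%04llx -> event=0x%02x umask=0x%02x\n",
           name, (unsigned long long)config, event, umask);
}

int main(void) {
    decode("L1-dcache-loads",       0x81d0); /* MEM_INST_RETIRED.ALL_LOADS  */
    decode("L1-dcache-load-misses", 0x151);  /* L1D.REPLACEMENT             */
    decode("L1-dcache-stores",      0x82d0); /* MEM_INST_RETIRED.ALL_STORES */
    return 0;
}

So 0x81d0 is event 0xD0, umask 0x81, which is exactly the encoding of MEM_INST_RETIRED.ALL_LOADS, and 0x151 is event 0x51, umask 0x01, i.e. L1D.REPLACEMENT.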

We can double check this with perf:

$ ocperf stat -e mem_inst_retired.all_loads,L1-dcache-loads,l1d.replacement,L1-dcache-load-misses,L1-dcache-loads,mem_load_retired.l1_hit head -c100M /dev/zero > /dev/null

 Performance counter stats for 'head -c100M /dev/zero':

        11,587,793      mem_inst_retired_all_loads                                   
        11,587,793      L1-dcache-loads                                             
            20,233      l1d_replacement                                             
            20,233      L1-dcache-load-misses     #    0.17% of all L1-dcache hits  
        11,587,793      L1-dcache-loads                                             
        11,495,053      mem_load_retired_l1_hit                                     

       0.024322360 seconds time elapsed

The "Hardware Cache" events show exactly the same values as using the underlying PMU events we guessed at by checking the source.

BeeOnRope
  • Great answer, thank you! What's going on, though, with the events that have a `0x0` value, such as `L1-dcache-write-misses`? Also, what about `node-read-misses`, `node-read-accesses`, `node-write-misses`, and `node-write-accesses`, which all have the same value? – Orion Papadakis May 08 '20 at 11:17
  • @OrionPapadakis - good question, I am not sure. Maybe worth another question. Since this question was posted, the way perf exposes some events has gotten better too. Another way to look things up is `event-rmap` from [pmu-tools](https://github.com/andikleen/pmu-tools). – BeeOnRope May 08 '20 at 21:12