I am measuring iTLB-loads and iTLB-load-misses for a running process with:

perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -p 22479

and get this output:

Performance counter stats for process id '22479':

     1,262,817      dTLB-loads                                                  
        13,950      dTLB-load-misses          #    1.10% of all dTLB cache hits 
            75      iTLB-loads                                                  
         6,882      iTLB-load-misses          # 9176.00% of all iTLB cache hits 

   3.999720948 seconds time elapsed

I have no idea how to interpret this: how can iTLB-loads be only 75 while iTLB-load-misses is 6,882?

lscpu shows: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

Edit :

Can I interpret it as follows: out of (75 + 6882) iTLB loads, 75 were hits and 6882 were misses?

Edit :

ocperf.py list | wc -l
Downloading https://download.01.org/perfmon/mapfile.csv to mapfile.csv

Traceback (most recent call last):
File "/home/marschen/tools/pmu-tools-master/ocperf.py", line 1012, in <module>
emap = find_emap()
File "/home/marschen/tools/pmu-tools-master/ocperf.py", line 831, in find_emap
event_download.download(el, toget)
File "/home/marschen/tools/pmu-tools-master/event_download.py", line 105, in download
getfile(modelpath, dir, "mapfile.csv")
File "/home/marschen/tools/pmu-tools-master/event_download.py", line 86, in getfile
f = urlopen(url)
File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python2.7/urllib2.py", line 431, in open
response = self._open(req, data)
File "/usr/lib64/python2.7/urllib2.py", line 449, in _open
'_open', req)
File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib64/python2.7/urllib2.py", line 1258, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/lib64/python2.7/urllib2.py", line 1211, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib64/python2.7/httplib.py", line 1017, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python2.7/httplib.py", line 1051, in _send_request
self.endheaders(body)
File "/usr/lib64/python2.7/httplib.py", line 1013, in endheaders
self._send_output(message_body)
File "/usr/lib64/python2.7/httplib.py", line 864, in _send_output
self.send(msg)
File "/usr/lib64/python2.7/httplib.py", line 826, in send
self.connect()
File "/usr/lib64/python2.7/httplib.py", line 1227, in connect
HTTPConnection.connect(self)
File "/usr/lib64/python2.7/httplib.py", line 807, in connect
self.timeout, self.source_address)
File "/usr/lib64/python2.7/socket.py", line 562, in create_connection
sock.connect(sa)
File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
barfatchen
  • That is odd. I tried on Skylake and could repro the behaviour of iTLB misses > iTLB accesses. I'm not sure what actual counter `iTLB-loads` is mapped to. Skylake doesn't seem to have a counter for iTLB accesses, only for misses (`frontend_retired.itlb_miss` in `ocperf.py`). The uop cache is virtually addressed, so fetching uops from the uop cache (DSB) doesn't require TLB accesses if it hits. – Peter Cordes Apr 20 '18 at 04:47
  • @Peter, I searched several web pages for more information, but still could not find the correct way to interpret this data. – barfatchen Apr 20 '18 at 05:06
  • We need to figure out what hardware event `perf` is actually using for `iTLB-loads`, and find out what it means. I tried using `perf --debug verbose=2`, but I'm not sure if those numbers are the same event / mask numbers that you can find documentation like http://oprofile.sourceforge.net/docs/intel-haswell-events.php, or like you can see with `ocperf.py stat -e frontend_retired.itlb_miss` – Peter Cordes Apr 20 '18 at 05:23
  • @Peter, thanks. ocperf.py from pmu-tools doesn't work for me; I am not familiar with Python, and ocperf.py printed some error messages. – barfatchen Apr 20 '18 at 07:20
  • IDK, looks like you installed it wrong, or maybe it's missing a dependency, despite https://github.com/andikleen/pmu-tools saying it only needs that standard stuff. Your error output doesn't include the actual exception, just the traceback. I haven't updated mine from git for a while. – Peter Cordes Apr 20 '18 at 07:29
  • @Peter, I downloaded pmu-tools from https://github.com/andikleen/pmu-tools just now, and it doesn't work on my server. Pity. – barfatchen Apr 20 '18 at 07:33
  • How old is your server's software? What are you running on it? Maybe you can manually download the required file, or do it on another computer, and put it in the right place. (If it's only the downloading libraries that are a problem; it might be that you'd run into more problems in other functions later.) – Peter Cordes Apr 20 '18 at 07:35
  • I just downloaded pmu-tools-master.zip, unzipped it, and ran ocperf.py, with no luck. – barfatchen Apr 20 '18 at 07:42
  • There is a Makefile in the pmu-tools-master directory, but I did not run make. – barfatchen Apr 20 '18 at 07:44
  • No, how old is the Linux distro you're using? I forget if `make` helps. I don't think I had to let it install anything in /usr/local; I just run it from a symlink into the source directory (where I have a `git clone` of the repo) – Peter Cordes Apr 20 '18 at 07:45
  • Linux testhost 3.10.0-693.el7.x86_64. Is that what you mean? – barfatchen Apr 20 '18 at 07:46
  • Ok, that's your kernel version, so you're on RHEL7 I think. I'm not sure how old their Python is. I think the git repo said only RHEL5 was too old for `perf`, but IDK how up to date that readme is. Current Linux is 4.15 or so, but RHEL does have kernel patches... – Peter Cordes Apr 20 '18 at 07:48
  • That sounds right. My production server runs the same kernel version, so maybe I will try another source. Thanks for your kind help. – barfatchen Apr 20 '18 at 07:50
  • @PeterCordes I think `perf list pmu` or `perf list --long-desc pmu` should print all aliases and the events they are mapped to. – Hadi Brais Apr 20 '18 at 20:02
  • @HadiBrais: It doesn't, `perf list pmu` looks like what you get from `ocperf.py list`, but *without* the simple/generic event names like `LLC-loads`. `perf list --details iTLB-loads` sounds like it's supposed to be useful from the docs, but it isn't. It just says "`[Hardware cache event]`" and prints some generic stuff. – Peter Cordes Apr 21 '18 at 00:59
  • @PeterCordes According to the source code of perf, the alias names are obtained from the names of the files in `/sys/bus/event_source/devices/cpu/events`. The name of the file is itself the alias and each file contains the event code of the actual performance event. The alias names of other performance events for devices other than the CPU can be found in `/sys/bus/event_source/devices//events`. – Hadi Brais Apr 21 '18 at 01:47
  • @HadiBrais: Cool, thanks for digging that up. Unfortunately that doesn't include any TLB events on Linux 4.15 on Skylake. `find -L /sys/bus/event_source/ -iname '*tlb*'` doesn't find any tlb events anywhere. `.../cpu/events` has what perf calls "Hardware event", but not any of the "Hardware cache event" names. – Peter Cordes Apr 21 '18 at 01:56
  • @PeterCordes After a lot more digging, on Skylake, `iTLB-loads` is mapped to `ITLB_MISSES.STLB_HIT` and `iTLB-load-misses` is mapped to `ITLB_MISSES.WALK_COMPLETED`. The numbers make sense now. – Hadi Brais Apr 21 '18 at 16:53
  • On Broadwell (the OP's processor), `iTLB-loads` is mapped to `ITLB_MISSES.STLB_HIT` and `iTLB-load-misses` is mapped to `ITLB_MISSES.MISS_CAUSES_A_WALK`. – Hadi Brais Apr 21 '18 at 17:05
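
The sysfs lookup described in the comments above can be sketched in a few lines of Python. This is only an illustration: the function name `read_event_aliases` is hypothetical, the default path is the one quoted in the comments, and (as noted above) on at least one Skylake machine running Linux 4.15 this directory held only the "Hardware event" aliases, with no TLB entries, so it may not reveal the `iTLB-loads` mapping.

```python
import os


def read_event_aliases(events_dir="/sys/bus/event_source/devices/cpu/events"):
    """Return {alias_name: event_encoding} for each file in events_dir.

    Each file name is treated as an alias and the file's content as the
    encoding string of the underlying event. Returns an empty dict if
    the directory does not exist (for example, on a non-Linux system).
    """
    aliases = {}
    if not os.path.isdir(events_dir):
        return aliases
    for name in sorted(os.listdir(events_dir)):
        path = os.path.join(events_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                aliases[name] = f.read().strip()
        except OSError:
            # Skip entries that cannot be read by the current user.
            continue
    return aliases


# Print whatever aliases the running kernel exposes for the CPU PMU.
for alias, encoding in read_event_aliases().items():
    print(alias, encoding)
```

Passing a different directory as `events_dir` lets the same function inspect another PMU device's `events` directory.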

1 Answer

On your Broadwell processor, perf maps iTLB-loads to ITLB_MISSES.STLB_HIT, which counts TLB lookups that miss the L1 ITLB but hit the unified second-level TLB (STLB), for all page sizes. It maps iTLB-load-misses to ITLB_MISSES.MISS_CAUSES_A_WALK, which counts TLB lookups that miss both the L1 ITLB and the unified TLB and therefore cause a page walk, for all page sizes. The two events are independent, so iTLB-load-misses can be larger than, smaller than, or equal to iTLB-loads.
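
As a rough worked example using the counts from the question, the sketch below shows where the 9176.00% figure comes from and what a more meaningful ratio might look like. It assumes the mapping described above, and additionally assumes that every L1 ITLB miss is counted by exactly one of the two events, which is a simplification. The function name `itlb_breakdown` and its keys are made up for illustration.

```python
def itlb_breakdown(stlb_hit, walk):
    """Derive ratios from the two Broadwell counters.

    stlb_hit: count reported as iTLB-loads (ITLB_MISSES.STLB_HIT)
    walk:     count reported as iTLB-load-misses
              (ITLB_MISSES.MISS_CAUSES_A_WALK)
    """
    # Assumption: an L1 ITLB miss either hits the STLB or causes a walk.
    l1_misses = stlb_hit + walk
    nan = float("nan")
    return {
        "l1_itlb_misses": l1_misses,
        # The percentage perf prints: misses divided by "loads".
        "perf_printed_pct": 100.0 * walk / stlb_hit if stlb_hit else nan,
        # Share of L1 ITLB misses that also missed the STLB.
        "walk_share_pct": 100.0 * walk / l1_misses if l1_misses else nan,
    }


stats = itlb_breakdown(stlb_hit=75, walk=6882)
print("L1 ITLB misses (assumed total): %d" % stats["l1_itlb_misses"])
print("ratio as printed by perf: %.2f%%" % stats["perf_printed_pct"])
print("misses that needed a page walk: %.2f%%" % stats["walk_share_pct"])
```

Under these assumptions, 6882 / 75 reproduces the 9176.00% that perf prints, while about 98.9% of the 6,957 L1 ITLB misses went on to a page walk. Neither number says anything about the total number of instruction fetches that consulted the ITLB, because no event counting those is part of this mapping.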

Hadi Brais
  • Seems like a very odd design choice. Would have made more sense for `perf` to just say that the `iTLB-loads` event isn't available on those CPUs, instead of confusingly using it for hits in the 2nd-level unified TLB after an iTLB miss. – Peter Cordes Apr 21 '18 at 19:18
  • To be clear, this is a bug in `perf` (at the very least in the way they compare the two numbers), this isn't a "design choice" – jberryman Feb 11 '19 at 22:35
  • @jberryman Is it a confirmed bug? BTW, this is not the only inconsistency in the `perf` events. See for example my answer to [this](https://stackoverflow.com/questions/44466697/perf-stat-does-not-count-memory-loads-but-counts-memory-stores) other question. These inconsistencies seem to be by design. – Hadi Brais Feb 12 '19 at 04:21
  • @HadiBrais well `ITLB_MISSES.MISS_CAUSES_A_WALK / ITLB_MISSES.STLB_HIT` is a pretty meaningless number right? It seems clear that `iTLB-loads` is mapped to the wrong underlying event. I don't think there's a plan to fix it or a ticket afaik. I think `perf` is a shitshow, but I appreciate people like you who dig into the details and document these weird quirks – jberryman Feb 12 '19 at 15:55