The following answer to "Benchmarking - How to count number of instructions sent to CPU to find consumed MIPS" suggests that:

perf stat ./my_program on Linux will use CPU performance counters to record how many instructions it ran, and how many core clock cycles it took. (And how much CPU time it used, and will calculate MIPS for you).


An example run produces the following output, which does not contain a calculated MIPS value.

 Performance counter stats for './hello.py':

       1452.607792 task-clock (msec)         #    0.997 CPUs utilized
               327 context-switches          #    0.225 K/sec
               147 cpu-migrations            #    0.101 K/sec
            35,548 page-faults               #    0.024 M/sec
     2,254,593,107 cycles                    #    1.552 GHz                     [26.64%]
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     1,652,281,933 instructions              #    0.73  insns per cycle         [38.87%]
       353,431,039 branches                  #  243.308 M/sec                   [37.95%]
        18,536,723 branch-misses             #    5.24% of all branches         [38.06%]
       612,338,241 L1-dcache-loads           #  421.544 M/sec                   [25.93%]
        41,746,028 L1-dcache-load-misses     #    6.82% of all L1-dcache hits   [25.71%]
        25,531,328 LLC-loads                 #   17.576 M/sec                   [26.39%]
         1,846,241 LLC-load-misses           #    7.23% of all LL-cache hits    [26.26%]

       1.456531157 seconds time elapsed

[Q] How can I calculate MIPS correctly from the output of perf stat? Should I compute instructions / seconds_time_elapsed using the values reported by perf stat?

alper

1 Answer

It's just instructions / seconds, divided by 1 million to scale for the mega metric prefix.

Using the total elapsed time will give you MIPS for the whole program, total across all cores, and counting any time spent sleeping / waiting against it.

Task-clock will count total CPU time used on all cores, so it will give you the average MIPS across all cores used, not counting any time spent sleeping. (task-clock:u would count only user-space time, but task-clock counts time spent in the kernel as well.)
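As a rough sketch of the two calculations described above, the following plugs in the numbers from the `perf stat` output in the question. The helper name `mips` and the variable names are made up for illustration. Note that the instruction count in that output was itself a scaled estimate, since the event was only counted 38.87% of the time.

```python
def mips(instructions: int, seconds: float) -> float:
    """Millions of instructions per second (hypothetical helper)."""
    return instructions / seconds / 1e6


# Values copied from the perf stat output in the question.
instructions = 1_652_281_933
elapsed_s = 1.456531157            # "seconds time elapsed" (wall-clock time)
task_clock_s = 1452.607792 / 1000  # "task-clock (msec)" converted to seconds

# MIPS for the whole run, counting time spent sleeping / waiting against it.
mips_wall = mips(instructions, elapsed_s)

# MIPS counting only the CPU time actually used (user + kernel).
mips_cpu = mips(instructions, task_clock_s)

print(f"MIPS over elapsed time: {mips_wall:.1f}")
print(f"MIPS over task-clock:   {mips_cpu:.1f}")
```

The two results are close here because the program kept one core busy almost the whole time (0.997 CPUs utilized); they diverge for programs that sleep or wait on I/O.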

Peter Cordes
  • As I understand it, `instructions / task-clock` gives the average MIPS across all cores, right? So it would be more efficient to use `task-clock` instead of `instructions / seconds_time_elapsed`, in order to ignore any time spent sleeping / waiting. @Peter Cordes – alper Nov 18 '18 at 17:07
  • @alper: it's not a question of "efficiency", it's just 2 different things you might want to measure. `instructions / task-clock` is average MIPS across all cores, counting only time that wasn't spent sleeping. e.g. `perf stat sleep 1` will only show a `task-clock` of ~0.4 milliseconds, but an elapsed time of 1 second. Code that has I/O waits or hard page faults is losing time not executing any instructions during those intervals, so it's certainly reasonable to count that time against it. Or time spent sleeping waiting for mutexes in multi-threaded code. – Peter Cordes Nov 18 '18 at 17:12
  • The two differ greatly. `perf stat sleep 1` returns `1 msec` for the task-clock, which is 0.001 seconds. So the result is `instructions / 1` for elapsed time or `instructions / 0.001` for task-clock: the divisor is around 1000 times smaller, so the MIPS value is around 1000 times larger. @Peter Cordes – alper Nov 18 '18 at 17:18
  • @alper: exactly. `sleep` uses very little CPU time to start up, dynamic-link its libraries, and make a system call. But it spends significant time sleeping so the total wall-clock time is high. During most of the elapsed time, the CPU is available for other tasks, but the `sleep` task itself is not complete. `cp` of a large file would be similar: most of the time spent waiting for I/O. – Peter Cordes Nov 18 '18 at 17:23
  • The `u` and `k` modifiers (and any other modifier) don't seem to work on the `task-clock` and `cpu-clock` events. Both user and kernel times are always included. – Hadi Brais Sep 15 '19 at 21:22
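The gap between elapsed time and CPU time discussed in the comments can also be seen without `perf`. This is only an illustrative sketch using Python's standard `time` module, not a substitute for `perf stat`: `time.perf_counter()` tracks wall-clock time, while `time.process_time()` counts only the CPU time (user + system) of the current process and excludes time spent sleeping. The `measure` helper is a made-up name.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent running fn (hypothetical helper)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall_s = time.perf_counter() - wall_start
    cpu_s = time.process_time() - cpu_start
    return wall_s, cpu_s


# Sleeping uses almost no CPU time but a lot of wall-clock time,
# similar to the `perf stat sleep 1` example in the comments.
wall_s, cpu_s = measure(lambda: time.sleep(0.2))

print(f"wall-clock: {wall_s:.3f} s, CPU: {cpu_s:.6f} s")
```

Dividing an instruction count by the first number penalizes the sleep; dividing by the second does not.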