
I am using the PAPI library to tune and profile my application.

What does PAPI_REF_CYC (Reference clock cycles) actually mean?

Thanks in advance,

abdul

1 Answer


Some modern CPUs, including Intel's and AMD's, are throttled.
This means that their clock frequency is not fixed but varies with the active power management: even if the CPU's nominal (brand) frequency is X GHz, more often than not it is not running at that frequency.

For a couple of real examples, see Intel Turbo Boost / AMD Turbo Core and Enhanced Intel SpeedStep / AMD Cool'n'Quiet.

Since the core clock can slow down or speed up, comparing two core-cycle measurements taken at possibly different frequencies says nothing about which one took longer.
If snippet A runs in 100 core clocks and snippet B in 200 core clocks, B is slower in general (it takes double the work), but B did not necessarily take more time than A, because a core clock is not a fixed unit of time. That's where the reference clock comes into play: it ticks at a uniform rate.
If snippet A runs in 100 ref clocks and snippet B runs in 200 ref clocks, then B really took more time than A.

Converting ref clock ticks into time (e.g. seconds) is not that easy: each processor uses a different reference frequency, even among processors with the same brand name.

Margaret Bloom
  • The definition is correct, but perhaps the comparison example could be the opposite, i.e., that you _should_ compare real cycles, not ref cycles (which are really just unhalted wall-clock time). If a snippet of code runs in 100 real cycles and 100 ref cycles, and another snippet runs in 200 real cycles and 100 ref cycles (because the average CPU frequency was double the first case), then do they perform the same, or does the second one take double the "time" (conveniently canceled out by running at 2x the frequency)? The correct interpretation depends on whether the code is CPU-bound or L3/memory-bound. – BeeOnRope Apr 14 '17 at 20:29
  • In particular, for many benchmarks where you can't or don't want to totally control the frequency, it is really convenient to report real cycles, which often correlate much better and more stably with actual performance in the presence of frequency scaling. – BeeOnRope Apr 14 '17 at 20:30
  • @BeeOnRope Good point! I totally turned the clocks the other way around! – Margaret Bloom Apr 15 '17 at 01:27
  • But in turn, if part of a program is slow because it is limited by something other than CPU performance, then it would be more useful to count ref-cycles to notice that the code is slow inside that particular function. I think this is just sampling wall-clock time to identify slow parts of the code. Right? – Peter Oct 29 '19 at 15:39
  • @Peter Yes, ref-cycles is better, for example, to profile an application bound by the network. Using a wall clock will give you a uniform time. – Margaret Bloom Oct 29 '19 at 16:51
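BeeOnRope's example can be worked through with two relations, assuming the reference clock ticks at a fixed frequency f_ref while the core is unhalted. Write C for real (core) cycles and R for ref cycles (symbols introduced here for illustration). Then unhalted time is t = R / f_ref and the average core frequency is (C / R) f_ref. For snippet 1, C_1 = 100 and R_1 = 100; for snippet 2, C_2 = 200 and R_2 = 100:

```latex
\begin{aligned}
t   &= \frac{R}{f_{\mathrm{ref}}}, & \bar{f}   &= \frac{C}{R}\, f_{\mathrm{ref}} \\
t_1 &= \frac{100}{f_{\mathrm{ref}}}, & \bar{f}_1 &= \frac{100}{100}\, f_{\mathrm{ref}} = f_{\mathrm{ref}} \\
t_2 &= \frac{100}{f_{\mathrm{ref}}}, & \bar{f}_2 &= \frac{200}{100}\, f_{\mathrm{ref}} = 2 f_{\mathrm{ref}}
\end{aligned}
```

Both snippets take the same unhalted time, but the second needs twice the core cycles. As BeeOnRope notes, whether that counts as "the same performance" depends on whether the code is CPU-bound or L3/memory-bound.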