
I have seen this term in several posts about profiling applications, but I don't understand what it actually means or how it affects profiling results.

I have seen it here for dtrace:

> The rate is also increased to 199 Hertz, as capturing kernel stacks is much less expensive than user-level stacks. The odd numbered rates, 99 and 199, are used to avoid sampling in lockstep with other activity and producing misleading results.

Here for perf:

> `-F 99`: sample at 99 Hertz (samples per second). I'll sometimes sample faster than this (up to 999 Hertz), but that also costs overhead. 99 Hertz should be negligible. Also, the value '99' and not '100' is to avoid lockstep sampling, which can produce skewed results.

From what I have seen, all profilers should avoid lockstep sampling because results can be "skewed" and "misleading", but I don't understand why. I guess this question applies to all profilers, but I am specifically interested in perf on Linux.

ks1322

1 Answer


Lockstep sampling is when the profiling samples occur at the same frequency as a loop in the application. The result is that the samples tend to land at the same place in the loop, so the profiler concludes that that operation is the most common one, and therefore a likely bottleneck.
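To make this concrete, here is a small throwaway simulation (plain Python, not tied to any real profiler; the 10 ms loop period and the 90/10 time split are made-up numbers purely for illustration) of what happens when the sampler's period exactly matches the loop's period:

```python
# Toy model: one loop iteration takes 10 ms (a 100 Hz loop).
# The first 9 ms of each iteration are spent in work_a(),
# the last 1 ms in work_b().

LOOP_PERIOD_MS = 10.0   # loop runs at 100 Hz
A_SHARE = 0.9           # work_a really accounts for 90% of each iteration

def operation_at(time_ms):
    """Return which operation is running at absolute time `time_ms`."""
    phase = time_ms % LOOP_PERIOD_MS
    return "work_a" if phase < A_SHARE * LOOP_PERIOD_MS else "work_b"

def profile(sample_period_ms, start_offset_ms, n_samples=10_000):
    """Simulate a sampling profiler and return estimated time shares."""
    hits = {"work_a": 0, "work_b": 0}
    for i in range(n_samples):
        hits[operation_at(start_offset_ms + i * sample_period_ms)] += 1
    return {op: count / n_samples for op, count in hits.items()}

# Sampler at exactly the loop frequency (period 10 ms): every sample lands
# at the same phase of the loop, so the "profile" depends entirely on where
# the first sample happened to fall.
print(profile(sample_period_ms=10.0, start_offset_ms=9.5))
# -> {'work_a': 0.0, 'work_b': 1.0}   work_b looks like 100% of the program

# Sampler whose period does not divide evenly into the loop period:
# the phase drifts across the whole loop, and the estimate converges
# to the true 90/10 split.
print(profile(sample_period_ms=10.1, start_offset_ms=9.5))
# -> approximately {'work_a': 0.9, 'work_b': 0.1}
```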

An analogy would be if you were trying to determine whether a road experiences congestion, and you sample it every 24 hours. That sample is likely to be in lock-step with traffic variation; if it's at 8am or 5pm, it will coincide with rush hour and conclude that the road is extremely busy; if it's at 3am it will conclude that there's practically no traffic at all.

For sampling to be accurate, it needs to avoid this. Ideally, the samples should be much more frequent than any cycles in the application, or at random intervals, so that the chance it occurs in any particular operation is proportional to the amount of time that operation takes. But this is often not feasible, so the next best thing is to use a sampling rate that doesn't coincide with the likely frequency of program cycles. If there are enough cycles in the program, this should ensure that the samples take place at many different offsets from the beginning of each cycle.

To relate this to the road analogy: sampling every 23 hours, or at random times each day, causes the samples to eventually encounter all times of the day; each 23-hour interval shifts the sample one hour earlier, so a full cycle of 24 samples (about 23 days) covers every hour of the day. This produces a much more complete picture of the traffic levels. And sampling every hour would provide a complete picture in just a few weeks.
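The same point can be shown with a few lines of arithmetic: a sampling interval that coincides with the cycle keeps revisiting the same offset, while one that is coprime to it eventually visits them all. (Throwaway sketch; the 24-hour day mirrors the analogy above, and the 10 ms cycle stands in for a 100 Hz timer, as in the 99-vs-100 Hz choice from the question.)

```python
# Which offsets into a repeating cycle does a strictly periodic sampler visit?
def visited_offsets(cycle, sample_interval, n_samples):
    return sorted({(i * sample_interval) % cycle for i in range(n_samples)})

# Road-traffic analogy: a 24-hour daily cycle.
print(len(visited_offsets(cycle=24, sample_interval=24, n_samples=1000)))  # 1  -> always the same hour
print(len(visited_offsets(cycle=24, sample_interval=23, n_samples=1000)))  # 24 -> every hour of the day

# Same idea with sampling rates: a 100 Hz timer has a 10 ms cycle.
# Work in units of 0.1 ms so everything stays an integer.
CYCLE = 100  # 10.0 ms expressed in 0.1 ms units
print(len(visited_offsets(CYCLE, sample_interval=100, n_samples=1000)))  # 100 Hz sampler: 1 offset only
print(len(visited_offsets(CYCLE, sample_interval=101, n_samples=1000)))  # ~99 Hz sampler: all 100 offsets
```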

I'm not sure why odd-numbered frequencies in particular are likely to ensure this. It seems to be based on the assumption that program operations have natural frequencies, and that these tend to be even, round numbers.

Barmar
  • Samples do not need to be frequent. If some "bottleneck" takes 30% of time, then even if only 20 random samples are taken, the chance that it will be exposed on two or more of those samples is 99.2% [*Check here.*](https://scicomp.stackexchange.com/a/2719/1262) – Mike Dunlavey Aug 03 '17 at 13:31
  • @MikeDunlavey That's true for a huge bottleneck, but that's a pretty unusual case. You need more frequent samples to detect things that use 1-10%. Also, notice that the samples are not random, they're at a consistent frequency. The whole point of this is to avoid it being at the same frequency as the code being profiled (you might consistently miss that 30% section). – Barmar Aug 03 '17 at 18:14
  • Thanks. The main question is: what can the sampling coincide with? On Linux I suspect it could be the kernel timer, which runs at 1000 Hz by default, but I am not sure about this. – ks1322 Aug 04 '17 at 08:57
  • @Barmar: Well, not that length of experience matters, but when people tell me they're looking for things in the range of 10% or less, they're really saying they think the code is pretty darn near optimal, and they're positive there's nothing bigger lurking (because if there were it would make them look bad). What I've seen that's more typical is yes, there could be visible problem A that's 10%, but also hidden problem B that's 50%. If they could find and fix B, that makes A 20%, so it's easier to find. Long and short: programmers kid themselves. – Mike Dunlavey Aug 07 '17 at 19:14
  • 1
    @MikeDunlavey In any case, if the sampling is in lock step, it could completely distort the results. If the application spends 90% of time in one block, and 10% in another, and the sample is synchronized with the loop, it might always sample in the small block. You need either random or high frequency sampling to ensure you avoid this. – Barmar Aug 07 '17 at 19:41
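For what it's worth, the 99.2% figure from the first comment checks out under the usual model of independent random samples (a quick back-of-the-envelope check, nothing profiler-specific):

```python
from math import comb

# Chance that a block taking 30% of run time is hit by at least 2 of
# 20 independent random samples: 1 - P(0 hits) - P(1 hit).
p, n = 0.3, 20
p_at_least_two = 1 - (1 - p) ** n - comb(n, 1) * p * (1 - p) ** (n - 1)
print(f"{p_at_least_two:.3f}")  # 0.992
```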