
The output of a typical profiler is a list of the functions in your code, sorted by the amount of time each one took while the program ran.

This is very good, but sometimes I'm more interested in what the program was doing most of the time than in where the EIP was most of the time.

An example output of my hypothetical profiler is:

Waiting for file IO - 19% of execution time
Waiting for network -  4% of execution time
Cache misses        - 70% of execution time
Actual computation  -  7% of execution time

Is there such a profiler? Is it possible to derive such an output from a "standard" profiler?

I'm using Linux, but I'll be glad to hear any solutions for other systems.

Elazar Leibovich

2 Answers


This is Solaris only, but dtrace can monitor almost every kind of I/O, on/off-CPU time, time in specific functions, sleep time, etc. I'm not sure if it can determine cache misses, though, assuming you mean CPU cache - I'm not sure whether the CPU makes that information available.

Mark B

Please take a look at this and this.

Consider any thread. At any instant of time it is doing something, and it is doing it for a reason, and slowness can be defined as the time it spends for poor reasons - it doesn't need to be spending that time.

Take a snapshot of the thread at a point in time. Maybe it's in a cache miss, in an instruction, in a statement, in a function, called from a call instruction in another function, called from another, and so on, up to call _main. Every one of those steps has a reason that an examination of the code reveals.

  1. If any one of those steps does not have a very good reason and could be avoided, that instant of time does not need to be spent.

Maybe at that time the disk is coming around to a certain sector, so some data streaming can be started, so a buffer can be filled, so a read statement can be satisfied, in a function, and that function is called from a call site in another function, and that from another, and so on, up to call _main, or whatever happens to be the top of the thread.

  2. Repeat point 1.

So, the way to find bottlenecks is to find when the code is spending time for poor reasons, and the best way to find that is to take snapshots of its state. The EIP, or any other tiny piece of the state, is not going to do it, because it won't tell you why.

Very few profilers "get it". The ones that do are the wall-clock-time stack samplers that report, by line of code (not by function), the percent of time active (not the amount of time, and especially not "self" or "exclusive" time). One that does is Zoom, and there are others.
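For concreteness, here is a minimal sketch (in C, for Linux/glibc) of the idea behind such a sampler: a timer driven by wall-clock time that dumps the whole call stack whenever it fires. This is only an illustration of the principle, not how Zoom or any real profiler is implemented, and the workload functions, the 10 ms interval, and the 50 ms "I/O" delay are invented for the example. Because the timer ticks in wall-clock time, samples land in blocked code just as readily as in computing code, and each dumped stack says why that moment was being spent.

    /* wallsample.c - minimal sketch of a wall-clock stack sampler (illustration only).
       Assumes glibc's <execinfo.h>; build with: gcc -g -O0 wallsample.c -o wallsample */
    #define _GNU_SOURCE
    #include <execinfo.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <time.h>
    #include <unistd.h>

    static void on_sample(int sig)
    {
        (void)sig;
        void *frames[64];
        int n = backtrace(frames, 64);
        /* Write raw frames to stderr; addresses can be mapped to file:line
           offline with addr2line -e ./wallsample. */
        backtrace_symbols_fd(frames, n, STDERR_FILENO);
        ssize_t ignored = write(STDERR_FILENO, "----\n", 5);
        (void)ignored;
    }

    /* Stands in for "waiting for file IO": blocked, using no CPU. */
    static void wait_for_io(void)
    {
        struct timespec ts = { 0, 50 * 1000 * 1000 };   /* 50 ms */
        nanosleep(&ts, NULL);       /* an early return on EINTR is fine here */
    }

    /* Stands in for "actual computation". */
    static double compute(void)
    {
        double x = 0.0;
        for (long i = 1; i < 2000000; i++)
            x += 1.0 / (double)i;
        return x;
    }

    int main(void)
    {
        void *prime[2];
        backtrace(prime, 2);   /* first call may allocate; do it outside the handler */

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sample;
        sigaction(SIGALRM, &sa, NULL);

        /* ITIMER_REAL ticks in wall-clock time, so blocked time gets sampled too;
           ITIMER_PROF (what gprof-style profilers use) only ticks while on-CPU. */
        struct itimerval it = { { 0, 10000 }, { 0, 10000 } };   /* every 10 ms */
        setitimer(ITIMER_REAL, &it, NULL);

        double sink = 0.0;
        for (int i = 0; i < 200; i++) {
            wait_for_io();
            sink += compute();
        }
        printf("%f\n", sink);
        return 0;
    }

Each burst of frames is one snapshot of the thread; mapping the addresses to source lines with addr2line and counting how often each line appears on the stacks gives the by-line, percent-of-time-active view described above.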

Looking at where the EIP hangs out is like trying to tell time on a clock with only a second hand. Measuring functions is like trying to tell time on a clock with some of the digits missing. Profiling only during CPU time, not during blocked time, is like trying to tell time on a clock that randomly stops running for long stretches. Being concerned about measurement precision is like trying to time your lunch hour to the second.
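The "clock that randomly stops" point is easy to see directly. The small sketch below (the one-second sleep and the busy loop are stand-ins invented for the illustration) compares the process CPU clock with the wall clock around a blocking call; the gap between the two is the blocked time that a CPU-only profiler never reports, and dividing it by the wall time gives exactly the kind of "percent of execution time spent waiting" figure the question asks for.

    /* blocked.c - rough sketch: wall time vs. CPU time around a blocking call.
       The sleep() stands in for blocking I/O; the loop stands in for computation. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static double secs(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        struct timespec w0, w1, c0, c1;
        clock_gettime(CLOCK_MONOTONIC, &w0);            /* wall clock */
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);   /* advances only while on-CPU */

        sleep(1);                                       /* "waiting" */
        for (volatile long i = 0; i < 200000000L; i++)  /* "computing" */
            ;

        clock_gettime(CLOCK_MONOTONIC, &w1);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

        double wall = secs(w0, w1), cpu = secs(c0, c1);
        printf("wall %.2fs  cpu %.2fs  blocked ~%.0f%% of execution time\n",
               wall, cpu, 100.0 * (wall - cpu) / wall);
        return 0;
    }

A wall-clock stack sampler reports the same fraction, but with a call stack attached to every blocked sample, so you also see which wait it was and the chain of reasons behind it.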

This is not a mysterious subject.

Mike Dunlavey
  • You presented the classical statistical method of sampling, and noted that you can get more accurate results by sampling uniformly at random, which is better than sampling only when a function is called, as `gprof` does. As a side note, I'll remark that I'm not sure that in this case uniform sampling gives such an advantage, since you're oversampling anyhow; sampling at every function call is more than enough. Of course, you'll get worse results - but your profiler will be simpler, and maybe that's worth it. But what I don't understand is: how does that answer my question? – Elazar Leibovich Feb 09 '11 at 09:46
  • @Elazar: I believe a valid answer to a question is to question the premise (legalistically :-). The premise was that in order to know what a program is doing most of the time, certain specific outputs are helpful, and that's what I question. As far as the method, there is no classical method. Sampling when methods are called is not sampling. (It's called instrumentation.) Sampling has to be uncorrelated with program state, because if it's correlated then it can easily miss large problems (such as missing all IO, as you see with gprof and the VS profiler). Sample the stack, at random. – Mike Dunlavey Feb 09 '11 at 13:31
  • Mike, I meant sampling as in statistics. You measure the program state at some point. Instrumentation is just a way to sample your program in a non-uniform fashion; this http://goo.gl/3mSJ0 is another nonstandard way to sample it. What I'm saying is that although instrumentation does not sample uniformly, it samples your program frequently enough that effectively you'll have a sample every, say, 2 ms, and thus it can also be useful enough. You'll know which chunks of code are the bottleneck at your effective sample rate. Uniform sampling is better, but non-uniform sampling is not useless. – Elazar Leibovich Feb 09 '11 at 13:52
  • As for the use case behind my question: sometimes you want to make a program run faster, but you don't have its source, or you don't want to change it. Output like the one I described can help you make it faster with less developer effort (70% of the time this program waits for disk IO? Let me move the relevant files to an in-memory disk or SSD. Too many cache misses? Let me add memory, or make sure it runs alone, etc.). Look at Mark's blog http://goo.gl/Zn569 ; he tried to profile a program in a similar way to my idea in order to find bottlenecks in closed-source programs. – Elazar Leibovich Feb 09 '11 at 13:59
  • @Elazar: Regarding statistics, I've been doing this and thinking about it since before profilers were born, and recently I found what I think is a very useful concept, the [Rule of Succession](http://en.wikipedia.org/wiki/Rule_of_succession). Also, I know it's counterintuitive, but high sampling frequency, even if it's at random times, is not materially more helpful in finding problems. Regarding your use case, you make an excellent point. I didn't think of the case of not having source. – Mike Dunlavey Feb 09 '11 at 14:10