I am looking for tools to profile where the time is spent. I have looked at oprofile, but that doesn't really give me what I need.

I was looking at callgrind, specifically at the CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION macros, because I don't want the tool to slow down the app too much, like valgrind does in general. But that doesn't really work, because Valgrind seems to serialize everything onto one single thread.

For example, if fn A calls fn B, which calls fn C, and then control returns to B and A, I want to know how much time was spent where. I have some mutex tools that I am using, but a good timing tool would be extremely useful to see exactly where the time is being spent, so that I can concentrate on those paths. Short of adding something myself, is there any tool I can use for this task? It's a C++ app, btw. I cannot use valgrind because it effectively runs the app single-threaded. Also, my app spends a bunch of time waiting, so plain CPU profilers are not really helping much.
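For reference, "adding something myself" could look roughly like the sketch below: a scoped wall-clock timer that records inclusive time per function, including time spent sleeping or waiting. All names here (`WallClockStats`, `ScopedTimer`, `fn_a`, `fn_b`, `fn_c`) are hypothetical and only illustrate the A calls B calls C case. It assumes a C++11 compiler, and that a single mutex around the totals is acceptable overhead.

```cpp
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: accumulates wall-clock time per label across threads.
class WallClockStats {
public:
    void add(const std::string& label, std::chrono::nanoseconds elapsed) {
        std::lock_guard<std::mutex> lock(mutex_);
        totals_[label] += elapsed;  // operator[] starts a new label at zero
    }

    std::chrono::nanoseconds total(const std::string& label) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = totals_.find(label);
        return it == totals_.end() ? std::chrono::nanoseconds(0) : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::chrono::nanoseconds> totals_;
};

// Hypothetical RAII timer: measures from construction to destruction using
// a monotonic clock, so time spent blocked or asleep is counted too.
class ScopedTimer {
public:
    ScopedTimer(WallClockStats& stats, std::string label)
        : stats_(stats),
          label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        stats_.add(label_,
                   std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    WallClockStats& stats_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-ins for the real functions; the sleeps imitate waiting.
void fn_c(WallClockStats& stats) {
    ScopedTimer timer(stats, "C");
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
}

void fn_b(WallClockStats& stats) {
    ScopedTimer timer(stats, "B");
    fn_c(stats);
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

void fn_a(WallClockStats& stats) {
    ScopedTimer timer(stats, "A");
    fn_b(stats);
}
```

After calling `fn_a(stats)`, `stats.total("A")` includes the time of B and C, so these are inclusive times; subtracting the callee totals gives a rough self time per function.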

Mark Lobo
  • seems like a question that would have been answered multiple times already... – Mitch Wheat Apr 09 '12 at 03:35
  • I tried to find something, but couldn't. Maybe I was using bad search terms:) I saw a couple of references to callgrind and valgrind, but not much beyond that. Can you just point me in the right direction? Or if you have some tools you can suggest, that would be great! – Mark Lobo Apr 09 '12 at 03:43
  • Most of it is callgrind, which I cannot use because it single-threads the entire app :( I don't see anything else in my Related section. – Mark Lobo Apr 09 '12 at 03:47
  • *[Here's how I do it,](http://stackoverflow.com/a/378024/23771)* and *[here's an example.](http://stackoverflow.com/a/927773/23771)* – Mike Dunlavey Apr 10 '12 at 19:22

1 Answer

You might care to take a look at point 3 of this post.

It suggests not asking where the time is spent, but why.

There is a qualitative difference between supposing that you are looking for some method that "spends too much time" versus asking (by studying stack samples, not summarizing them) what is the program actually trying to accomplish at a small sampling of time points.

That approach will find anything you can find by measuring methods, and a lot more. If applied repeatedly, it can result in large factors of speedup.

In a multi-thread situation, you can identify the threads that are not idle, and apply it to them.

Mike Dunlavey
  • Thanks for the response, Mike. I will definitely get to the "why", but to get to the why, I do need to find out "where". The application is a massively multithreaded app, with lots of things going on at the same time. I was really hoping this is a big enough problem that a tool can measure this rather than take periodic snapshots. Also, from what I can understand, a snapshot will not really help, because the threads spend a bunch of time sleeping and won't show up on the stack a lot, since this program is not at all CPU bound. Is that correct? – Mark Lobo Apr 18 '12 at 17:01
  • @Mark: You're in good company, thinking about it that way, but here's the thing. If a multi-thread app is slow, it's because one or more threads are spending too much time (a good fraction X) doing something Y. So if you snapshot it in time, just look at the threads that are either a) computing, or b) waiting for I/O they requested. The probability you will see it doing Y is X, so if you do this like 10 times, you will see it doing Y on 10*X occasions, on average. What measuring doesn't tell you is what Y is, and just looking at a high-time function doesn't tell you either. – Mike Dunlavey Apr 18 '12 at 21:24
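
The arithmetic in the last comment can be sketched as a small calculation. This is only an illustration under one assumption that the comment does not state: the snapshots are taken at independent random times, so the number of sightings follows a binomial distribution. The function names `expected_sightings` and `prob_at_least` are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Expected number of snapshots (out of n) that catch the program doing Y,
// when Y accounts for fraction x of the time. This is the "10*X" figure.
double expected_sightings(int n, double x) {
    assert(n >= 0 && x >= 0.0 && x <= 1.0);
    return n * x;
}

// Probability of catching Y in at least k of n snapshots, assuming the
// snapshots are independent (binomial tail sum).
double prob_at_least(int k, int n, double x) {
    assert(k >= 0 && k <= n && x >= 0.0 && x <= 1.0);
    double total = 0.0;
    for (int i = k; i <= n; ++i) {
        // log of the binomial coefficient C(n, i), computed via lgamma
        double log_choose = std::lgamma(n + 1.0) - std::lgamma(i + 1.0) -
                            std::lgamma(n - i + 1.0);
        total += std::exp(log_choose) * std::pow(x, i) * std::pow(1.0 - x, n - i);
    }
    return total;
}
```

For example, `expected_sightings(10, 0.3)` is about 3, and `prob_at_least(1, 10, 0.3)` gives the chance that 10 snapshots catch a 30% activity at least once.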