
How can I determine exactly what a piece of software is doing when it is stuck, unresponsive to user input and not updating its display?

I have tried oprofile, which records which function is executing, but it's not giving me enough clues. It accumulates counts over the whole time it runs, whereas I need to see what happens only while the target program is stuck.

The problem might involve interrupts, waiting on network sockets, timers, a GUI event handler, or who knows what. How can I find out as much as possible about what's going on, not just the execution point of each thread?

The software of interest runs on Linux and is built with gcc. It is mostly C++ but may involve other languages, including interpreted ones such as Python.

The case that concerns me now is Firefox, whose source I have checked out. Firefox frequently freezes all input and screen output at random times, for about 5-10 seconds each time. If someone handed me the solution to this particular problem on a silver platter, I'd certainly take it, but I'd still be asking. If possible, I'd like to learn general techniques that apply to any software, especially software I'm responsible for.

Mat
DarenW
  • One big difference between regular profiling and what I'm asking is that I want to diagnose delays that occur randomly, while profiling normally locates portions of code that reproducibly run slow. – DarenW Dec 03 '09 at 01:24

3 Answers


strace traces the system calls a program makes. This might indicate what is blocking on network sockets and so on.

Jeff Foster
  • I've used strace for programs that crash. Is it useful for a running one which I don't want to kill? – DarenW Dec 03 '09 at 00:10
  • Use strace to start the application: it intercepts and records the system calls and writes them to stderr (or, with -o, to a file). It runs until you terminate it (assuming the program you're running doesn't crash), so it should be fine for a running program that you don't need to kill. – Jeff Foster Dec 03 '09 at 07:05
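strace can also attach to a program that is already running, so there is no need to restart it. `strace -p PID` attaches, and pressing Ctrl-C detaches strace while leaving the program running. For a stall lasting several seconds, the most direct clue is the `-T` option, which appends the time spent inside each system call. A command along the lines of `strace -f -tt -T -p 12345 -o /tmp/trace.txt` also traces the other threads (`-f`), timestamps each line (`-tt`), and writes to a file (`-o`). On distributions where Yama's `kernel.yama.ptrace_scope` is set to 1, attaching to a process that is not your child requires root. The Python sketch below is an illustration and not part of strace. The function names are hypothetical, and the regexes assume strace's usual line layout (optional PID prefix, timestamp, `<seconds>` at the end), so they may need adjusting.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class SlowCall(NamedTuple):
    line_no: int    # 1-based line number in the trace file
    syscall: str    # e.g. "poll", "futex", "read"
    seconds: float  # duration reported by strace -T
    text: str       # the raw trace line


# strace -T appends the time spent in the call as "<0.000123>" at the end of the line.
DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")
# The call name is either "name(" on an ordinary line, or "<... name resumed>"
# when strace split a call across two lines because another thread interleaved.
NAME_RE = re.compile(r"<\.\.\.\s+(\w+)\s+resumed>|\b(\w+)\(")


def slow_syscalls(lines: Iterable[str], threshold: float = 1.0) -> Iterator[SlowCall]:
    """Yield system calls that spent at least `threshold` seconds in the kernel."""
    for line_no, line in enumerate(lines, start=1):
        duration = DURATION_RE.search(line)
        if duration is None:  # "<unfinished ...>" halves, signals, exit notices
            continue
        seconds = float(duration.group(1))
        if seconds < threshold:
            continue
        name = NAME_RE.search(line)
        syscall = (name.group(1) or name.group(2)) if name else "?"
        yield SlowCall(line_no, syscall, seconds, line.rstrip("\n"))


def report(path: str, threshold: float = 1.0) -> None:
    """Print every slow call found in an strace -T output file."""
    with open(path) as trace:
        for call in slow_syscalls(trace, threshold):
            print(f"line {call.line_no}: {call.syscall} blocked {call.seconds:.3f}s")
            print(f"    {call.text}")
```

After a freeze, `report("/tmp/trace.txt", threshold=2.0)` would list, for example, a `poll` or `futex` call that blocked for about as long as the freeze lasted. The file descriptor or futex address in its arguments, together with the thread ID at the start of the line, indicates what that thread was waiting for.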

This technique should find it. Basically, while the program is spending time like that, there is almost always a hierarchy of function calls on the stack waiting for their work to complete. Just sample the stack a few times and you'll see them.

ADDED: As Don Wakefield pointed out, the pstack utility could be perfect for this job.
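The stack-sampling idea can be automated. Below is a minimal Python 3.9+ sketch. It assumes `gdb` is installed and allowed to attach (ptrace). It runs `gdb -p PID -batch -ex "thread apply all bt"` a few times at a fixed interval, and for each function name it counts how many samples show that function on some thread's stack. `thread apply all bt` prints every thread, whereas plain `bt` prints only the current one. The helper names and the sample count and interval are hypothetical choices. The regex only approximates gdb's `#N  0x... in func (...)` frame lines, so C++ template names that contain spaces get truncated.

```python
import re
import subprocess
import time
from collections import Counter

# gdb prints frames roughly as
#   #1  0x000055d0c0a1b2c3 in wait_for_reply (fd=3) at net.c:40
#   #0  main () at main.c:9            <- some frames have no address
# The regex keeps the first token of the function name; it approximates
# gdb's format and truncates C++ names that contain spaces.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)", re.MULTILINE)


def take_sample(pid: int) -> str:
    """Attach gdb in batch mode, print every thread's stack, then detach.

    The target is paused only while gdb is attached.
    """
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def sample_stacks(pid: int, count: int = 10, interval: float = 0.5) -> list[str]:
    """Take `count` samples, `interval` seconds apart (both values arbitrary)."""
    samples = []
    for _ in range(count):
        samples.append(take_sample(pid))
        time.sleep(interval)
    return samples


def frame_functions(backtrace: str) -> set[str]:
    """Function names on any stack in one sample; recursion counts once."""
    return set(FRAME_RE.findall(backtrace))


def tally(samples: list[str]) -> Counter:
    """For each function, the number of samples in which it appears."""
    counts: Counter = Counter()
    for sample in samples:
        counts.update(frame_functions(sample))
    return counts
```

For example, while the program is frozen, `for func, n in tally(sample_stacks(12345)).most_common(15): print(n, func)` lists the functions that were on a stack in most samples. Attaching to a large, heavily threaded program such as Firefox may take a noticeable time per sample, so the effective interval will be longer than `interval`.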

Mike Dunlavey
  • And if you're on Linux, you don't even need the debugger. Just use [pstack](http://linux.die.net/man/1/pstack) – Don Wakefield Dec 02 '09 at 17:56
  • @Don: Thanks for the tip. It doesn't seem to show you source lines, but it still gets the job done. – Mike Dunlavey Dec 02 '09 at 18:17
  • Aw man, I gotta stop living in a cave! Hadn't heard of this pstack till now... it may do the job. – DarenW Dec 03 '09 at 00:11
  • @DarenW: Me neither. My cave is cold. Can you email me some of that Florida air (without the cat fur)? – Mike Dunlavey Dec 03 '09 at 01:20
  • There's lsstack for Linux, but unfortunately it compiles and runs only on 32-bit systems; I'm running 64-bit. – DarenW Dec 03 '09 at 03:00
  • This has been a useful direction to pursue, but I'd like to formulate what I've found as a fresh standalone answer. As for the Florida air, right now it's rather cool and rainy. You don't want this, unless you are in a desert? – DarenW Dec 03 '09 at 06:21

You can obtain a stack trace of a running program. At a command line, use "ps aux" to find the program's PID. Suppose it's 12345. Then run:

gdb --pid=12345

Attaching stops the program, so type "c" (continue) to let it run. When the program is stuck in a pause (or doing anything suspicious), press Ctrl-C in gdb. The "bt" command prints the stack, which can be examined on the spot or pasted into a text file for later study. Resume execution of the program with "c" again.

The main advantage of this manual technique over oprofile or other profilers is that I get the exact call sequence at a moment of interest. A few samples taken during times of trouble, and a few taken while the program is running normally, should give useful clues.
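Comparing the "trouble" samples with the "normal" ones can be done mechanically. In a GUI program, many threads sit in `poll()` or a condition wait in almost every sample, so raw counts from the stuck samples alone can mislead. The Python 3.9+ sketch below uses hypothetical helper names. It treats each sample as the text output of gdb's `bt` or `thread apply all bt` and ranks functions by how much more often they appear in stuck samples than in normal ones. The frame regex is an approximation of gdb's output format.

```python
import re
from collections import Counter

# Rough pattern for gdb backtrace frames:
#   "#1  0x... in func (args) at file.c:40"   or   "#0  func () at file.c:9"
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)", re.MULTILINE)


def fraction_present(samples: list[str]) -> dict[str, float]:
    """Fraction of samples in which each function appears on some stack."""
    counts: Counter = Counter()
    for sample in samples:
        counts.update(set(FRAME_RE.findall(sample)))
    return {func: n / len(samples) for func, n in counts.items()}


def suspicious_frames(stuck: list[str], normal: list[str],
                      min_gap: float = 0.5) -> list[tuple[str, float]]:
    """Functions seen much more often while stuck than during normal running.

    Returns (function, gap) pairs, largest gap first, where gap is the
    fraction of stuck samples containing the function minus the fraction
    of normal samples containing it.
    """
    p_stuck = fraction_present(stuck)
    p_normal = fraction_present(normal)
    gaps = [(p - p_normal.get(func, 0.0), func) for func, p in p_stuck.items()]
    return [(func, round(gap, 2)) for gap, func in sorted(gaps, reverse=True)
            if gap >= min_gap]
```

A function at the top of the returned list, read together with its callers in the full backtraces, is a reasonable place to start reading source. `main` and the event-loop frames present in every sample cancel out.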

DarenW
  • I've tried the ctrl-C in **gdb** method under Windows, and not had luck. It seems to get to a place where there's no real stack. I wonder what I'm doing wrong. – Mike Dunlavey Dec 03 '09 at 15:41
  • ... I've made myself a pain in the xxx by explaining over and over why that technique works so well, such as: http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/1562802#1562802 – Mike Dunlavey Dec 03 '09 at 15:47
  • ... This example shows a 40x speedup: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773 – Mike Dunlavey Dec 03 '09 at 15:50