What might cause a C/MPI program that uses the SUNDIALS/CVODE library (a numerical ODE solver), running on a Gentoo Linux cluster, to repeatedly report "Signal 15 received."?

Is that code being issued by MPI, Sundials, Linux, C or who?

Note that I am pretty much a beginner with the following technologies: C, MPI, SUNDIALS/CVODE, and Linux.

I can find nothing that seems related by googling the message. I don't even know where to begin to look. (This is one of those questions where "anything helps" is to be taken quite literally.)

(As an aside/afterthought why doesn't Chrome's dictionary recognize the word "googling"?).

Jeff

1 Answer

This indicates that Linux has delivered a SIGTERM to your process. The signal is usually sent at the request of some other process (via kill()), but your process could also send it to itself (using raise()). SIGTERM requests an orderly shutdown of your process.
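To see what "terminated by signal 15" looks like from the sending side, here is a minimal sketch in which a parent process sends SIGTERM to a child with kill() and then inspects how the child ended. The function name sigterm_child_and_wait is made up for this illustration, and nothing here is specific to MPI or SUNDIALS.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, send it SIGTERM, and return the
   number of the signal that terminated it (or -1 on error, 0 if the
   child ended for some other reason). */
int sigterm_child_and_wait(void)
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: make sure SIGTERM has its default action (terminate),
           then sleep until a signal arrives. */
        signal(SIGTERM, SIG_DFL);
        for (;;)
            pause();
    }

    /* Parent: this is the kill() route mentioned above. */
    if (kill(child, SIGTERM) != 0)
        return -1;
    if (waitpid(child, &status, 0) != child)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling sigterm_child_and_wait() from main() should return SIGTERM (15 on Linux). Whatever launches your job can observe the signal number in a similar way, although which component prints the message in your setup is not something this sketch can tell you.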

If you need a quick cheatsheet of signal numbers, open a bash shell and:

$ kill -l
 1) SIGHUP   2) SIGINT   3) SIGQUIT  4) SIGILL
 5) SIGTRAP  6) SIGABRT  7) SIGBUS   8) SIGFPE
 9) SIGKILL 10) SIGUSR1 11) SIGSEGV 12) SIGUSR2
13) SIGPIPE 14) SIGALRM 15) SIGTERM 16) SIGSTKFLT
17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG  24) SIGXCPU
25) SIGXFSZ 26) SIGVTALRM   27) SIGPROF 28) SIGWINCH
29) SIGIO   30) SIGPWR  31) SIGSYS  34) SIGRTMIN
35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3  38) SIGRTMIN+4
39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX    
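The same lookup can be done from C with strsignal(), which returns a short description for a signal number. A minimal sketch, where signal_description and print_signal are hypothetical helper names and the exact wording of the description depends on the C library:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: describe a signal number, never returning NULL. */
const char *signal_description(int signo)
{
    const char *text = strsignal(signo);
    return text != NULL ? text : "unknown signal";
}

/* Hypothetical helper: write "Signal N: description" to a stream. */
void print_signal(FILE *out, int signo)
{
    fprintf(out, "Signal %d: %s\n", signo, signal_description(signo));
}
```

On a typical Linux system, print_signal(stderr, 15) should print something like "Signal 15: Terminated".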

You can determine the sender by using an appropriate signal handler like:

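#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Print the pid of the process that sent SIGTERM, then exit. */
void sigterm_handler(int signo, siginfo_t *info, void *unused)
{
  (void)signo;
  (void)unused;
  /* fprintf() and exit() are not async-signal-safe; tolerable for a
     one-off diagnostic, but not for production code. */
  fprintf(stderr, "Received SIGTERM from process with pid = %ld\n",
      (long)info->si_pid);
  exit(0);
}

int main (void)
{
  struct sigaction action;

  /* sa_handler and sa_sigaction may share storage, so set only one. */
  memset(&action, 0, sizeof(action));
  action.sa_sigaction = sigterm_handler;
  sigemptyset(&action.sa_mask);   /* sigset_t is not a plain integer */
  action.sa_flags = SA_SIGINFO;   /* required to receive siginfo_t */

  sigaction(SIGTERM, &action, NULL);
  sleep(60);   /* meanwhile, send SIGTERM from another shell */

  return 0;
}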

Notice that the signal handler also calls exit(). Your program could instead keep running by ignoring the signal, but this isn't recommended in general: if a user sent the SIGTERM and your process doesn't exit, there's a good chance a SIGKILL will follow, and by then you have lost your opportunity to do any cleanup.
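If you do want to clean up before exiting, a common pattern is to have the handler only record that SIGTERM arrived and let the main loop notice the flag and shut down. The following is a minimal sketch of that pattern, not code from MPI or SUNDIALS. All names (got_sigterm, sender_pid, install_sigterm_flag_handler, run_until_sigterm) are hypothetical, and storing the pid in a sig_atomic_t assumes the pid fits in that type, which holds on typical Linux systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler, read by the main loop. */
static volatile sig_atomic_t got_sigterm = 0;
static volatile sig_atomic_t sender_pid = 0;  /* assumes a pid fits */

/* Handler: only record what happened; no I/O, no exit(). */
static void on_sigterm(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    sender_pid = (sig_atomic_t)info->si_pid;
    got_sigterm = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigterm_flag_handler(void)
{
    struct sigaction action;

    memset(&action, 0, sizeof(action));
    action.sa_sigaction = on_sigterm;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_SIGINFO;
    return sigaction(SIGTERM, &action, NULL);
}

/* Run up to max_steps units of work, stopping early once SIGTERM has
   been seen. Returns the number of steps completed. */
int run_until_sigterm(int max_steps)
{
    int step;

    for (step = 0; step < max_steps && !got_sigterm; step++) {
        /* one unit of work would go here, e.g. one solver step */
    }
    if (got_sigterm) {
        /* Safe to use stdio here: we are no longer inside the handler. */
        fprintf(stderr, "SIGTERM from pid %ld after %d steps; cleaning up\n",
                (long)sender_pid, step);
    }
    return step;
}
```

In a real program you would call install_sigterm_flag_handler() near the start of main() and put the cleanup (closing files, releasing resources) after the loop returns.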

FatalError
  • This list isn't very helpful. Any idea how I can determine who is sending me the SIGTERM? The only thing I can say is my code isn't doing it (because I don't know how to send a SIGTERM). – Jeff May 24 '13 at 04:38
  • @Jeff: I've updated my post with a snippet that should help you determine the sender's pid. To see it in action you can run that code and from another shell run "kill " to see it print the sender's pid. – FatalError May 24 '13 at 12:32
  • Thanks for the code @FatalError (cool name, btw). I get that this will show me the PID of the SIGTERMing process and I compiled it. But how do I use it? Is it supposed to be a standalone program running in the background while I run my program? If so, how do I kick it off? Do I need to run this program on each node of the cluster (oops, did I forget to mention that my program is running on a cluster?)? Suppose I'm using `screen`, does it need to run in each screen? I know, a lotta questions, but I'm lost. – Jeff May 24 '13 at 16:53
  • @Jeff: That code is just a demonstration. For your code, you should put the sigaction() call somewhere near the start of your program, and of course you'll need the signal handler function too. Then do whatever you normally do that ends in the SIGTERM. At that point, your program will print the pid of whoever sent it. Then check `ps -Af` on that node and you can track down exactly what is sending it. – FatalError May 24 '13 at 17:29
  • @FE When I added your code to my program, I got this compile warning: `warning: missing braces around initializer`. It ran OK, but the message didn't appear when I got a Signal 15 message. Will this work when the program is running in the background (using `&` to start it)? – Jeff May 25 '13 at 03:34