batch job submission upon completion of job

Question

I would like to write a script to execute the steps outlined below. If someone can provide simple examples on how to modify files and search through folders using a script (not necessarily solving my problem below), I will greatly appreciate it.

1. Submit job MyJob in currentDirectory using myJobShellFile.sh to a queue.
2. Upon completion of MyJob, go to currentDirectory/myJobDataFolder. In myJobDataFolder, there are folders

   myJobData.0000 myJobData.0001 myJobData.0002 myJobData.0003

   I want to find the maximum number maxIteration of all the listed folders. Here it would be maxIteration=0003.
3. In file myJobShellFile.sh, the last line says

   mpiexec ./main input myJobDataFolder

   I want to append this line to

   'mpiexec ./main input myJobDataFolder 0003'
4. I want to submit MyJob to the queue while maxIteration < 10.
5. Upon completion of MyJob, find the new maxIteration, change this number in myJobShellFile.sh, and go to step 4.

I think people typically write Python scripts to do this kind of thing, but I am having a hard time finding out how. I probably don't know the correct terminology for this procedure. I am also aware that the script will vary slightly depending on the queuing system, but any help will be greatly appreciated.

I think that for job control more people write bash scripts than python scripts. What OS are you using? Which of the following do you know how to write? : python, bash, awk, perl — James Waldby - jwpat7, Nov 29 '12 at 19:59
@jwpat7, I am submitting jobs to a supercomputing resource. The OS is Linux RedHat. I do not know python, awk, or perl. I know the basics of bash like cd, find, etc. I do know C++ and Matlab, but I don't think that will be helpful for me. If I can get a general idea of how to edit files with a script and search for files, I think I can figure out the rest. Thank you. — namu, Nov 29 '12 at 21:19

James Waldby - jwpat7 · Accepted Answer · 2012-11-30T01:08:18.693

Quite a few aspects of your question are unclear, such as the meaning of “submit job MyJob in currentDirectory using myJobShellFile.sh to a que”, “append this line to 'mpiexec ./main input myJobDataFolder 0003'”, how you detect when a job is done, relevant parts of myJobShellFile.sh, and some other details. If you can list the specific shell commands you use in each iteration of job submission, then you can post a better question, with a bash tag instead of python.

In the following script, I put a ### at the end of any line where I am guessing what you are talking about. Lines ending with ### may be irrelevant to whatever you actually do, or may be pseudocode. Anyway, the general idea is that the script is supposed to do the things you listed in your items 1 to 5. This script assumes that you have modified myJobShellFile.sh to say
mpiexec ./main input $1 $2
instead of
mpiexec ./main input
because it is simpler to use parameters to modify what you tell mpiexec than it is to keep modifying a shell script. Also, it seems to me you would want to increment maxIter before submitting the next job, instead of after. If so, remove the # from the start of the t=$((1$maxIter+1)); maxIter=${t#1} line. See the “Parameter Expansion” section of man bash for the ${var#txt} form, and the “Arithmetic Expansion” section for the $((expression)) form. The 1$maxIter and similar forms change text like 0018 (which bash rejects as a number, because the leading 0 marks it as octal and 8 is not an octal digit) to 10018, which is a valid decimal number; ${t#1} then strips the leading 1 again.

#!/bin/bash
# Uses [[ ... ]], which is a bash feature, so run with bash rather than sh.
./myJobShellFile.sh MyJob    ###
maxIter=0
while true; do
   waitforjobcompletion      ###
   cd ./myJobDataFolder || exit 1
   # No spaces around = in assignments; -d lists the folders, not their contents
   maxFile=$(ls -d myJobData.* | tail -1)
   maxIter=${maxFile#myJobData.}  # Get max extension
   # If you want to increment maxIter, uncomment next line
   # t=$((1$maxIter+1)); maxIter=${t#1}
   cd ..
   if [[ 1$maxIter -lt 11000 ]] ; then
      ./myJobShellFile.sh myJobDataFolder "$maxIter"
   else
      break
   fi
done
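
The "find the highest folder, then increment it" step can also be tried on its own. The following is only a sketch, not part of the original script: the function names max_iter and next_iter are made up for illustration, and it assumes the folders are named myJobData.NNNN with suffixes of equal width, as in the question. It uses a glob instead of parsing ls output, and the leading-1 trick described above for the arithmetic.

```shell
#!/bin/bash
# Print the highest suffix among folders named myJobData.NNNN in directory $1.
# Prints nothing and returns 1 if no such folder exists.
max_iter() {
    local dir=$1 last="" d
    # Globs expand in sorted order, so with equal-width zero-padded
    # suffixes the last directory matched is the largest one.
    for d in "$dir"/myJobData.*/; do
        [ -d "$d" ] && last=${d%/}
    done
    [ -n "$last" ] || return 1
    echo "${last##*myJobData.}"
}

# Print the suffix that follows $1, keeping its zero padding.
next_iter() {
    local t=$((1$1 + 1))   # the 1 in front avoids the octal reading of e.g. 0018
    echo "${t#1}"          # strip the leading 1 again
}
```

Inside the loop this could be used as maxIter=$(max_iter ./myJobDataFolder) || exit 1, followed by maxIter=$(next_iter "$maxIter") if the increment is wanted.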

Notes: (1) To test with fewer than 1000 submissions, replace 11000 with 10000+n; for example, to do 123 runs, replace it with 10123. (2) In writing the above script, I assumed that a not-previously-known number of output files appears in the output directory from time to time. If instead exactly one output file appears per run, and you just want to do one run per value for the values 0000, 0001, 0002, ..., 0999, 1000, then use a script like the following. (For testing with a smaller number than 1000, replace 1000 with, e.g., 0020. The leading zeroes in these numbers tell bash, version 4 or later, to pad the generated numbers with leading zeroes.)

#!/bin/bash
# Brace expansion with a range is a bash feature, so run with bash rather than sh.
for iter in {0000..1000}; do
   ./myJobShellFile.sh myJobDataFolder "$iter"
   waitforjobcompletion      ###
done
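
If the bash on the cluster is older than version 4, {0000..1000} expands without the zero padding. A possible fallback, sketched here under that assumption, is to generate the padded values with printf; the function name padded_range is made up for this example and is not part of the original answer.

```shell
#!/bin/bash
# Print the integers from $1 to $2, one per line, zero-padded to 4 digits.
# Pass plain integers (0, 20, 1000), not padded ones, to avoid octal parsing.
padded_range() {
    local i
    for (( i = $1; i <= $2; i++ )); do
        printf '%04d\n' "$i"
    done
}
```

The loop header would then read for iter in $(padded_range 0 1000); do, with the loop body unchanged.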

(3) If the system has a command that sleeps while it waits for a job to complete on the supercomputing resource, it is reasonable to use that command in place of waitforjobcompletion in the above scripts. Otherwise, if the system has a command jobisrunning that returns true if a job is still running, replace waitforjobcompletion with something like the following:

while jobisrunning ; do sleep 15; done

This will run the jobisrunning command; if it returns true, the shell will sleep for 15 seconds and then retest. Here is an example that illustrates waiting for a file to appear and then for it to go away:

while [ ! -f abc ]; do sleep 3; echo no abc; done
while ls abc >/dev/null 2>&1; do sleep 3; echo an abc; done

The second line's test could be [ -f abc ] instead; I showed a longer example to illustrate how to suppress output and error messages by routing them to /dev/null. (4) To reverse the sense of a while statement's test, replace the word while with until. For example, while [ ! -f abc ]; ... is equivalent to until [ -f abc ]; ....
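
The polling loop from note (3) can be wrapped in a function so that it also gives up after a while instead of waiting forever. This is a sketch only: jobisrunning below is a hypothetical stand-in that checks for a marker file named job.running, and its body would have to be replaced with whatever status check the queueing system actually offers. The interval and timeout defaults are arbitrary.

```shell
#!/bin/bash
# Hypothetical stand-in: must return 0 (true) while the job is still running.
# Assumption for this sketch: a marker file exists for as long as the job runs.
jobisrunning() {
    [ -f job.running ]
}

# Poll every $1 seconds (default 15) until the job finishes or
# $2 seconds (default 86400) have passed.
# Returns 0 if the job finished, 1 if the wait timed out.
waitforjobcompletion() {
    local interval=${1:-15} max_wait=${2:-86400} waited=0
    while jobisrunning; do
        if (( waited >= max_wait )); then
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}
```

In the scripts above, the call could then be written as waitforjobcompletion 60 || exit 1 to check once a minute and stop the whole script on a timeout.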

Thank you! This is along the lines of what I was looking for. I think I should be able to figure out the rest. Also, I realized that the `while loop` will constantly be running in the background, so I should put a pause in there so that it only checks every X minutes. — namu, Nov 29 '12 at 22:48
`waitforjobcompletion` is the thing that will stall the while loop until the job completes. How do you check job completion? — James Waldby - jwpat7, Nov 29 '12 at 22:50
The queueing system has the command `-l depend=JOBID`, which is the dependency upon completion of another job. `JOBID` is the JOBID for the job that must complete first. I should be able to write a script that, based on the name of the job `MyJob` in the queue, retrieves `JOBID` by using commands provided by the queueing system. — namu, Nov 29 '12 at 22:53
Does it have a command that waits until the job is done, or a command that tells you the current status? If the former, ok. If the latter, you might need an inner while loop (around just waitforjobcompletion and a sleep 15 seconds or whatever). Also see `inotify` for sleeping until files appear in a directory. Eg see an [inotify](http://stackoverflow.com/a/7542602/837847) answer. Re fi, bash's `if` form is `if list; then list; [ elif list; then list; ] ... [ else list; ] fi` — James Waldby - jwpat7, Nov 29 '12 at 23:02
