Quite a few aspects of your question are unclear, such as the meaning of “submit job MyJob in currentDirectory using myJobShellFile.sh to a que”, “append this line to
'mpiexec ./main input myJobDataFolder 0003'”, how you detect when a job is done, relevant parts of myJobShellFile.sh
, and some other details. If you can list the specific shell commands you use in each iteration of job submission, then you can post a better question, with a bash
tag instead of python.
In the following script, I put a ###
at the end of any line where I am guessing what you are talking about. Lines ending with ###
may be irrelevant to whatever you actually do, or may be pseudocode. Anyway, the general idea is that the script is supposed to do the things you listed in your items 1 to 5. This script assumes that you have modified myJobShellFile.sh
to say
mpiexec ./main input $1 $2
instead of
mpiexec ./main input
because it is simpler to use parameters to modify what you tell mpiexec
than it is to keep modifying a shell script. Also, it seems to me you would want to increment maxIter before submitting next job, instead of after. If so, remove the # from the t=$((1$maxIter+1)); maxIter=${t#1}
line. Note, see the “Parameter Expansion” section of man bash re expansion of the ${var#txt}
form, and the “Arithmetic Expansion” section re $((expression))
form. The 1$maxIter
and similar forms are used to change text like 0018
(which is not a valid bash number because 8 is not an octal digit) to 10018
.
#!/bin/sh
./myJobShellFile.sh MyJob ###
maxIter=0
while true; do
waitforjobcompletion ###
cd ./myJobDataFolder
maxFile= $(ls myJobData* | tail -1)
maxIter= ${maxFile#myJobData.} #Get max extension
# If you want to increment maxIter, uncomment next line
# t=$((1$maxIter+1)); maxIter=${t#1}
cd ..
if [[ 1$maxIter -lt 11000 ]] ; then
./myJobShellFile.sh MyJobDataFolder $maxIter
else
break
fi
done
Notes: (1) To test with smaller runs than 1000 submissions, replace 11000 by 10000+n; for example, to do 123 runs, replace it with 10123. (2) In writing the above script, I assumed that not-previously-known numbers of output files appear in the output directory from time to time. If instead exactly one output file appears per run, and you just want to do one run per value for the values 0000, 0001, 0002, 0999, 1000, then use a script like the following. (For testing with a smaller number than 1000, replace 1000 with (eg) 0020. The leading zeroes in these numbers tell bash to fill the generated numbers with leading zeroes.)
#!/bin/sh
for iter in {0000..1000}; do
./myJobShellFile.sh MyJobDataFolder $iter
waitforjobcompletion ###
done
(3) If the system has a command that sleeps while it waits for a job to complete on the supercomputing resource, it is reasonable to use that command in place of waitforjobcompletion
in the above scripts. Otherwise, if the system has a command jobisrunning
that returns true if a job is still running, replace waitforjobcompletion
with something like the following:
while jobisrunning ; do sleep 15; done
This will run the jobisrunning command; if it returns true, the shell will sleep for 15 seconds and then retest. Here is an example that illustrates waiting for a file to appear and then for it to go away:
while [ ! -f abc ]; do sleep 3; echo no abc; done
while ls abc >/dev/null 2>&1; do sleep 3; echo an abc; done
The second line's test could be [ -f abc ]
instead; I showed a longer example to illustrate how to suppress output and error messages by routing them to /dev/null. (4) To reverse the sense of a while
statement's test, replace the word while
with until
. For example, while [ ! -f abc ]; ...
is equivalent to until [ -f abc ]; ...
.