
I have a couple of thousand jobs to run on a SLURM cluster with 16 nodes. The jobs should run only on a subset of 7 of the available nodes. Some of the tasks are parallelized and use all the CPU power of a single node, while others are single-threaded. Therefore, multiple jobs should run at the same time on a single node. None of the tasks should span multiple nodes.

Currently I submit each of the jobs as follows:

sbatch --nodelist=myCluster[10-16] myScript.sh

However, this parameter makes Slurm wait until the submitted job terminates before starting the next one, which leaves 3 nodes completely unused and, depending on the task (multi- or single-threaded), also leaves the currently active node under low CPU load.

Which sbatch parameters will make Slurm run multiple jobs at the same time on the specified nodes?

Faber

3 Answers


You can work the other way around: rather than specifying which nodes to use (with the effect that each job is allocated all 7 nodes), specify which nodes not to use:

sbatch --exclude=myCluster[01-09] myScript.sh

and Slurm will never allocate more than 7 nodes to your jobs. Make sure, though, that the cluster configuration allows node sharing, and that your myScript.sh contains #SBATCH --ntasks=1 --cpus-per-task=n with n the number of threads of each job.
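
For illustration, a minimal sketch of what myScript.sh could look like (./my_program is a placeholder for the actual workload, and the thread count must be adapted per job):

#!/bin/bash
#SBATCH --exclude=myCluster[01-09]   # keep the job off the first 9 nodes, leaving myCluster[10-16]
#SBATCH --ntasks=1                   # a single task, so the job never spans nodes
#SBATCH --cpus-per-task=1            # set to n for the multi-threaded jobs

srun ./my_program

The couple of thousand jobs can then be submitted in a plain shell loop, e.g. for i in $(seq 1 2000); do sbatch myScript.sh; done, and Slurm will pack them onto the 7 allowed nodes as cores become free (provided node sharing is enabled as noted above).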

damienfrancois
  • This is assuming you are not the administrator. Otherwise, limits and associations are the way to go. – damienfrancois Oct 08 '14 at 04:28
  • With 'associations' do you mean 'reservations' in SLURM vocabulary? – Faber Oct 08 '14 at 15:15
  • No, I mean [associations](http://slurm.schedmd.com/accounting.html), which is the term Slurm uses in the context of accounts, quality of services, partitions, etc. to set limits. – damienfrancois Oct 08 '14 at 18:28
  • I am having trouble with the syntax `=myCluster[01-09]` :( What are the distinct node names in this case? – pcko1 May 15 '19 at 08:23
  • `--exclude=myCluster[01-09]` is equivalent to `--exclude=myCluster01,myCluster02,myCluster03,myCluster04,myCluster05,myCluster06,myCluster07,myCluster08,myCluster09`. – damienfrancois May 15 '19 at 08:25
  • @damienfrancois Is it possible to get the names of the nodes as in PBS, i.e. the equivalent of `cat $PBS_NODEFILE > machinefile`? – Alexander Cska May 18 '19 at 14:36
  • The variable `SLURM_JOB_NODELIST` holds the list of nodes (not the path to a file that contains the list of nodes). – damienfrancois May 18 '19 at 18:59
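
For the $PBS_NODEFILE question in the comments above, a hedged sketch of how an equivalent machine file could be built inside a job script (the file name machinefile is just an example):

scontrol show hostnames "$SLURM_JOB_NODELIST" > machinefile

scontrol show hostnames expands the compact host-list expression into one hostname per line, which is the format most MPI machine files expect.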

Some of the tasks are parallelized and use all the CPU power of a single node, while others are single-threaded.

I understand that you want the single-threaded jobs to share a node, whereas the parallel ones should be assigned a whole node exclusively?

multiple jobs should run at the same time on a single node.

As far as my understanding of SLURM goes, this implies that you must define CPU cores as consumable resources (i.e., SelectType=select/cons_res and SelectTypeParameters=CR_Core in slurm.conf).
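
As a rough sketch (only the first two lines come from the text above; the node definition is a guess based on the 20-core, 2-threads-per-core nodes mentioned in the comments below, so adapt it to your hardware):

# excerpt from slurm.conf
SelectType=select/cons_res
SelectTypeParameters=CR_Core
NodeName=myCluster[01-16] Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN

With CR_Core, the scheduler allocates individual cores, so several single-core jobs can run side by side on one node.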

Then, to constrain parallel jobs to a whole node you can either use the --exclusive option (but note that the partition configuration takes precedence: you can't have shared nodes if the partition is configured for exclusive access), or use -N 1 --ntasks-per-node="number_of_cores_in_a_node" (e.g., -N 1 --ntasks-per-node=8).

Note that the latter will only work if all nodes have the same number of cores.

None of the tasks should span multiple nodes.

This should be guaranteed by -N 1.
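
Put together, hedged sketches of the two kinds of submission (the core count of 20 is taken from the comments below and is otherwise an assumption; combine this with the --exclude trick from the first answer to stay on the 7 allowed nodes):

# parallel job: grab a whole node
sbatch -N 1 --exclusive myScript.sh
# or, equivalently on 20-core nodes:
sbatch -N 1 --ntasks-per-node=20 myScript.sh

# single-threaded job: one core, so several such jobs can share a node
sbatch -N 1 --ntasks=1 --cpus-per-task=1 myScript.sh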

Riccardo Murri
  • Crucial for me is that all my jobs together use no more than 7 nodes. Each node of our cluster has 20 cores and 2 threads per core. If I understand you correctly, you propose to submit parallel jobs with `sbatch --nodelist=myCluster[10-16] --ntasks-per-node=40 -N 1 myScript.sh`. Why not `--ntasks-per-node=1`, to make sure that not more than one job runs at the same time on a single node? And what about the single-threaded jobs? – Faber Oct 07 '14 at 10:52
  • @Faber If you want to confine a set of jobs to a maximum of 7 nodes in total, then a partition or a QoS setting would be the way to go. – Riccardo Murri Oct 07 '14 at 11:45

Actually I think the way to go is setting up a 'reservation' first, according to the last slide of this presentation: http://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf

Scenario: Reserve ten nodes in the default SLURM partition starting at noon and with a duration of 60 minutes occurring daily. The reservation will be available only to users alan and brenda.

scontrol create reservation user=alan,brenda starttime=noon duration=60 flags=daily nodecnt=10
Reservation created: alan_6

scontrol show res
ReservationName=alan_6 StartTime=2009-02-05T12:00:00
    EndTime=2009-02-05T13:00:00 Duration=60 Nodes=sun[000-003,007,010-013,017] NodeCnt=10 Features=(null) PartitionName=pdebug Flags=DAILY Licenses=(null)
    Users=alan,brenda Accounts=(null)

# submit job with:
sbatch --reservation=alan_6 myScript.sh

Unfortunately I couldn't test this procedure, probably due to a lack of privileges.
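
If such a reservation is in place, submitting the couple of thousand jobs against it reduces to a loop like the following (the reservation name alan_6 is the one from the slide; the input layout is only a placeholder):

for input in inputs/*; do
    sbatch --reservation=alan_6 myScript.sh "$input"
done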

Faber
  • A reservation will prevent *any other user from running* on the same set of nodes; that's why an admin is needed to create it. Is this what you really want? Reserve nodes for your exclusive access? – Riccardo Murri Oct 07 '14 at 13:13
  • Well that's what we agreed on among the (few) users. If we can set a max duration, why not? Or is this approach a complete anti-pattern for cluster usage? – Faber Oct 07 '14 at 13:26
  • Is it possible to give regular users permission to set up reservations? – Faber Oct 07 '14 at 13:32