Questions tagged [slurm]

Slurm (formerly spelled SLURM) is an open-source resource manager designed for Linux HPC clusters of all sizes.

Slurm: A Highly Scalable Resource Manager

Slurm is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

Slurm's design is very modular with dozens of optional plugins. In its simplest configuration, it can be installed and configured in a couple of minutes (see Caos NSA and Perceus: All-in-one Cluster Software Stack by Jeffrey B. Layton) and was used by Intel on their 48-core "cluster on a chip". More complex configurations can satisfy the job scheduling needs of world-class computer centers and rely upon a MySQL database for archiving accounting records, managing resource limits by user or bank account, or supporting sophisticated job prioritization algorithms.

While other resource managers do exist, Slurm is unique in several respects:

  • It is designed to operate in a heterogeneous cluster containing over 100,000 nodes and millions of processors.
  • It can sustain a throughput rate of hundreds of thousands of jobs per hour, with bursts of job submissions at several times that rate.
  • Its source code is freely available under the GNU General Public License.
  • It is portable: written in C and using the GNU autoconf configuration engine. While initially written for Linux, other UNIX-like operating systems should be easy porting targets.
  • It is highly tolerant of system failures, including failure of the node executing its control functions.
  • A plugin mechanism exists to support various interconnects, authentication mechanisms, schedulers, etc. These plugins are documented and simple enough for the motivated end user to understand the source and add functionality.
  • Configurable node power control functions allow putting idle nodes into a power-save/power-down mode. This is especially useful for "elastic burst" clusters which expand dynamically to a cloud virtual machine (VM) provider to accommodate workload bursts.


Name Spelling

As of v18.08, the name spelling “SLURM” has been changed to “Slurm” (commit 3d7ada78e).

1145 questions
10
votes
2 answers

How to activate a specific Python environment as part of my submission to Slurm?

I want to run a script on a cluster (SBATCH file). How can I activate my virtual environment (path/to/env_name/bin/activate)? Do I only need to add: module load python/2.7.14 source "/pathto/Python_directory/ENV2.7_new/bin/activate" in my_script.sh…
bib
  • 635
  • 1
  • 9
  • 26
10
votes
2 answers

SLURM sacct shows 'batch' and 'extern' job names

I have submitted a job to a SLURM queue, the job has run and completed. I then check the completed jobs using the sacct command. But looking at the results of the sacct command I notice additional results that I did not expect: JobID …
Parsa
  • 2,485
  • 2
  • 14
  • 30
10
votes
1 answer

Running a binary without a top level script in SLURM

In SGE/PBS, I can submit binary executables to the cluster just like I would locally. For example: qsub -b y -cwd echo hello would submit a job named echo, which writes the word "hello" to its output file. How can I submit a similar job to SLURM?…
highBandWidth
  • 14,815
  • 16
  • 74
  • 126
10
votes
1 answer

How to change how frequently SLURM updates the output file (stdout)?

I am using SLURM to dispatch jobs on a supercomputer. I have set the --output=log.out option to place the content from a job's stdout into a file (log.out). I'm finding that the file is updated every 30-60 minutes, making it difficult for me to…
Neal Kruis
  • 1,805
  • 2
  • 24
  • 45
9
votes
0 answers

Unable to setup slurmdbd plugin: Connection refused

Unable to setup slurmdbd plugin. The SLURM installation works fine Set AccountingStorageType=accounting_storage/slurmdbd in the /etc/slurm/slurm.conf When I do sacctmgr list cluster it gives: sacctmgr: error: slurm_persist_conn_open_without_init:…
Leander
  • 91
  • 1
  • 3
9
votes
1 answer

Installing/emulating SLURM on an Ubuntu 16.04 desktop: slurmd fails to start

Edit What I am really looking for is a way to emulate SLURM, something interactive and reasonably user-friendly that I can install. Original post I want to test drive some minimal examples with SLURM, and I am trying to install it all on a local…
landau
  • 4,594
  • 15
  • 36
9
votes
0 answers

Get stdout/stderr from a slurm job at runtime

I have a batch file to send a job with sbatch. The contents of the batch file is # Setting the proper SBATCH variables ... #SBATCH --error="test_slurm-%j.err" #SBATCH --output="test_slurm-%j.out" ... WORKDIR=. echo "Run…
9
votes
2 answers

Is it possible to run SLURM jobs in the background using SRUN instead of SBATCH?

I was trying to run slurm jobs with srun in the background. Unfortunately, right now, due to the fact that I have to run things through Docker, it's a bit annoying to use sbatch, so I am trying to find out if I can avoid it altogether. From my…
Charlie Parker
  • 13,522
  • 35
  • 118
  • 206
9
votes
3 answers

SLURM sbatch job array for the same script but with different input arguments run in parallel

I have a problem where I need to launch the same script but with different input arguments. Say I have a script myscript.py -p -i , where I need to consider N different par_values (between x0 and x1) and M trials for each value…
maurizio
  • 585
  • 6
  • 19
9
votes
1 answer

Sbatch: pass job name as input argument

I have the following script to submit job with slurm: #!/bin/sh #!/bin/bash #SBATCH -J $3 #job_name #SBATCH -n 1 #Number of processors #SBATCH -p CA nwchem $1 > $2 The first argument ($1) is my input, the second ($2) is my output and I would…
Laetis
  • 943
  • 2
  • 13
  • 22
9
votes
2 answers

How can I get detailed job run info from SLURM (e.g. like that produced for "standard output" by LSF)?

When using bsub with LSF, the -o option gave a lot of details such as when the job started and ended and how much memory and CPU time the job took. With SLURM, all I get is the same standard output that I'd get from running a script without LSF. For…
Christopher Bottoms
  • 10,220
  • 7
  • 44
  • 87
9
votes
1 answer

seq uses comma as decimal separator

I have noticed a strange seq behavior on one of my computers (Ubuntu LTS 14.04): instead of using points as decimal separator it is using commas: seq 0. 0.1 0.2 0,0 0,1 0,2 The same version of seq (8.21) on my other PC gives the normal points (also…
Miguel
  • 7,027
  • 1
  • 21
  • 40
8
votes
2 answers

How to configure the content of slurm notification emails?

Slurm can notify the user by email when certain types of events occur using options such as --mail-type and --mail-user. The emails I receive this way contain a void body and a title that looks like : SLURM Job_id=9228 Name=toto Ended, Run time…
Johann Bzh
  • 601
  • 2
  • 7
  • 20
8
votes
3 answers

How to get the ID of GPU allocated to a SLURM job on a multiple GPUs node?

When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU which is allocated for the job? Is there an environment variable for this purpose? The GPUs I'm using are all nvidia GPUs. Thanks.
Negelis
  • 327
  • 3
  • 14
8
votes
1 answer

Slurm server with a asterisk near the "idle"

I'm using Slurm. When I run sinfo -Nel it is common to see a server designated as idle, but sometimes there is also a little asterisk near it (Like this: idle*). What does that mean? I couldn't find any info about that. (The server is up and…
ZoRo
  • 351
  • 1
  • 4
  • 9