Slurm Batch system usage
Simple job submission and the concept of queues (partitions)
On the Tier-3 you run jobs by submitting them to the Slurm job scheduler. The cluster's main computational resources, the worker nodes, are only accessible through this batch system.
The login nodes t3ui01-03 (User Interface or UI nodes) provide you with an environment to compose, test and submit batch jobs to Slurm.
We provide the following partitions (often also called batch queues) to which you can submit jobs:
| PARTITION | NODES(A/I/O/T) | CPUS(A/I/O/T) | TIMELIMIT | DEFAULTTIME | GRES | NODELIST |
| qgpu | 0/2/0/2 | 0/80/0/80 | 1:00:00 | 30:00 | gpu:8(S:0-1) | t3gpu[01-02] |
| gpu | 0/2/0/2 | 0/80/0/80 | 7-00:00:00 | 1-00:00:00 | gpu:8(S:0-1) | t3gpu[01-02] |
| short | 0/32/0/32 | 0/1680/0/1680 | 1:00:00 | 45:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-63,70-73] |
| standard* | 0/32/0/32 | 0/1680/0/1680 | 12:00:00 | 12:00:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-63,70-73] |
| long | 0/24/0/24 | 0/848/0/848 | 7-00:00:00 | 1-00:00:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-59] |
A/I/O/T stands for allocated/idle/other/total, so the fourth number is the total number of nodes or CPUs; the asterisk marks the default partition. You can also get the above list by running the following command on one of the UI nodes.
sinfo -o "%.12P %.16F %.16C %.14l %.16L %.12G %N"
To launch a batch job, you first prepare a batch script (usually a normal bash script that launches your executable), e.g. my-script.sh, and then use the sbatch command to submit it to Slurm. Here we submit to the short queue and use the account t3 for normal CPU jobs.
sbatch -p short --account=t3 my-script.sh
The sbatch command supports many additional configuration options (refer to its man page); for example, you may want to specify the memory requirements of your job in MB.
sbatch -p standard --account=t3 --mem=3000 job.py
Instead of passing all these options on the command line, you can also put them inside the batch script (usually in its header), on lines starting with the #SBATCH directive, e.g.
#!/bin/bash
# This is my batch script
#SBATCH --mem=3000
#SBATCH --account=t3
#SBATCH --time=04:00:00
#SBATCH --partition=standard

# now start our executable
myexecutable
Example Job submission scripts (a minimal sketch of a multi-threaded CPU job is shown after this list):
GPU Example
CPU Example
CPU Example for using multiple processors (threads) on a single physical computer
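As an illustration, here is a minimal sketch of a batch script for a job that uses multiple threads on a single node; the executable name my_threaded_app and the requested resources are placeholders that you should adapt to your own job.

#!/bin/bash
#SBATCH --partition=standard
#SBATCH --account=t3
#SBATCH --time=04:00:00
#SBATCH --mem=4000
#SBATCH --nodes=1                  # all threads must stay on one physical node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # number of CPU cores (threads) for the single task

# tell the application how many threads Slurm allocated
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# my_threaded_app is a placeholder for your own multi-threaded executable
./my_threaded_app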
See also the useful commands to check the status of Slurm jobs and nodes, and the T3 Slurm Monitoring page; a few common examples are shown below.
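For quick reference, here are a few standard Slurm commands for inspecting jobs and nodes (the job ID 1234567 is just a placeholder):

squeue -u $USER              # list your own pending and running jobs
squeue -p short              # list all jobs in the short partition
scontrol show job 1234567    # detailed information about a single job
scancel 1234567              # cancel a job
sinfo -N -l                  # state of all nodes, one line per node
sacct -j 1234567 --format=JobID,State,Elapsed,MaxRSS   # accounting info for a finished job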
The detailed Slurm configuration can be examined on any Slurm node by looking at the configuration file /etc/slurm/slurm.conf.
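You can also query the running configuration directly from the scheduler, for example:

scontrol show config | less          # dump the full running configuration
scontrol show config | grep -i mem   # show only the memory-related settings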
Slurm itself calculates job priorities taking into account (see the sprio example after this list):
- FairShare: past cluster usage by the user (weighted by a decay function)
- Age of Job: the time the job has been waiting in the queue
- Job Size: the size of the resource request (CPUs, memory)
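To see how these factors contribute to the priority of your pending jobs you can use the sprio command, e.g.:

sprio -l         # long listing with the per-job age, fairshare and job-size contributions
sprio -u $USER   # priority factors for your own pending jobs only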
The default memory per job slot (memory-per-cpu) is slightly below 2 GB per CPU, a limit determined by the oldest nodes.
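The exact default can be read from the configuration, and you can request a different amount per allocated CPU at submission time; the value of 3000 MB below is just an example:

scontrol show config | grep -i defmem    # show the configured default memory per CPU
sbatch -p standard --account=t3 --mem-per-cpu=3000 my-script.sh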