Slurm Batch system usage
Simple job submission and the concept of queues (partitions)
On the Tier-3 you run jobs by submitting them to the Slurm job scheduler. The cluster's main computational resources, the worker nodes, are only accessible through this batch system.
The login nodes t3ui01-03 (User Interface or UI nodes) provide you with an environment to compose, test and submit batch jobs to Slurm.
We provide the following partitions (often also called batch queues) to which you can submit jobs:
| PARTITION | NODES(A/I/O/T) | CPUS(A/I/O/T) | TIMELIMIT | DEFAULTTIME | GRES | NODELIST |
| qgpu | 0/2/0/2 | 0/80/0/80 | 1:00:00 | 30:00 | gpu:8(S:0-1) | t3gpu[01-02] |
| gpu | 0/2/0/2 | 0/80/0/80 | 7-00:00:00 | 1-00:00:00 | gpu:8(S:0-1) | t3gpu[01-02] |
| short | 0/32/0/32 | 0/1680/0/1680 | 1:00:00 | 45:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-63,70-73] |
| standard* | 0/32/0/32 | 0/1680/0/1680 | 12:00:00 | 12:00:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-63,70-73] |
| long | 0/24/0/24 | 0/848/0/848 | 7-00:00:00 | 1-00:00:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-59] |
A/I/O/T stands for allocated/idle/other/total, so the fourth number is the total number of nodes or CPUs; the asterisk marks the default partition. You can also get the above list by running the following command on one of the UI nodes.
sinfo -o "%.12P %.16F %.16C %.14l %.16L %.12G %N"
To launch a batch job, you first prepare a batch script (usually a normal bash script that launches your executable), e.g. my-script.sh, and then use the sbatch command to submit it to Slurm. Here we submit to the short queue and use the account t3 for normal CPU jobs.
sbatch -p short --account=t3 my-script.sh
The sbatch command supports many additional configuration options (refer to its man page); for example, you may want to specify the memory requirements of your job in MB.
sbatch -p standard --account=t3 --mem=3000 job.py
Instead of passing all these options on the command line, you can also put them inside the batch script (usually in its header), on lines starting with the #SBATCH directive, e.g.
#!/bin/bash
# This is my batch script
#SBATCH --mem=3000
#SBATCH --account=t3
#SBATCH --time=04:00:00
#SBATCH --partition=standard

# now start our executable
myexecutable
Example Job submission scripts (a minimal sketch of a multi-threaded CPU job is shown after this list):
GPU Example
CPU Example
CPU Example for using multiple processors (threads) on a single physical computer
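As an illustration, here is a minimal sketch of a batch script for a job that uses multiple threads on a single node; the executable name my_threaded_app and the requested resources are placeholders that you should adapt to your own job.

#!/bin/bash
#SBATCH --partition=standard
#SBATCH --account=t3
#SBATCH --time=04:00:00
#SBATCH --mem=4000
#SBATCH --nodes=1                  # all threads must stay on one physical node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # number of CPU cores (threads) for the single task

# tell the application how many threads Slurm allocated
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# my_threaded_app is a placeholder for your own multi-threaded executable
./my_threaded_app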
See also the useful commands to check the status of Slurm jobs and nodes, and the T3 Slurm Monitoring page; a few common examples are shown below.
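For quick reference, here are a few standard Slurm commands for inspecting jobs and nodes (the job ID 1234567 is just a placeholder):

squeue -u $USER              # list your own pending and running jobs
squeue -p short              # list all jobs in the short partition
scontrol show job 1234567    # detailed information about a single job
scancel 1234567              # cancel a job
sinfo -N -l                  # state of all nodes, one line per node
sacct -j 1234567 --format=JobID,State,Elapsed,MaxRSS   # accounting info for a finished job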
The detailed Slurm configuration can be examined on any Slurm node by looking at the configuration file /etc/slurm/slurm.conf.
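You can also query the running configuration directly from the scheduler, for example:

scontrol show config | less          # dump the full running configuration
scontrol show config | grep -i mem   # show only the memory-related settings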
Slurm itself calculates job priorities taking into account (see the sprio example after this list):
- FairShare: past cluster usage by the user (weighted by a decay function)
- Age of Job: the time the job has been waiting in the queue
- Job Size: the size of the resource request (CPUs, memory)
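To see how these factors contribute to the priority of your pending jobs you can use the sprio command, e.g.:

sprio -l         # long listing with the per-job age, fairshare and job-size contributions
sprio -u $USER   # priority factors for your own pending jobs only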
The default memory per job slot (memory-per-cpu) is slightly below 2 GB per CPU, a limit determined by the oldest nodes.
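The exact default can be read from the configuration, and you can request a different amount per allocated CPU at submission time; the value of 3000 MB below is just an example:

scontrol show config | grep -i defmem    # show the configured default memory per CPU
sbatch -p standard --account=t3 --mem-per-cpu=3000 my-script.sh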