Slurm Batch system usage

Simple job submission and the concept of queues (partitions)

On the Tier-3 you run jobs by submitting them to the Slurm job scheduler. The cluster's main computational resources, the worker nodes, are accessible only through this batch system.

The login nodes t3ui01-03 (User Interface or UI nodes) provide an environment in which you can compose, test and submit batch jobs to Slurm.

We provide the following partitions (often also called batch queues) to which you can submit jobs:

PARTITION	NODES(A/I/O/T)	CPUS(A/I/O/T)	TIMELIMIT	DEFAULTTIME	GRES	NODELIST
short	0/29/0/29	0/1520/0/1520	1:00:00	45:00	(null)	t3wn[30-33,35-36,38-44,46,48,50-51,53,56,58-63,70-73]
standard*	0/27/0/27	0/1456/0/1456	12:00:00	12:00:00	(null)	t3wn[30-33,35-36,38-44,50-51,53,56,58-63,70-73]
long	0/19/0/19	0/624/0/624	7-00:00:00	1-00:00:00	(null)	t3wn[30-33,35-36,38-44,50-51,53,56,58-59]
qgpu	0/2/0/2	0/80/0/80	1:00:00	30:00	gpu:8	t3gpu[01-02]
gpu	0/2/0/2	0/80/0/80	7-00:00:00	1-00:00:00	gpu:8	t3gpu[01-02]

A/I/O/T stands for allocated/idle/other/total, so the fourth number is the total number of nodes or CPUs. The asterisk after standard marks the default partition. You can also get the above list by running the following command on one of the UI nodes:

sinfo -o "%.12P %.16F %.16C %.14l %.16L %.12G %N"
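If you want to use these counters in a script, the A/I/O/T fields are easy to split. The following is a minimal sketch that takes the CPUS(A/I/O/T) value of the short partition from the table above as a fixed string; the variable names are only illustrative.

```shell
# CPUS(A/I/O/T) value of the short partition, copied from the table above.
field="0/1520/0/1520"

# Split the field on "/" into its four counters
# (Slurm's names for them: allocated, idle, other, total).
IFS=/ read -r allocated idle other total <<EOF
$field
EOF

echo "allocated=$allocated idle=$idle other=$other total=$total"
```

On a UI node the same value can be read live instead of hard-coded, for example with `sinfo -h -p short -o "%C"`.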

To launch a batch job, first prepare a batch script (usually a normal bash script that launches your executable), e.g. my-script.sh, and then submit it to Slurm with the sbatch command. Here we submit to the short queue and use the account t3, which is the account for normal CPU jobs:

sbatch -p short --account=t3 my-script.sh

For GPU jobs on the GPU queues (qgpu and gpu) you need to use the gpu_gres account:

sbatch -p qgpu --account=gpu_gres my-gpu-script.sh

The sbatch command supports many additional configuration options (refer to its man page). For example, you may want to specify the memory requirement of your job in MB:

sbatch -p standard --account=t3 --mem=3000 job.py

Instead of passing all these options on the command line, you can put them inside the batch script (usually in the header part) on lines that start with the #SBATCH comment marker. Slurm only reads #SBATCH lines that appear before the first executable command in the script, e.g.

#!/bin/bash
# This is my batch script
#SBATCH --mem=3000
#SBATCH --account=t3
#SBATCH --time=04:00:00
#SBATCH --partition=standard

# now start our executable
myexecutable
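As a hedged sketch of the whole cycle, the snippet below writes a small batch script, asks Slurm to validate it without submitting, and then submits it while overriding the time limit on the command line (command-line options take precedence over #SBATCH lines). The file name my-script.sh is reused from above; the temporary directory, the `hostname` payload and the time values are placeholders chosen for illustration.

```shell
# Write a minimal batch script; the payload (hostname) is only a placeholder.
workdir=$(mktemp -d)
cat > "$workdir/my-script.sh" <<'EOF'
#!/bin/bash
#SBATCH --partition=short
#SBATCH --account=t3
#SBATCH --mem=3000
#SBATCH --time=00:45:00

hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    # --test-only validates the request and prints an estimated start time;
    # no job is submitted.
    sbatch --test-only "$workdir/my-script.sh"
    # --parsable prints the job ID instead of the usual message;
    # --time given here overrides the #SBATCH --time line in the script.
    jobid=$(sbatch --parsable --time=00:30:00 "$workdir/my-script.sh")
    echo "Submitted job $jobid"
else
    echo "sbatch not found; run this on a UI node"
fi
```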

Example job submission scripts

GPU Example

CPU Example

CPU Example for using multiple processors (threads) on a single physical computer
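For the multi-threaded case, a job script might look roughly like the sketch below. It is not the linked example itself: the core count, memory, time limit and the executable name are assumptions to adapt to your own job. The partition and account are the ones introduced earlier on this page.

```shell
#!/bin/bash
# Hypothetical sketch of a multi-threaded CPU job on a single node.
#SBATCH --partition=standard
#SBATCH --account=t3
# One process (task) that gets 4 CPU cores on the same node (assumed value).
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Total memory for the job in MB (assumed value).
#SBATCH --mem=6000
#SBATCH --time=02:00:00

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set; fall back to 1
# so the script also runs outside of Slurm, e.g. for a quick test on a UI node.
nthreads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$nthreads"
echo "Running with $nthreads thread(s)"

# Replace with your own multi-threaded program (hypothetical name):
# ./my-threaded-executable --threads "$nthreads"
```

Setting the thread count from the Slurm variable keeps the program from starting more threads than the number of cores the job was granted.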

See the list of useful commands for checking the status of Slurm jobs and nodes, and the T3 Slurm Monitoring page.
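As a starting point, these are standard Slurm client commands that are commonly used for such checks. This is a generic selection and not necessarily the list the page links to; the job ID 7798 is only an example value.

```shell
# Use the login name from the environment, or ask the system if it is unset.
me="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    # Your own pending and running jobs.
    squeue -u "$me"
    # Node states in the CPU partitions.
    sinfo -p short,standard,long
    # Further commands, shown as comments because they need a real job ID:
    # scontrol show job 7798   # full details of one job
    # sacct -j 7798            # accounting record, also after the job has finished
    # scancel 7798             # cancel a job
else
    echo "Slurm client tools not found; run this on a UI node"
fi
```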

The detailed Slurm configuration can be examined on any Slurm node in the configuration file /etc/slurm/slurm.conf.

Slurm itself calculates job priorities, taking into account:
- FairShare: past cluster usage by the user (weighted by a decay function, so older usage counts less)
- Age of Job: time the job has been waiting in the queue
- Job Size: size of the resource request (CPUs, memory)
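To see how these factors add up for your own jobs, Slurm ships the `sprio` and `sshare` tools. The sketch below assumes they are available on the UI nodes and that the cluster uses Slurm's multifactor priority plugin, which the factor list suggests but this page does not state.

```shell
# Use the login name from the environment, or ask the system if it is unset.
me="${USER:-$(id -un)}"

if command -v sprio >/dev/null 2>&1; then
    # Per-factor priority breakdown (age, fair share, job size, ...) of your pending jobs.
    sprio -u "$me"
    # Your current fair-share standing, which reflects past usage.
    sshare -u "$me"
else
    echo "sprio not found; run this on a UI node"
fi
```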

The default memory per job slot (memory per CPU) is slightly below 2 GB per CPU, a value dictated by the oldest nodes.
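To look up the exact default rather than rely on the approximate figure, one option is to search the Slurm configuration file /etc/slurm/slurm.conf. This sketch assumes the default is set through the standard `DefMemPerCPU` parameter, which this page does not confirm.

```shell
conf=/etc/slurm/slurm.conf

if [ -r "$conf" ]; then
    # DefMemPerCPU is given in MB; it can be set globally or per partition.
    grep -i "DefMemPerCPU" "$conf" || echo "DefMemPerCPU not set in $conf"
else
    echo "$conf not readable; run this on a Slurm node"
fi
```

A job that needs more than the default can request it explicitly, for example with `--mem-per-cpu=3000` (MB per allocated CPU) or with `--mem` as shown earlier.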

Slurm FAQ

Is there a way to increase the maximum time of a job while it is running?

Jobs can in general be modified with the "scontrol update" command. A job that is still waiting in the queue can be updated, e.g. with

 scontrol update jobid=7798 TimeLimit=48:00:00
 scontrol update jobid=7798 partition=long

But as soon as the job is running, only an admin user is allowed to change its settings. The reason is simple: the maximum runtime of a job is used to fit the job into "holes" in the scheduling plan. If users were allowed to extend the runtime, they could submit jobs with very short time limits to get them started early, and then increase the limit once the jobs are running.
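The rule above can be turned into a small check before attempting an update. The following is a sketch only: `can_self_update` is a hypothetical helper name, and 7798 is the example job ID used above.

```shell
# Return success only for jobs that are still waiting in the queue.
can_self_update() {
    [ "$1" = "PENDING" ]
}

jobid=7798   # example job ID from above

if command -v squeue >/dev/null 2>&1; then
    # -h drops the header line, %T prints the job state (PENDING, RUNNING, ...).
    state=$(squeue -h -j "$jobid" -o "%T")
    if can_self_update "$state"; then
        scontrol update jobid="$jobid" partition=long
    else
        echo "Job $jobid is in state '${state:-unknown}'; ask an admin to change it"
    fi
else
    echo "squeue not found; run this on a UI node"
fi
```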

Topic revision: r35 - 2024-01-08 - DerekFeichtinger
