<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
#   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
#   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->

%TOC%

---+ Slurm Batch system usage

---++ Simple job submission and the concept of queues (partitions)

On the Tier-3 you run jobs by submitting them to the Slurm job scheduler. The cluster's main computational resources, the worker nodes, are only accessible through this batch system. The login nodes t3ui01-03 (User Interface or UI nodes) provide you an environment to compose, test, and submit batch jobs to Slurm.

We provide the following *partitions* (often also called *batch queues*) to which you can submit jobs:

| PARTITION | NODES(A/I/O/T) | CPUS(A/I/O/T) | TIMELIMIT | DEFAULTTIME | GRES | NODELIST |
| short | 0/29/0/29 | 0/1520/0/1520 | 1:00:00 | 45:00 | (null) | t3wn[30-33,35-36,38-44,46,48,50-51,53,56,58-63,70-73] |
| standard* | 0/27/0/27 | 0/1456/0/1456 | 12:00:00 | 12:00:00 | (null) | t3wn[30-33,35-36,38-44,50-51,53,56,58-63,70-73] |
| long | 0/19/0/19 | 0/624/0/624 | 7-00:00:00 | 1-00:00:00 | (null) | t3wn[30-33,35-36,38-44,50-51,53,56,58-59] |
| qgpu | 0/2/0/2 | 0/80/0/80 | 1:00:00 | 30:00 | gpu:8 | t3gpu[01-02] |
| gpu | 0/2/0/2 | 0/80/0/80 | 7-00:00:00 | 1-00:00:00 | gpu:8 | t3gpu[01-02] |

*A/I/O/T* means active/idle/offline/total, so the fourth number is the total number of nodes or CPUs. You can also obtain the above list by running the following command on one of the UI nodes:

<pre>
sinfo -o "%.12P %.16F %.16C %.14l %.16L %.12G %N"
</pre>

To launch a batch job, you first prepare a batch script (usually a normal bash script that launches your executable), e.g. =my-script.sh=, and then use the =sbatch= command to submit it to Slurm.
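Such a batch script is ordinary bash; lines beginning with =#SBATCH= carry options for the scheduler and are plain comments to the shell. A minimal sketch (the filename and option values are illustrative, not site requirements):

```shell
#!/bin/bash
# my-script.sh -- a minimal example batch script (hypothetical name).
# The #SBATCH lines below are directives read by sbatch; to bash they are
# ordinary comments, so the script also runs as a plain shell script.
#SBATCH --job-name=hello          # job name shown by squeue
#SBATCH --output=hello-%j.out     # %j expands to the Slurm job id
#SBATCH --time=00:10:00           # wall-time limit for this job

host=$(hostname)
echo "Running on ${host}"
```

Because the directives are comments, you can test the script locally on a UI node by running it directly with bash before submitting it.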
Here we submit to the =short= queue and use the account =t3= for normal CPU jobs:

<pre>
sbatch -p short --account=t3 my-script.sh
</pre>

For GPU jobs on the GPU queues you need to use the =gpu_gres= account:

<pre>
sbatch -p qgpu --account=gpu_gres my-gpu-script.sh
</pre>

The =sbatch= command supports many additional configuration options (refer to its man page); e.g. you may want to specify the memory requirements of your job in MB:

<pre>
sbatch -p standard --account=t3 --mem=3000 job.py
</pre>

Instead of passing all these options on the command line, you can also put them inside the batch script (usually in the header part), on lines starting with the =#SBATCH= comment marker, e.g.

<pre>
# This is my batch script
#SBATCH --mem=3000
#SBATCH --account=t3
#SBATCH --time=04:00:00
#SBATCH --partition=standard

# now start our executable
myexecutable
</pre>

---++ Example Job submission scripts

   * [[GPU Example][GPU Example]]
   * [[CPU Example][CPU Example]]
   * [[CPU Example for using multiple processors (threads) on a single physical computer][CPU Example for using multiple processors (threads) on a single physical computer]]

Here are some [[SlurmMonitoringCommands][useful commands to check Slurm jobs and nodes status]]; see also the [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/SlurmUtilisation][T3 Slurm Monitoring]] page.

The detailed Slurm configuration can be examined on any Slurm node by listing the configuration file =/etc/slurm/slurm.conf=.

Slurm calculates *priorities of jobs* taking into account:

   * *FairShare*: past cluster usage by the user (weighted by a decay function)
   * *Age of Job*: time the job has been waiting in the queue
   * *Job Size*: size of the resource request (CPU, memory)

The default memory per job slot (memory-per-cpu) is slightly below 2 GB/CPU, determined by the oldest nodes.

---++ Slurm FAQ

---+++ Is there a way to increase the maximum time of a job while it is running?

Jobs can in general be modified with the =scontrol update= command.
A job that is still in the queue can be updated, e.g. with

<pre>
scontrol update jobid=7798 TimeLimit=48:00:00
scontrol update jobid=7798 partition=long
</pre>

But as soon as the job is running, only an admin user is allowed to change these settings. The reason is easily explained: the maximal runtime of a job is used to fit the job into "holes" in the scheduling plan. If users were allowed to extend the runtime, they could submit jobs with super-short time limits and then, once the jobs are running, increase the time.
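You can inspect a job's current limit yourself (before contacting an admin) with =scontrol show job=. A sketch of extracting the =TimeLimit= field with standard tools; the job id 7798 and the sample output line are illustrative only:

```shell
# On a UI node you would run something like:
#   scontrol show job 7798 | grep -o 'TimeLimit=[^ ]*'
# Here we parse a sample line of scontrol output to show the idea
# (the values are made up for illustration):
sample='JobId=7798 RunTime=03:12:45 TimeLimit=12:00:00 TimeMin=N/A'
limit=$(printf '%s\n' "$sample" | grep -o 'TimeLimit=[^ ]*' | cut -d= -f2)
echo "Current limit: $limit"
```

Comparing this value against the partition's TIMELIMIT in the table above tells you whether a move to e.g. the =long= partition would be needed at all.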
Topic revision: r35 - 2024-01-08 - DerekFeichtinger