<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
# * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
# * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
# * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->
---+ Slurm Batch system usage

---++ Simple job submission and the concept of queues (partitions)

On the Tier-3 you run jobs by submitting them to the Slurm job scheduler. The cluster's main computational resources, the worker nodes, are only accessible through this batch system. The login nodes t3ui01-03 (User Interface or UI nodes) provide an environment for you to compose, test, and submit batch jobs to Slurm.

We provide the following *partitions* (often also called *batch queues*) to which you can submit jobs:

| *PARTITION* | *NODES(A/I/O/T)* | *CPUS(A/I/O/T)* | *TIMELIMIT* | *DEFAULTTIME* | *GRES* | *NODELIST* |
| short | 0/32/0/32 | 0/1680/0/1680 | 1:00:00 | 45:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-63,70-73] |
| standard* | 0/32/0/32 | 0/1680/0/1680 | 12:00:00 | 12:00:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-63,70-73] |
| long | 0/24/0/24 | 0/848/0/848 | 7-00:00:00 | 1-00:00:00 | (null) | t3wn[30-33,35-36,38-44,46,48-54,56,58-59] |
| qgpu | 0/2/0/2 | 0/80/0/80 | 1:00:00 | 30:00 | gpu:8(S:0-1) | t3gpu[01-02] |
| gpu | 0/2/0/2 | 0/80/0/80 | 7-00:00:00 | 1-00:00:00 | gpu:8(S:0-1) | t3gpu[01-02] |

*A/I/O/T* stands for active/idle/offline/total, so the fourth number is the total number of nodes or CPUs. You can also obtain the list above by running the following command on one of the UI nodes:

<pre>
sinfo -o "%.12P %.16F %.16C %.14l %.16L %.12G %N"
</pre>

To launch a batch job, you first prepare a batch script (usually a normal bash script that launches your executable), e.g. =my-script.sh=, and you then use the =sbatch= command to submit it to Slurm. Here we submit to the =short= partition and use the account =t3= for normal CPU jobs:

<pre>
sbatch -p short --account=t3 my-script.sh
</pre>

The =sbatch= command supports many additional configuration options (refer to its man page), e.g. you may want to specify the memory requirements of your job in MB:

<pre>
sbatch -p standard --account=t3 --mem=3000 job.py
</pre>

Instead of passing all these options on the command line, you can also put them inside the batch script (usually in the header part), prefixing the lines with the =#SBATCH= directive, e.g.

<pre>
#!/bin/bash
# This is my batch script
#SBATCH --mem=3000
#SBATCH --account=t3
#SBATCH --time=04:00:00
#SBATCH --partition=standard

# now start our executable
myexecutable
</pre>

---++ Example Job submission scripts

[[GPU Example][GPU Example]]

[[CPU Example][CPU Example]]

[[CPU Example for using multiple processors (threads) on a single physical computer][CPU Example for using multiple processors (threads) on a single physical computer]]

Here are some [[SlurmMonitoringCommands][useful commands to check Slurm jobs and nodes status]] and the [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/SlurmUtilisation][T3 Slurm Monitoring]] page.

The detailed Slurm configuration can be examined on any Slurm node by listing the configuration file =/etc/slurm/slurm.conf=.
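If you prefer not to read the configuration file directly, much of the same information can be queried through Slurm itself. The commands below are a small sketch using the standard =scontrol= utility; the partition and node names are just examples taken from the table above:

<pre>
# dump the running Slurm configuration (same settings as in slurm.conf)
scontrol show config

# show the settings of a single partition, e.g. time limits and defaults
scontrol show partition standard

# list the resources (CPUs, memory, GRES) of a single worker node
scontrol show node t3wn30
</pre>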
Slurm itself calculates the *priorities of jobs*, taking into account:

   * *FairShare*: past cluster usage by the user (subject to a decay function)
   * *Age of Job*: time the job has been waiting in the queue
   * *Job Size*: size of the resource request (CPUs, memory)

The default memory per job slot (=--mem-per-cpu=) is slightly below 2 GB/CPU, determined by the oldest nodes.
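To see how these factors play out for your own jobs, you can query the scheduler directly. The following is a small sketch using the standard =sprio=, =sshare= and =scontrol= commands; the exact columns shown depend on the cluster's priority configuration:

<pre>
# per-job breakdown of the priority components (age, fair-share, job size) of your pending jobs
sprio -l -u $USER

# your current fair-share value and accumulated past usage
sshare -u $USER

# the configured default memory per job slot (in MB)
scontrol show config | grep -i DefMemPerCPU
</pre>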