<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
# * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
# * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
# * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->
---+ Slurm Batch system usage

This is an introduction to the T3 Slurm configuration. Slurm is a modern job scheduler for Linux clusters.

Please use the User Interface nodes t3ui01-03 mainly for development and small, quick tests. For intensive computational work use the Compute Nodes. There are two types of Compute Nodes in T3 Slurm: Worker Nodes for CPU jobs and GPU machines. All new hardware is equipped with 256 GB of RAM and a 10GbE network:

| *Compute Node* | *Processor Type* | *Computing Resources: Cores/GPUs* |
| t3ui01-03 (login nodes) | Intel Xeon E5-2697 (2.30 GHz) | 72 |
| t3gpu0[1-2] | Intel Xeon E5-2630 v4 (2.20 GHz) | 8 * !GeForce GTX 1080 Ti |
| t3wn60-63 | Intel Xeon Gold 6148 (2.40 GHz) | 80 |
| t3wn51-59 | Intel Xeon E5-2698 (2.30 GHz) | 64 |
| t3wn41-43,45-48 | AMD Opteron 6272 (2.1 GHz) | 32 |
| t3wn30-36,38-39 | Intel Xeon E5-2670 (2.6 GHz) | 16 |

Access to the Compute Nodes is controlled by Slurm. Currently the maximum number of CPU jobs each user is allowed to run is *500* (about 40% of the CPU resources).

There are four partitions (similar to SGE queues): two for CPU and two for GPU usage.
   * *quick* - short CPU jobs; default time 30 min, max 1 hour
   * *wn* - longer CPU jobs
   * *qgpu* - short GPU jobs; default time 30 min, max 1 hour and 1 GPU per user
   * *gpu* - GPU jobs; max 15 GPUs per user

Here are a few useful commands to start working with Slurm:
<pre>
sinfo                 # monitor nodes and partitions
sbatch                # submit a batch script
squeue                # view information about jobs in the scheduling queue
sacct (-j X)          # view detailed information about jobs (or a specific job X)
sacct --helpformat    # list the format options for sacct
sacct --format="JobID,JobName%30,State,CPUTime,TimeLimit" -j X   # view the full JobName up to 30 characters
scancel X             # cancel job X
scancel -n X          # cancel all jobs with job name X
sprio -l              # show the priority of your jobs
sshare -a             # show fair-share information for all users
</pre>

To submit a job to the wn partition, issue: =sbatch -p wn --account=t3 job.sh=

One can create a shell script with a set of directives, each starting with the =#SBATCH= string, as in the following examples (a minimal sketch is also given at the end of this topic):
   * [[GPU Example][GPU Example]]
   * [[CPU Example][CPU Example]]
   * [[CPU Example for using multiple processors (threads) on a single physical computer][CPU Example for using multiple processors (threads) on a single physical computer]]

One can check the Slurm configuration (information about nodes, partitions, etc.) in /etc/slurm/slurm.conf

Slurm calculates *job priorities* taking into account:
   * *Job age*: how long the job has been waiting in the queue
   * *FairShare*: the user's past usage of the cluster
   * *Job size*: the requested resources (CPU, memory)
It is therefore useful to declare the time limit in the submission script (the less time requested, the higher the priority) with the ==--time=...== option.

-- Main.NinaLoktionova - 2019-05-08
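
The site-specific examples linked above are the reference; as a quick illustration, here is a minimal sketch of a CPU batch script built from =#SBATCH= directives. The partition (=wn=), account (=t3=) and the =--time= option follow this page; the job name, memory request and payload script =my_analysis.sh= are purely illustrative.

<pre>
#!/bin/bash
#SBATCH --job-name=cpu_test        # illustrative job name
#SBATCH --partition=wn             # CPU partition (use "quick" for short tests)
#SBATCH --account=t3
#SBATCH --time=01:00:00            # requesting less time can improve priority
#SBATCH --mem=2000M                # illustrative memory request
#SBATCH --output=%x-%j.out         # output file named <jobname>-<jobid>.out

echo "Running on $(hostname)"
./my_analysis.sh                   # hypothetical payload script
</pre>

Submit it with =sbatch cpu_test.sh= and monitor it with =squeue -u $USER=.

For GPU jobs the same structure applies; below is a sketch assuming the =gpu= partition described above and the standard Slurm =--gres= option for requesting GPUs.

<pre>
#!/bin/bash
#SBATCH --job-name=gpu_test        # illustrative job name
#SBATCH --partition=gpu            # or "qgpu" for short GPU tests
#SBATCH --account=t3
#SBATCH --gres=gpu:1               # request one GPU
#SBATCH --time=02:00:00

nvidia-smi                         # show the GPU(s) allocated to this job
</pre>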