<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
# * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
# * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
# * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->

---+ Slurm Batch System Usage

This is an introduction to the test configuration of Slurm, a modern job scheduler for Linux clusters, at T3.

Currently t3ui07 is the single login node for Slurm. Like any User Interface node, it should be used mostly for development and small, quick tests. For intensive computational work one should use the Compute Nodes. There are two types of Compute Nodes: Worker Nodes for CPU usage and GPU machines. All new hardware is equipped with 256 GB of RAM and a 10GbE network:

| *Compute Node* | *Processor Type* | *Computing Resources: Cores/GPUs* |
| t3ui07 (login node) | Intel Xeon Gold 6148 (2.40GHz) | 80 Cores |
| t3gpu0[1-2] | Intel Xeon E5-2630 v4 (2.20GHz) | 8 * !GeForce GTX 1080 Ti |
| t3wn60 | Intel Xeon Gold 6148 (2.40GHz) | 80 Cores |
| t3wn48 | AMD Opteron 6272 (2.1GHz) | 32 Cores |

Access to the Compute Nodes is controlled by Slurm. Matching these computing resources, two partitions (similar to SGE queues) are implemented: *wn* and *gpu*.

Here are a few useful commands to start working with Slurm:
<pre>
sinfo    # view information about Slurm nodes and partitions
sbatch   # submit a batch script
squeue   # view information about jobs in the scheduling queue
scancel  # abort a job
</pre>

To submit a job to the *wn* partition, issue: =sbatch -p wn job.sh=

All job options can also be collected in the shell script itself as directives starting with the =#SBATCH= string, as in the following examples (a complete submit/monitor/cancel walk-through is sketched at the end of this topic).

*GPU Example*:
<pre>
#!/bin/bash
#
#SBATCH --job-name=test_job
#SBATCH --account=gpu_gres    # to access gpu resources
#SBATCH --partition=gpu
#SBATCH --nodes=1             # request to run the job on a single node
##SBATCH --ntasks=10          # request 10 CPUs (t3gpu01/02 balance between CPU and GPU: 5 CPUs / 1 GPU)
#SBATCH --gres=gpu:2          # request two GPUs on the machine; this is the total number of GPUs for the job
##SBATCH --mem=4000M          # memory (per node)
#SBATCH --time=0-00:30        # time in format DD-HH:MM

# Slurm reserves two GPUs (as requested above); their IDs are recorded
# in the shell variable CUDA_VISIBLE_DEVICES
echo CUDA_VISIBLE_DEVICES : $CUDA_VISIBLE_DEVICES

# the python program script.py should use the CUDA_VISIBLE_DEVICES variable
# (*NOT* hardcoded GPU numbers)
python script.py
</pre>

*CPU Example*:
<pre>
#!/bin/bash
#
#SBATCH -p wn
#SBATCH --time 01:00:00
#SBATCH -w t3wn60        # run on this specific worker node
#SBATCH -e cn-test.err
#SBATCH -o cn-test.out   # replaces the default slurm-SLURM_JOB_ID.out

echo HOME: $HOME
echo USER: $USER
echo SLURM_JOB_ID: $SLURM_JOB_ID
echo HOSTNAME: $HOSTNAME

# each worker node has local /scratch space to be used during the job run
mkdir -p /scratch/$USER/${SLURM_JOB_ID}
sleep 10                 # here comes the computation
rmdir /scratch/$USER/${SLURM_JOB_ID}

date
</pre>

To start using Slurm, please ask the T3 administrators [cms-tier3@lists.psi.ch] to add your user_id to the Slurm accounts.

-- Main.NinaLoktionova - 2019-05-08
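---++ Worked example: submit, monitor, cancel

As referenced above, here is a minimal sketch of the full job cycle using the commands introduced earlier. The script contents and time limit are placeholders only; =--parsable=, which makes =sbatch= print just the job id, is assumed to be available in the installed Slurm release.
<pre>
# create a trivial batch script for the wn partition (contents are illustrative)
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH -p wn
#SBATCH --time 00:10:00
echo "running on $HOSTNAME as $USER"
EOF

jobid=$(sbatch --parsable job.sh)   # submit; --parsable prints only the job id
squeue -j $jobid                    # check the state of this job in the queue
squeue -u $USER                     # or list all of your jobs
# scancel $jobid                    # abort the job if something went wrong
</pre>
Unless redirected with =-o=/=-e= as in the CPU example, the job's output lands in the default =slurm-SLURM_JOB_ID.out= file in the submission directory.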