
Slurm Batch system usage

This is an introduction to the T3 Slurm configuration. Slurm is a modern job scheduler for Linux clusters.

Please use the User Interface Nodes t3ui01-03 mainly for development and small, quick tests.

For intensive computational work, use the Compute Nodes. There are two types of Compute Nodes in T3 Slurm: Worker Nodes for CPU usage and GPU machines. All new hardware is equipped with 256 GB of RAM and a 10 GbE network:

Compute Node                   Processor Type                    Computing Resources: Cores/GPUs
t3ui01-03 (login nodes)        Intel Xeon E5-2697 (2.30 GHz)     72
t3ui04,07 (test login nodes)   AMD Opteron 6272 (2.1 GHz)        32
t3gpu0[1-2]                    Intel Xeon E5-2630 v4 (2.20 GHz)  8 x GeForce GTX 1080 Ti
t3wn60-63                      Intel Xeon Gold 6148 (2.40 GHz)   80
t3wn51-59                      Intel Xeon E5-2698 (2.30 GHz)     64
t3wn40-48                      AMD Opteron 6272 (2.1 GHz)        32
t3wn30-36,38-39                Intel Xeon E5-2670 (2.6 GHz)      16

Access to the Compute Nodes is controlled by Slurm.
Currently four partitions (similar to SGE queues) are implemented: two for CPU and two for GPU usage:

  • quick for short CPU jobs; default time is 30 min, maximum 1 hour
  • wn for longer CPU jobs
  • qgpu for short GPU jobs; default time is 30 min, maximum 1 hour
  • gpu for GPU resources

Here are a few useful commands to start working with Slurm:

sinfo           # view information about Slurm nodes and partitions
sbatch          # submit a batch script 
squeue          # view information about jobs in the scheduling queue
sacct (-j X)    # view detailed information about jobs (or specific job X)
sacct --helpformat # see format options for sacct
sacct --format="JobID,JobName%30,State,CPUTime,TimeLimit" -j X # view full JobName up to 30 characters
scancel X       # abort job X (scancel takes the job ID directly, without a flag)
scancel -n X    # abort all jobs with job name X
sprio -l        # view the priority of your pending jobs
sshare -a       # view fairshare information for all users

To submit a job to the wn partition, issue: sbatch -p wn --account=cn-test job.sh

One can create a shell script containing a set of directives, each starting with the #SBATCH string, as in the following examples.

GPU Example
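The original example script is not included in this page; a minimal sketch of a GPU batch script, assuming the partitions listed above, could look like the following (the job name, output file, and program name are illustrative placeholders):

```shell
#!/bin/bash
#SBATCH -p qgpu                # short-GPU partition; use "gpu" for longer jobs
#SBATCH --gres=gpu:1           # request one GPU
#SBATCH --time=00:30:00        # requested wall time
#SBATCH -J gpu-example         # job name (arbitrary example name)
#SBATCH -o gpu-example-%j.out  # stdout file; %j expands to the job ID

# Show which GPU was allocated, then run the (hypothetical) GPU program
nvidia-smi
./my_gpu_program
```

Submit it with: sbatch gpu_example.sh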

CPU Example
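Similarly, a minimal sketch of a serial CPU batch script, reusing the account name from the sbatch example above (the output file and program name are placeholders):

```shell
#!/bin/bash
#SBATCH -p wn                  # CPU partition for longer jobs; use "quick" for short tests
#SBATCH --account=cn-test      # account, as in the sbatch example above
#SBATCH --time=01:00:00        # requested wall time; shorter requests get higher priority
#SBATCH -o cpu-example-%j.out  # stdout file; %j expands to the job ID

echo "Running on $(hostname)"
./my_cpu_program               # hypothetical serial executable
```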

CPU Example for using multiple processors (threads) on a single physical computer
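A multithreaded job keeps all threads on one node by requesting a single task with several CPUs per task; a sketch (the thread count and program name are placeholders):

```shell
#!/bin/bash
#SBATCH -p wn
#SBATCH --ntasks=1             # one task, i.e. one physical node...
#SBATCH --cpus-per-task=8      # ...with 8 CPU cores for its threads
#SBATCH --time=01:00:00
#SBATCH -o threads-%j.out

# Tell an OpenMP program how many threads Slurm granted
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program            # hypothetical multithreaded executable
```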

One can check the Slurm configuration (information about Nodes, Partitions, etc.) in /etc/slurm/slurm.conf
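Alternatively, the live configuration can be queried directly from the daemons with scontrol and sinfo (the partition and node names below are taken from the table above):

```shell
scontrol show config | less   # full live Slurm configuration
scontrol show partition wn    # details of the wn partition (limits, nodes)
scontrol show node t3wn60     # details of a single compute node
sinfo -N -l                   # per-node summary: state, CPUs, memory
```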

Currently the maximum number of jobs each user is allowed to run is 400 (about 60% of the CPU resources).

Slurm itself calculates job priorities taking into account:
- Age of Job: how long the job has been waiting in the queue
- FairShare: the user's past cluster usage
- Job Size: the requested resources (CPU, memory)

It is therefore useful to declare the time resource in the submission script (the less time requested, the higher the priority), e.g. --time=...

-- NinaLoktionova - 2019-05-08

Topic revision: r19 - 2020-01-15 - NinaLoktionova