Slurm Batch system usage

This page is an introduction to the T3 Slurm configuration. Slurm is a modern job scheduler for Linux clusters.

Please use the User Interface nodes t3ui01-03 mainly for development and small, quick tests.

For intensive computational work, use the Compute Nodes. T3 Slurm has two types of Compute Nodes: Worker Nodes for CPU usage and GPU machines. All new hardware is equipped with 256 GB of RAM and a 10GbE network:

| Compute Node | Processor Type | Computing Resources: Cores/GPUs |
| t3ui01-03 - login node | Intel Xeon E5-2697 (2.30 GHz) | 72 |
| t3gpu0[1-2] | Intel Xeon E5-2630 v4 (2.20 GHz) | 8 * GeForce GTX 1080 Ti |
| t3wn60-63 | Intel Xeon Gold 6148 (2.40 GHz) | 80 |
| t3wn51-59 | Intel Xeon E5-2698 (2.30 GHz) | 64 |
| t3wn41-43,45-48 | AMD Opteron 6272 (2.1 GHz) | 32 |
| t3wn30-36,38-39 | Intel Xeon E5-2670 (2.6 GHz) | 16 |

Access to the Compute Nodes is controlled by Slurm.
Currently, each user may run at most 500 CPU jobs at a time (about 40% of the CPU resources).
Four partitions are implemented, two for CPU and two for GPU usage:

  • quick for short CPU jobs; default time is 30 min, maximum 1 hour
  • wn for longer CPU jobs
  • qgpu for short GPU jobs; default time is 30 min, maximum 1 hour, and 1 GPU per user
  • gpu for GPU resources; maximum 15 GPUs per user

To submit a job to the wn partition, run: sbatch -p wn --account=t3 job.sh

One can write a shell script in which all Slurm directives appear on lines starting with the #SBATCH string, as in the following examples.

GPU Example
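
A minimal sketch of a GPU batch script, assuming the gpu partition described above and one GPU requested through the standard --gres option. The account value is copied from the sbatch command on this page and may differ for GPU usage; the job name, time limit and output file pattern are hypothetical.

```shell
#!/bin/bash
# Use qgpu instead of gpu for jobs of up to 1 hour.
#SBATCH --partition=gpu
# Assumption: same account as in the wn submission command on this page.
#SBATCH --account=t3
# Request one GPU.
#SBATCH --gres=gpu:1
# Hypothetical job name and run-time estimate (HH:MM:SS).
#SBATCH --job-name=gpu_example
#SBATCH --time=04:00:00
# %x expands to the job name, %j to the job ID.
#SBATCH --output=%x-%j.out

# Slurm normally exports CUDA_VISIBLE_DEVICES for the allocated GPUs;
# fall back to "none" when the script runs outside a GPU allocation.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Host: $(uname -n), visible GPUs: ${gpus}"

# List the GPUs if the NVIDIA tool is available on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi
```

The script would be submitted with sbatch followed by the script name; options given on the sbatch command line override the #SBATCH lines.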

CPU Example
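
A minimal sketch of a single-core CPU batch script for the wn partition. The partition and account come from the submission command on this page; the job name, memory request, time limit and output file pattern are hypothetical values to be adapted.

```shell
#!/bin/bash
#SBATCH --partition=wn
#SBATCH --account=t3
# Hypothetical job name, memory request and run-time estimate (HH:MM:SS).
#SBATCH --job-name=cpu_example
#SBATCH --mem=2G
#SBATCH --time=02:00:00
# %x expands to the job name, %j to the job ID.
#SBATCH --output=%x-%j.out

# SLURM_JOB_ID is set by Slurm inside a job; fall back to a label
# so that the script can also be tried by hand on a login node.
job_id="${SLURM_JOB_ID:-interactive}"
echo "Job ${job_id} started on $(uname -n) at $(date)"

# Replace this line with the actual workload.
echo "Job ${job_id} finished"
```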

CPU Example for using multiple processors (threads) on a single physical computer
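
A minimal sketch of a multi-threaded job confined to one node, assuming the program reads its thread count from OMP_NUM_THREADS (as OpenMP programs do). The core count of 8, the job name and the time limit are hypothetical.

```shell
#!/bin/bash
#SBATCH --partition=wn
#SBATCH --account=t3
# One task on one node, with several cores for its threads.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
# Hypothetical job name and run-time estimate (HH:MM:SS).
#SBATCH --job-name=threads_example
#SBATCH --time=02:00:00

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# default to 1 thread when the script runs outside Slurm.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="${threads}"
echo "Running with ${OMP_NUM_THREADS} thread(s) on $(uname -n)"

# Replace this line with the actual multi-threaded program.
```

Keeping the thread count tied to SLURM_CPUS_PER_TASK avoids starting more threads than the cores Slurm has allocated.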

Here are a few useful commands to monitor Slurm.
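
A short sketch using standard Slurm client commands. The run helper is a hypothetical wrapper that only makes the snippet harmless on machines without Slurm, and the job ID 12345 is a placeholder.

```shell
# Hypothetical helper: run a command only if it is installed.
run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $*"
    fi
}

me="$(id -un)"

# Jobs of the current user that are pending or running.
run squeue -u "${me}"

# State of the nodes in the four partitions.
run sinfo -p quick,wn,qgpu,gpu

# Full description of one job (placeholder job ID).
run scontrol show job 12345

# Accounting data of a job, also after it has finished.
run sacct -j 12345 --format=JobID,JobName,State,Elapsed,MaxRSS

# A job can be cancelled with: scancel 12345
```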

One can check the Slurm configuration (information about nodes, partitions, etc.) in /etc/slurm/slurm.conf
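
As a small sketch, the node and partition definitions can be pulled out of that file with grep, assuming they use the usual NodeName= and PartitionName= keywords at the start of a line.

```shell
conf=/etc/slurm/slurm.conf

if [ -r "${conf}" ]; then
    # Print only the node and partition definitions.
    grep -iE '^(NodeName|PartitionName)=' "${conf}" || echo "no node or partition lines found"
else
    echo "cannot read ${conf}"
fi
```

The same information is available from the running system with scontrol show partition and scontrol show node.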

Slurm calculates job priorities taking into account:
- Age of Job: how long the job has been waiting in the queue
- FairShare: the user's past usage of the cluster
- Job Size: the resources requested (CPU, memory)

It is therefore useful to declare the required time in the submission script (the less time requested, the higher the priority) with the time option, e.g. --time=...
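
A sketch of a submission with an explicit time limit, followed by a look at the resulting priority factors with sprio. The 45-minute estimate is hypothetical, and job.sh is the script name used in the submission command on this page.

```shell
# Hypothetical estimate of the job's run time, in HH:MM:SS.
time_limit="00:45:00"

# The submission command from this page, with an explicit time limit.
submit_cmd="sbatch -p wn --account=t3 --time=${time_limit} job.sh"
echo "${submit_cmd}"

# Where Slurm is installed and job.sh exists, submit the job and
# show the priority components (age, fair share, job size) of own jobs.
if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    ${submit_cmd}
    sprio -l -u "$(id -un)"
fi
```

The same limit can be set inside the script with a #SBATCH --time=00:45:00 line instead of on the command line.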

Topic revision: r26 - 2020-03-31 - NinaLoktionova