Slurm Batch system usage

This is an introduction to the T3 Slurm configuration. Slurm is a modern job scheduler for Linux clusters.

Please use the User Interface Nodes t3ui01-03 mainly for development and small, quick tests.

For intensive computational work, use the Compute Nodes. There are two types of Compute Nodes in T3 Slurm: Worker Nodes for CPU usage and GPU machines. All new hardware is equipped with 256GB of RAM and a 10GbE network:

| Compute Node | Processor Type | Computing Resources: Cores/GPUs | Memory, GB |
| t3wn41-43,45-50 | AMD Opteron 6272 (2.1GHz) | 32 | 96 |
| t3gpu0[1-2] | Intel Xeon E5-2630 v4 (2.20GHz) | 8 * GeForce GTX 1080 Ti | 256 |
| t3wn30-36,38-39 | Intel Xeon E5-2670 (2.6GHz) | 16 | 48 |
| t3ui01-03 (login node) | Intel Xeon E5-2697 (2.30GHz) | 72 | 128 |
| t3wn51-59 | Intel Xeon E5-2698 (2.30GHz) | 64 | 128 |
| t3wn60-63 | Intel Xeon Gold 6148 (2.40GHz) | 80 | 256 |

Access to the Compute Nodes is controlled by Slurm.
The maximum number of CPU jobs each user is currently allowed to run is 500 (about 40% of the CPU resources).
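
The 40% figure can be checked roughly against the table above. The sketch below assumes that the Worker Node rows are the complete CPU inventory, that the name ranges expand to 9, 9, 9 and 4 nodes, and that every job occupies exactly one core; none of this is stated on this page.

```shell
#!/bin/bash
# Cores per Worker Node type, from the table above: nodes * cores per node.
# Assumption: the name ranges expand to 9, 9, 9 and 4 nodes.
opteron=$(( 9 * 32 ))     # t3wn41-43,45-50
e5_2670=$(( 9 * 16 ))     # t3wn30-36,38-39
e5_2698=$(( 9 * 64 ))     # t3wn51-59
gold=$(( 4 * 80 ))        # t3wn60-63
total_cores=$(( opteron + e5_2670 + e5_2698 + gold ))

# Assumption: each of the 500 jobs uses exactly one core.
max_jobs=500
share=$(( 100 * max_jobs / total_cores ))   # integer percent, rounded down
echo "total cores: ${total_cores}; ${max_jobs} single-core jobs use about ${share}%"
```

Under these assumptions, 500 single-core jobs occupy about 37% of 1328 cores, which is in line with the quoted 40%.
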
Four partitions are implemented, two for CPU and two for GPU usage:

  • quick for short CPU jobs; default time is 30 min, maximum 1 hour
  • wn for longer CPU jobs
  • qgpu for short GPU jobs; default time is 30 min, maximum 1 hour and 1 GPU per user
  • gpu for GPU resources; maximum 15 GPUs per user

To submit a job to the wn partition, issue: sbatch -p wn --account=t3 --mem=3000 job.py (a --mem value without a unit is in megabytes).
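
Choosing between the two CPU partitions follows from the time limits listed above. A minimal sketch, in which pick_partition is a hypothetical helper (not a Slurm or T3 tool), the t3 account is assumed to apply to quick as well, and the command is only printed rather than submitted:

```shell
#!/bin/bash
# Hypothetical helper: choose a CPU partition from the expected run time in minutes.
# quick allows at most 1 hour; anything longer goes to wn.
pick_partition() {
    if [ "$1" -le 60 ]; then
        echo quick
    else
        echo wn
    fi
}

minutes=45
partition=$(pick_partition "$minutes")
# Print the command instead of running it; drop "echo" to submit for real.
# A bare number given to --time is interpreted by Slurm as minutes.
echo sbatch -p "$partition" --account=t3 --time="$minutes" job.py
```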

Alternatively, the options can be put into a shell script as directives, each on a line starting with the string #SBATCH, as in the following examples.

GPU Example
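
A minimal sketch of a GPU batch script, assuming that GPUs are requested with the standard --gres=gpu:N option and that Slurm exports CUDA_VISIBLE_DEVICES for the job. The account to use for GPU jobs is not stated on this page, so it is left out; the job name, time limit and output pattern are illustrative values.

```shell
#!/bin/bash
#SBATCH --job-name=gpu_test
# Use qgpu instead for jobs of up to 1 hour.
#SBATCH --partition=gpu
# Number of GPUs requested.
#SBATCH --gres=gpu:1
# Run time limit as HH:MM:SS.
#SBATCH --time=02:00:00
# Log file named after the job name (%x) and job ID (%j).
#SBATCH --output=%x-%j.out

# Assumption: Slurm sets CUDA_VISIBLE_DEVICES to the GPU(s) assigned to the job.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "running on $(hostname), visible GPUs: ${gpus}"
```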

CPU Example
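
A minimal sketch of a single-core CPU batch script carrying the same options as the sbatch command line shown earlier. The job name, time limit and output pattern are illustrative values, and running job.py through python3 is an assumption.

```shell
#!/bin/bash
#SBATCH --job-name=cpu_test
#SBATCH --partition=wn
#SBATCH --account=t3
# Memory in MB, as in the sbatch command line above.
#SBATCH --mem=3000
# Run time limit as HH:MM:SS.
#SBATCH --time=04:00:00
#SBATCH --output=%x-%j.out

# SLURM_JOB_ID is set by Slurm inside a job; it is unset when the script is run by hand.
job_id="${SLURM_JOB_ID:-interactive}"
echo "job ${job_id} started on $(hostname) at $(date)"
# The real workload goes here, for example (assumption):
# python3 job.py
```

Saved as, say, cpu_test.sh, the script is submitted with sbatch cpu_test.sh.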

CPU Example for using multiple processors (threads) on a single physical computer
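
A minimal sketch of a multi-threaded job on one node, assuming the program honours OMP_NUM_THREADS (as OpenMP-based programs do). The CPU count, job name and time limit are illustrative values.

```shell
#!/bin/bash
#SBATCH --job-name=threads_test
#SBATCH --partition=wn
#SBATCH --account=t3
# One task (process) with 8 CPUs on a single node.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=04:00:00

# Slurm exports the granted CPU count; fall back to 1 outside a job.
threads="${SLURM_CPUS_PER_TASK:-1}"
# Tell the threaded program how many threads it may start.
export OMP_NUM_THREADS="$threads"
echo "using ${threads} thread(s) on $(hostname)"
```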

Here are a few useful commands to check the status of Slurm jobs and nodes, and the T3 Slurm Monitoring page.
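
For orientation, a short sketch using standard Slurm client commands. The selection and the output format string are illustrative, not necessarily the list the page originally linked to; the guard only makes the script exit cleanly on a machine without Slurm.

```shell
#!/bin/bash
if command -v squeue >/dev/null 2>&1; then
    squeue -u "${USER}"               # your own pending and running jobs
    sinfo -o "%P %a %l %D %t %N"      # partitions: availability, time limit, nodes, state
    scontrol show partition quick     # limits of a single partition
    slurm_tools="found"
else
    echo "Slurm client tools not found; run this on t3ui01-03" >&2
    slurm_tools="missing"
fi
# For a single job (replace JOBID with a number reported by squeue):
#   scontrol show job JOBID   # full job record
#   scancel JOBID             # cancel the job
```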

The Slurm configuration (information about nodes, partitions, etc.) can be checked in /etc/slurm/slurm.conf.
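
A minimal sketch for pulling the node and partition definitions out of that file, assuming they use the usual NodeName= and PartitionName= line format:

```shell
#!/bin/bash
conf=/etc/slurm/slurm.conf
if [ -r "$conf" ]; then
    # Node and partition definitions start with these keywords.
    grep -Ei '^ *(NodeName|PartitionName)=' "$conf"
else
    echo "$conf is not readable on this host" >&2
fi
```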

Slurm itself calculates job priorities, taking into account:
- Age of Job: how long the job has been waiting in the queue
- FairShare: the user's past usage of the cluster
- Job Size: the resources requested (CPU, memory)
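
These three factors match terms of Slurm's multifactor priority plugin. Assuming T3 uses that plugin (this page does not say so explicitly), the relevant part of the job priority is a weighted sum:

```latex
\[
P_{\text{job}} = w_{\text{age}} \, f_{\text{age}}
               + w_{\text{fairshare}} \, f_{\text{fairshare}}
               + w_{\text{size}} \, f_{\text{size}}
               + \dots
\]
```

Each factor f is a number between 0 and 1, and the weights w are the site-specific PriorityWeightAge, PriorityWeightFairshare and PriorityWeightJobSize settings in slurm.conf. The dots stand for further terms (partition, QOS and others) that are not discussed here.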

Please note that currently (since May 2020) the default memory for a job slot is 2GB per CPU, and a job cannot use more than its allocated memory during execution.
If more is needed, declare a larger amount of memory explicitly in the batch script, e.g. --mem=4GB.
It is also useful to declare the required run time with the time option, e.g. --time=...; the less time requested, the higher the priority.
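
A minimal sketch of how the two options look in a batch script, assuming a single-CPU job on the wn partition that needs 4 GB for at most 2 hours (illustrative values):

```shell
#!/bin/bash
#SBATCH --partition=wn
#SBATCH --account=t3
#SBATCH --cpus-per-task=1
# Raise the memory limit above the 2 GB/CPU default.
#SBATCH --mem=4G
# Run time limit as HH:MM:SS; request only what the job needs.
#SBATCH --time=02:00:00

cpus="${SLURM_CPUS_PER_TASK:-1}"
default_gb=$(( 2 * cpus ))   # what the job would get without --mem
echo "default: ${default_gb} GB for ${cpus} CPU(s); requested: 4 GB"
```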

Topic revision: r29 - 2020-05-27 - NinaLoktionova
 