<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
# * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
# * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
# * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->
---+ Slurm Batch System Usage

This is an introduction to the T3 Slurm configuration. Slurm is a modern job scheduler for Linux clusters.

Please use the User Interface nodes t3ui01-03 mainly for development and small quick tests. Intensive computational work should be run on the Compute Nodes. There are two types of Compute Nodes in T3 Slurm: Worker Nodes for CPU usage, and GPU machines. All new hardware is equipped with 256 GB of RAM and a 10 GbE network:

| *Compute Node* | *Processor Type* | *Computing Resources: Cores/GPUs* | *Memory, GB* |
| t3ui01-03 - login nodes | Intel Xeon E5-2697 (2.30 GHz) | 72 | 128 |
| t3gpu0[1-2] | Intel Xeon E5-2630 v4 (2.20 GHz) | 8 * !GeForce GTX 1080 Ti | 256 |
| t3wn60-63 | Intel Xeon Gold 6148 (2.40 GHz) | 80 | 256 |
| t3wn51-59 | Intel Xeon E5-2698 (2.30 GHz) | 64 | 128 |
| t3wn41-43,45-48 | AMD Opteron 6272 (2.1 GHz) | 32 | 96 |
| t3wn30-36,38-39 | Intel Xeon E5-2670 (2.6 GHz) | 16 | 48 |

Access to the Compute Nodes is controlled by Slurm. Currently the maximum number of CPU jobs each user is allowed to run is *500* (about 40% of the CPU resources).

There are four partitions: two for CPU and two for GPU usage:
   * *quick* for short CPU jobs; default time is 30 min, max 1 hour
   * *wn* for longer CPU jobs
   * *qgpu* for short GPU jobs; default time is 30 min, max 1 hour and 1 GPU/user
   * *gpu* for GPU resources; max 15 GPUs/user

To submit a job to the wn partition, issue:

=sbatch -p wn --account=t3 --mem=3000 job.py=

One might instead create a shell script containing all directives as lines starting with =#SBATCH=, as in the following examples (a minimal sketch is also given at the bottom of this topic).

[[GPU Example][GPU Example]]

[[CPU Example][CPU Example]]

[[CPU Example for using multiple processors (threads) on a single physical computer][CPU Example for using multiple processors (threads) on a single physical computer]]

Here are a few [[SlurmMonitoringCommands][useful commands to check Slurm job and node status]]; see also the [[https://wiki.chipp.ch/twiki/bin/view/CmsTier3/SlurmUtilisation][T3 Slurm Monitoring]] page.

One can check the Slurm configuration (information about nodes, partitions, etc.) in =/etc/slurm/slurm.conf= (a few query commands are sketched at the bottom of this topic).

Slurm calculates *job priorities* taking into account:
   * *Age of Job*: how long the job has been waiting in the queue
   * *FairShare*: the user's past usage of the cluster
   * *Job Size*: the requested resources (CPU, memory)

It is therefore useful to declare the required run time in the submission script with the ==--time=...== option: the less time requested, the higher the priority.
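Below is a minimal sketch of such a batch script for the =wn= partition. The job name, output file, time and memory values are placeholders to adapt, and =my_analysis.py= stands for whatever payload you actually run; the linked CPU/GPU examples above remain the reference for site-specific details.

<verbatim>
#!/bin/bash
#SBATCH --partition=wn          # CPU partition for longer jobs
#SBATCH --account=t3            # account, as in the sbatch command above
#SBATCH --job-name=my_test      # placeholder job name
#SBATCH --mem=3000              # memory in MB
#SBATCH --time=01:00:00         # requested wall time; asking for less improves priority
#SBATCH --output=%x-%j.out      # stdout/stderr go to <jobname>-<jobid>.out

echo "Running on $(hostname)"
./my_analysis.py                # placeholder for the actual work
</verbatim>

Submit it with =sbatch job.sh= and check its state with =squeue -u $USER=.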
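For the =qgpu= and =gpu= partitions listed above, a GPU normally has to be requested explicitly in addition to choosing the partition. The T3-specific recipe is documented in the [[GPU Example][GPU Example]]; in generic Slurm syntax such a request could look like the following, where =gpu_job.sh= is a hypothetical batch script:

<verbatim>
# request one GPU on the short GPU partition (generic Slurm --gres syntax;
# see the GPU Example page for the exact T3 recipe)
sbatch -p qgpu --account=t3 --gres=gpu:1 --time=00:30:00 gpu_job.sh
</verbatim>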
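Besides reading =/etc/slurm/slurm.conf= directly, the partition and node settings can also be queried with standard Slurm commands, for example:

<verbatim>
sinfo -s                      # summary of partitions, their state and node counts
scontrol show partition wn    # full settings of the wn partition (default/max time, limits, ...)
scontrol show node t3wn60     # details of a single worker node from the table above
</verbatim>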