Policies for resource allocation on the PSI Tier-3

These policies were agreed upon in the first and second Steering Board meetings.

  1. We organize users by physics groups.
  2. For the purpose of cluster organization, every user is mapped to exactly one physics group (even though the user may work for several).
  3. The resources available to a physics group consist of the combined resources of its members. How these resources are used is up to the group's internal organization.
  4. Each physics group has one Responsible User who
    • takes care of managing the group's resources (e.g. deciding what data sets to delete).
    • is the single point of contact for the cluster administrators for organizational issues.
    • can propose a guest user (see below).
  5. The resources are equipartitioned between users.

User Interface (UI) policies

OS    UI hostname   Users group   Notes
SL6   t3ui01        PSI           132 GB RAM, 72 cores, 4 TB /scratch (RAID 1+0)
SL6   t3ui02        ETHZ          132 GB RAM, 72 cores, 4 TB /scratch (RAID 1+0)
SL6   t3ui03        UNIZ          132 GB RAM, 72 cores, 4 TB /scratch (RAID 1+0)
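For example, to log in to one of the UIs (the fully qualified hostname below is an assumption based on the t3ui0* names and the psi.ch domain; check with your group which UI you should use):

# log in to the UI assigned to your group (hostname assumed to follow the t3ui0N.psi.ch pattern)
ssh your_username@t3ui01.psi.ch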

Home policies

  • Every user has a home directory /t3home/${USER} with a default quota of 10 GB for software development, documentation, etc.
    It is configured with one daily, one weekly and one monthly snapshot, stored outside the quota limit under /t3home/${USER}/.snapshot (see the recovery sketch after this list).
To check your quota allotment use the following command:
~% quota -s -f /t3home
Disk quotas for user username (uid user_id): 
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
t3nfs:/t3/home1    704M   10240M   11264M          927   4295m   4295m        

  • For large files there is the /shome/${USER} volume with an effective space of ~150 GB and 2 daily snapshots.
    (At the end of February, after all users have been migrated, /shome will be renamed to /work.)
    Snapshots count towards the /shome quota. If the quota is completely filled, only an administrator can help; requests must be sent to the admin mailing list.
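As a minimal sketch of how a snapshot can be used to recover an accidentally deleted file (only the .snapshot location itself is documented above; the snapshot directory name and the file path are illustrative):

# list the available home snapshots (directory names under .snapshot are illustrative)
ls /t3home/${USER}/.snapshot/
# copy a lost file back from one of the snapshots into your home directory
cp /t3home/${USER}/.snapshot/<snapshot_name>/myanalysis/config.py /t3home/${USER}/myanalysis/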

Batch system policies

These policies were discussed and endorsed in the steering board meeting of 2011-11-03

Aims

  • The T3 CPUs and Storage resources must be shared in a fair way.
  • All users are treated equally, so resources are accorded per user, not per group.
  • We want to ensure that an adequate part of the resources is kept for short-turnaround jobs. Longer jobs (e.g. for larger private MC productions) should be possible, but during the main office hours shorter jobs have priority and longer jobs are throttled by a quota.
  • Scheduling policies have the greatest impact when job turnaround times are small, so we would like to favor short queue jobs over long queue jobs.
    • short queue jobs should be able to run on all slots of the cluster
    • short queue job runtime should cover the majority of use cases
    • long queue jobs should only be able to saturate part of the cluster
  • Since we have tens of users, we want to limit the number of slots a single user can fill, especially in the long queue.

Resource quota limits for enforcing policies

Explicit scheduling policies:
  1. Queue job runtime limits (an example of selecting a queue at submission time is given after this list):
    • short.q: 90min
    • all.q: 10h (this is the default queue used by the qsub command)
    • long.q: 96h
  2. Queue slot limits: how many jobs can run in each queue:
    • short.q: can run on all the 1040 available job slots.
    • all.q and long.q together: max 740 job slots.
    • long.q: max 360 job slots.
  3. Per-user job limits: the maximum number of jobs a user can have running in each queue:
    • short.q: max 460 jobs.
    • all.q: max 400 jobs.
    • long.q: max 340 jobs.
    • A user can only ever have 500 running jobs in total, independent of the queues.
  4. Users with justified requests for large numbers of very long jobs can be accorded resources on special request (mail to the steering board).
  5. All these policies are relaxed at night and on weekends, so that the cluster can be filled by a larger number of jobs:
    • The per-user job limits are turned off.
    • Night time is defined as weekdays from 19:00 to 4:00; weekend time is defined as Sat 4:00 to Mon 4:00.
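As a minimal sketch of how these queues are selected at submission time (the job script name is illustrative; -q is the standard qsub option for requesting a queue):

# submit to the default all.q queue (10h runtime limit)
qsub myjob.sh
# submit a short job (90min runtime limit) to short.q
qsub -q short.q myjob.sh
# submit a long job (96h runtime limit) to long.q
qsub -q long.q myjob.sh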

Other resource limits affecting job submission

  1. Job RAM limit, default 3GB:
    • By default 3 GB of RAM are reserved on the assigned t3wn server; if the job uses more than 3 GB it will be killed. Read about h_vmem.
    • If you need more than 3 GB, use the qsub option -l h_vmem=nG with n <= 6 (6 GByte); the more RAM you request, the fewer jobs can run on a t3wn server, so check whether you really need that much RAM (the CMS Grid centres worldwide tolerate at most 2 GB of RAM!). See the example below.
    • By running qstat -j JOBID you can see the h_vmem value that was requested at submission time, either by default or by you.
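A minimal sketch of requesting more memory and checking the requested value afterwards (the job script name and job ID are illustrative):

# request 5 GB of RAM per job slot instead of the default 3 GB
qsub -l h_vmem=5G myjob.sh
# inspect the submitted job; the output includes the requested h_vmem value
qstat -j 1234567 | grep h_vmem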

How to check the current batch system policies

The command qquota reports the batch system quota usage, either for a single user or for all users. The batch system policies are published on each t3ui1* server in /gridware/sge_ce/tier3-policies/; for instance, during the day they are:
$ grep -A 100000 -B 10000 --color TRUE /gridware/sge_ce/tier3-policies/day
{
   name         max_jobs_per_sun_host
   description  Allow maximally 8 jobs per bl6270 host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@bl6270} to slots=8
}
{
   name         max_jobs_per_intel_host
   description  Allow maximally 16 jobs per intel host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@wnintel} to slots=16
}
{
   name         max_jobs_per_intel2_host
   description  Allow maximally 64 jobs per intel2 host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@wnintel2} to slots=64
}
{
   name         max_jobs_per_supermicro_host
   description  Allow maximally 32 jobs per supermicro host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@wnsupermicro} to slots=32
}
{
   name         max_jobs_per_t3vm03
   description  NONE
   enabled      FALSE
   limit        queues all.q,short.q hosts t3vm03.psi.ch to slots=2
}
{
   name         test-rqs-admin2
   description  limit maximal number of jobs of a user in the admin queue
   enabled      FALSE
   limit        users {*} queues all.q.admin to slots=40
}
{
   name         test-rqs-admin
   description  limit admin queue to 30 slots total
   enabled      FALSE
   limit        queues all.q.admin to slots=30
}
{
   name         max_allq_jobs
   description  limit all.q and long.q to a maximal number of common slots
   enabled      TRUE
   limit        queues all.q,long.q to slots=740
}
{
   name         max_longq_jobs
   description  limit long.q to a maximal number of slots
   enabled      TRUE
   limit        queues long.q to slots=360
}
{
   name         max_sherpagen_jobs
   description  limit sherpa.gen.q to a maximal number of slots
   enabled      TRUE
   limit        queues sherpa.gen.q to slots=50
}
{
   name         max_sherpaintlong_jobs
   description  limit sherpa.int.long.q to a maximal number of slots
   enabled      TRUE
   limit        queues sherpa.int.long.q to slots=32
}
{
   name         max_sherpaintvlong_jobs
   description  limit sherpa.int.vlong.q to a maximal number of slots
   enabled      TRUE
   limit        queues sherpa.int.vlong.q to slots=32
}
{
   name         max_user_jobs_per_queue
   description  Limit a user to a maximal number of concurrent jobs in each \
   queue
   enabled      TRUE
   limit        users {*} queues all.q to slots=400
   limit        users {*} queues short.q to slots=460
   limit        users {*} queues long.q to slots=340
   limit	users {*} queues sherpa.gen.q to slots=32
   limit	users {*} queues sherpa.int.long.q to slots=32
   limit	users {*} queues sherpa.int.vlong.q to slots=32
}
{
   name         max_jobs_per_user
   description  Limit the total number of concurrent jobs a user can run on \
   the cluster
   enabled      TRUE
   limit        users {*} queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q to slots=500
}

All the current quotas can be listed with qquota -u \*
resource quota rule limit                filter
--------------------------------------------------------------------------------
max_jobs_per_sun_host/1 slots=2/8            queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn17
max_jobs_per_sun_host/1 slots=1/8            queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn25
max_jobs_per_sun_host/1 slots=1/8            queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn15
max_jobs_per_intel_host/1 slots=11/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn39
max_jobs_per_intel_host/1 slots=13/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn37
max_jobs_per_intel_host/1 slots=12/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn30
max_jobs_per_intel_host/1 slots=13/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn38
max_jobs_per_intel_host/1 slots=12/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn40
max_jobs_per_intel_host/1 slots=13/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn34
max_jobs_per_intel_host/1 slots=12/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn36
max_jobs_per_intel_host/1 slots=12/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn33
max_jobs_per_intel_host/1 slots=12/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn35
max_jobs_per_intel_host/1 slots=12/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn32
max_jobs_per_intel_host/1 slots=12/16          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn31
max_jobs_per_intel2_host/1 slots=25/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn52
max_jobs_per_intel2_host/1 slots=25/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn51
max_jobs_per_intel2_host/1 slots=24/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn55
max_jobs_per_intel2_host/1 slots=25/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn54
max_jobs_per_intel2_host/1 slots=24/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn53
max_jobs_per_intel2_host/1 slots=24/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn56
max_jobs_per_intel2_host/1 slots=24/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn57
max_jobs_per_intel2_host/1 slots=24/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn58
max_jobs_per_intel2_host/1 slots=10/64          queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn59
max_jobs_per_supermicro_host/1 slots=1/32           queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn50
max_allq_jobs/1    slots=344/740        queues all.q,long.q
max_longq_jobs/1   slots=340/360        queues long.q
max_user_jobs_per_queue/1 slots=4/400          users ggiannin queues all.q
max_user_jobs_per_queue/3 slots=340/340        users wiederkehr_s queues long.q
max_jobs_per_user/1 slots=340/500        users wiederkehr_s queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q
max_jobs_per_user/1 slots=4/500          users ggiannin queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q
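
To see only the quota rules that currently affect your own jobs, restrict qquota to your user with the same -u option:

qquota -u ${USER}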

Availability of an interactive queue to debug the user jobs

A special debug.q queue allowing interactive sessions is available; please consult the wiki page HowToDebugJobs. There was also a presentation in 2015.
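
A minimal sketch of opening an interactive session, assuming debug.q accepts interactive logins via the standard SGE qlogin command (see HowToDebugJobs for the supported options):

# request an interactive shell on a worker node through the debug queue
qlogin -q debug.q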

UIs and WNs /tmp /scratch usage

Sometimes we have found that the shared /tmp or /scratch partitions of the UIs and WNs are full, either because a user filled them with large and later forgotten files/directories or simply because a job went out of control. Users explicitly requested to manage this space themselves, so there is no automatic cleaning. Please clean up in a timely manner and remember that /scratch is the least protected area and is not meant to hold important data for a long time.
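
A minimal sketch of locating your own large files on a UI or WN, using standard tools (the 1 GB threshold is illustrative, and the per-user subdirectories are an assumption; adjust the paths to where you actually write):

# list your own files larger than 1 GB under /scratch and /tmp
find /scratch /tmp -user ${USER} -size +1G -type f -exec ls -lh {} \; 2>/dev/null
# summarize how much space your directories use
du -sh /scratch/${USER} /tmp/${USER} 2>/dev/null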

If you discover an abuse, write to or call your colleague and invite them to clean up; otherwise your peers' work could be blocked.

Administrators can help when group members are still interested in outdated user files left in /scratch; in that case the ownership has to be explicitly changed to a new responsible person.
