OS | UI Hostname | Users group | Notes |
---|---|---|---|
SL6 | t3ui01 | PSI | 132 GB RAM, 72 cores, 4 TB /scratch (RAID 1+0) |
SL6 | t3ui02 | ETHZ | 132 GB RAM, 72 cores, 4 TB /scratch (RAID 1+0) |
SL6 | t3ui03 | UNIZ | 132 GB RAM, 72 cores, 4 TB /scratch (RAID 1+0) |
/shome policies

Each /shome/$USER is a dedicated ZFS filesystem (it's not a simple dir) featuring snapshots, available under /shome/$USER/.zfs/snapshot; to recover a file, or a whole dir, simply use the cp command; no interaction with the T3 admins will be needed!
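As a minimal recovery sketch (the snapshot name and file paths below are only placeholders; check what actually exists in your own .zfs/snapshot dir):

```bash
# List the snapshots available for your /shome area
ls /shome/$USER/.zfs/snapshot

# Copy a file back from one of the snapshots into your live area
# (SNAPSHOT_NAME and the file path are hypothetical examples)
cp /shome/$USER/.zfs/snapshot/SNAPSHOT_NAME/myanalysis/config.py /shome/$USER/myanalysis/
```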
Be aware that the snapshots under /shome/$USER/.zfs/snapshot keep referencing deleted data: if a user completely fills /shome/$USER, deleting a big file and then trying to download the same file again will immediately fail reporting out of space, because the deleted blocks are still held by the snapshots; if a T3 user runs out of space then only the T3 admins will be able to recover space by serially deleting his/her oldest snapshots. You can check your /shome/$USER usage by this URL:
```
$ lynx --dump --width=800 http://t3mon.psi.ch/PSIT3-custom/space.report | egrep "NAME|$USER"
NAME                       QUOTA  AVAIL  RESERV  USED   USEDDS  USEDSNAP  SSCOUNT  RATIO  CREATION
data01/shome/martinelli_f  800G   796G   10G     4.22G  4.22G   3.53M     46       1.25x  Mon Dec  7 18:49 2015
```
The batch system provides three queues with the following run-time limits:

* short.q : 90 min
* all.q : 10 h (this is the default queue used by a plain qsub command)
* long.q : 96 h

Overall slot limits:

* short.q : can run on all the 1040 available job slots
* all.q and long.q together : max 740 job slots
* long.q : max 360 job slots

Per-user limits (maximum concurrent jobs of a single user):

* short.q : max 460 jobs
* all.q : max 400 jobs
* long.q : max 340 jobs
By default each job can use up to 3GB of RAM on a t3wn server; if the job uses more than that it will be killed. To request more, read about the h_vmem qsub option -l h_vmem=nG, with n <= 6 (i.e. at most 6 GByte). The more RAM you request, the fewer jobs will be able to run on a t3wn server, so check whether you really need so much RAM (in all the CMS worldwide grid centres a max of 2GB of RAM is tolerated!). By running qstat -j JOBID you will see the h_vmem RAM value that was requested at submission time, either by default or by you.
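A small sketch of requesting a non-default h_vmem and checking it afterwards (the job script name and JOBID are placeholders; keep n <= 6):

```bash
# Ask the batch system for 4 GB of virtual memory for this job
qsub -l h_vmem=4G myjob.sh

# Verify what was actually requested (look for the h_vmem entry)
qstat -j JOBID | grep h_vmem
```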
qquota reports the batch system quota usage, either for a single user or for all the users; the batch system policies are published on each t3ui1* server in /gridware/sge_ce/tier3-policies/; for instance, during the day they are:
```
$ grep -A 100000 -B 10000 --color TRUE /gridware/sge_ce/tier3-policies/day
{
   name         max_jobs_per_sun_host
   description  Allow maximally 8 jobs per bl6270 host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@bl6270} to slots=8
}
{
   name         max_jobs_per_intel_host
   description  Allow maximally 16 jobs per intel host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@wnintel} to slots=16
}
{
   name         max_jobs_per_intel2_host
   description  Allow maximally 64 jobs per intel2 host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@wnintel2} to slots=64
}
{
   name         max_jobs_per_supermicro_host
   description  Allow maximally 32 jobs per supermicro host
   enabled      TRUE
   limit        queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts {@wnsupermicro} to slots=32
}
{
   name         max_jobs_per_t3vm03
   description  NONE
   enabled      FALSE
   limit        queues all.q,short.q hosts t3vm03.psi.ch to slots=2
}
{
   name         test-rqs-admin2
   description  limit maximal number of jobs of a user in the admin queue
   enabled      FALSE
   limit        users {*} queues all.q.admin to slots=40
}
{
   name         test-rqs-admin
   description  limit admin queue to 30 slots total
   enabled      FALSE
   limit        queues all.q.admin to slots=30
}
{
   name         max_allq_jobs
   description  limit all.q and long.q to a maximal number of common slots
   enabled      TRUE
   limit        queues all.q,long.q to slots=740
}
{
   name         max_longq_jobs
   description  limit long.q to a maximal number of slots
   enabled      TRUE
   limit        queues long.q to slots=360
}
{
   name         max_sherpagen_jobs
   description  limit sherpa.gen.q to a maximal number of slots
   enabled      TRUE
   limit        queues sherpa.gen.q to slots=50
}
{
   name         max_sherpaintlong_jobs
   description  limit sherpa.int.long.q to a maximal number of slots
   enabled      TRUE
   limit        queues sherpa.int.long.q to slots=32
}
{
   name         max_sherpaintvlong_jobs
   description  limit sherpa.int.vlong.q to a maximal number of slots
   enabled      TRUE
   limit        queues sherpa.int.vlong.q to slots=32
}
{
   name         max_user_jobs_per_queue
   description  Limit a user to a maximal number of concurrent jobs in each \
                queue
   enabled      TRUE
   limit        users {*} queues all.q to slots=400
   limit        users {*} queues short.q to slots=460
   limit        users {*} queues long.q to slots=340
   limit        users {*} queues sherpa.gen.q to slots=32
   limit        users {*} queues sherpa.int.long.q to slots=32
   limit        users {*} queues sherpa.int.vlong.q to slots=32
}
{
   name         max_jobs_per_user
   description  Limit the total number of concurrent jobs a user can run on \
                the cluster
   enabled      TRUE
   limit        users {*} queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q to slots=500
}
```

All the current quotas are shown by:
```
$ qquota -u \*
resource quota rule             limit         filter
--------------------------------------------------------------------------------
max_jobs_per_sun_host/1         slots=2/8     queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn17
max_jobs_per_sun_host/1         slots=1/8     queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn25
max_jobs_per_sun_host/1         slots=1/8     queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn15
max_jobs_per_intel_host/1       slots=11/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn39
max_jobs_per_intel_host/1       slots=13/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn37
max_jobs_per_intel_host/1       slots=12/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn30
max_jobs_per_intel_host/1       slots=13/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn38
max_jobs_per_intel_host/1       slots=12/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn40
max_jobs_per_intel_host/1       slots=13/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn34
max_jobs_per_intel_host/1       slots=12/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn36
max_jobs_per_intel_host/1       slots=12/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn33
max_jobs_per_intel_host/1       slots=12/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn35
max_jobs_per_intel_host/1       slots=12/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn32
max_jobs_per_intel_host/1       slots=12/16   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn31
max_jobs_per_intel2_host/1      slots=25/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn52
max_jobs_per_intel2_host/1      slots=25/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn51
max_jobs_per_intel2_host/1      slots=24/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn55
max_jobs_per_intel2_host/1      slots=25/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn54
max_jobs_per_intel2_host/1      slots=24/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn53
max_jobs_per_intel2_host/1      slots=24/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn56
max_jobs_per_intel2_host/1      slots=24/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn57
max_jobs_per_intel2_host/1      slots=24/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn58
max_jobs_per_intel2_host/1      slots=10/64   queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn59
max_jobs_per_supermicro_host/1  slots=1/32    queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q hosts t3wn50
max_allq_jobs/1                 slots=344/740 queues all.q,long.q
max_longq_jobs/1                slots=340/360 queues long.q
max_user_jobs_per_queue/1       slots=4/400   users ggiannin queues all.q
max_user_jobs_per_queue/3       slots=340/340 users wiederkehr_s queues long.q
max_jobs_per_user/1             slots=340/500 users wiederkehr_s queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q
max_jobs_per_user/1             slots=4/500   users ggiannin queues all.q,short.q,long.q,sherpa.gen.q,sherpa.int.long.q.sherpa.int.vlong.q
```
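To see only the quotas that currently affect your own jobs, a minimal sketch:

```bash
# Resource quota usage for your own user only
qquota -u $USER
```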
/tmp and /scratch user quota

We have repeatedly found the /tmp or the /scratch partitions of UIs and WNs full, either because some user had filled them with big and later forgotten files/dirs or simply because a job went crazy; so we've decided to introduce disk quotas to detect and stop at least these macro errors. The disk quotas, so far, are not designed to manage the case where many users each stay within their individually allowed amount of space but all together fill all the space; it's up to you and your group to make room regularly on the shared filesystems.
For people not familiar with the Linux quota terms: the soft limit may be exceeded temporarily, the hard limit can never be exceeded, and the grace period is how long a user may stay above the soft limit before writes are refused. The limits currently set are:

Partition | soft limit | hard limit | grace |
---|---|---|---|
/tmp | 40% | 50% | 7 days |
/scratch | 80% | 90% | 7 days |
If you exceed your quota on /tmp and/or /scratch, writes will simply start to fail; for instance:

```
$ dd if=/dev/zero of=/tmp/zero.$USER
sda5: warning, user block quota exceeded.
sda5: write failed, user block limit reached.
dd: writing to `/tmp/zero': Disk quota exceeded
```
In addition, you will typically receive a warning e-mail like this one:

```
Dear T3 User your disk usage has exceeded the agreed limits on this server,
have a look to this page to check the actual T3 usage policies:
https://wiki.chipp.ch/twiki/bin/view/CmsTier3/Tier3Policies

Please delete any unnecessary files on following filesystems:

The /tmp filesystem (/dev/sda5)
                        Block limits                      File limits
Filesystem        used     soft     hard    grace    used  soft  hard  grace
/dev/sda5  +-  1486100   891660  1486100    6days       1     0     0

The T3 Administrators.
```
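Before deleting blindly, it may help to see what is actually taking the space; a small sketch with standard Linux tools (the paths are just examples, adapt them to /tmp or /scratch):

```bash
# Your largest entries under /scratch, biggest last
du -sh /scratch/* 2>/dev/null | sort -h | tail -20

# Files owned by you and bigger than 1 GB under /tmp
find /tmp -user $USER -size +1G -ls 2>/dev/null
```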
To check your own /tmp or /scratch quota, use the quota command:

```
[auser@t3ui07 ~]$ quota -s -f /tmp
Disk quotas for user auser (uid 515): none

[auser@t3ui07 ~]$ quota -s -f /scratch
Disk quotas for user auser (uid 515):
     Filesystem   blocks   quota    limit   grace   files   quota   limit   grace
      /dev/sda8   39772M   93662M   115G            23463       0       0
```
To report every user's /tmp or /scratch quota, use repquota:

```
[auser@t3ui06 ~]$ sudo /usr/sbin/repquota -s /scratch
*** Report for user quotas on device /dev/sdb1
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --    188M       0       0              4     0     0
User1     --       4    106G    120G              1     0     0
User2     --   6385M    106G    120G            122     0     0
User3     --  11149M    106G    120G             20     0     0

[auser@t3ui06 ~]$ sudo /usr/sbin/repquota -s /tmp
*** Report for user quotas on device /dev/sda6
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
root      --    151M       0       0              5     0     0
xfs       --       0       0       0              1     0     0
nagios    --       4       0       0              2     0     0
User1     --       4   3876M   4844M              1     0     0
User2     --    1112   3876M   4844M             28     0     0
User3     --      72   3876M   4844M             17     0     0
```
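Since it's up to each group to make room regularly, here is a small sketch for spotting and removing old leftovers (the 30-day threshold and the path are only examples; always review the list before deleting anything):

```bash
# List your files under /scratch not touched for more than 30 days
find /scratch -user $USER -type f -mtime +30 -ls

# Once reviewed, the same selection can be removed
find /scratch -user $USER -type f -mtime +30 -delete
```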