<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

KeyWords: SysAdmin, [[Torque]], [[Maui]]

---+ A script to schedule downtimes

There is a tedious part in scheduled downtimes: working out, and then actually typing, the correct ==at== incantations to drain the job queues, so that the downtime morning starts with no jobs running in the cluster, yet users can still run jobs during the weekend before the maintenance.

After some experimentation (see [[https://twiki.cscs.ch/twiki/bin/view/LCGTier2/OldPhoenixBlog#Draining_CE_queues_for_the_sched this blog post]] and [[https://webrt.cscs.ch/Ticket/Display.html?id=5637 CSCS ticket #5637]]), we concluded that the best approach is to:

   * allow jobs in the =ops= queue (that is, SAM tests) even during the downtime: if the downtime has been properly scheduled in the [[https://goc.gridops.org GOCDB]], they will not count against our reliability;
   * stop all other queues so that no new jobs are started: a queue allowing _X_ hours of CPU time should stop starting new jobs at least _X+1_ hours _before_ the downtime begins;
   * drain all queues (except, again, =ops=): a queue stops accepting new jobs (_draining_ state) at the point in time when new jobs risk having less than 30 minutes of proxy validity left at the end of the downtime. For this computation, each job's proxy is assumed to last as long as the job's requested CPU time, with a minimum of 12 hours.

I've written a Perl script, =/opt/cscs/sbin/downtime=, that computes the queue closing and draining times and submits the appropriate ==at== jobs to control the [[Torque]] queues. (The script is deployed by CfEngine and kept in its Subversion repository.)
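To make the time arithmetic concrete, here is a minimal Python sketch of the two rules above. It is an illustration only, not the Perl script. The function names and the per-queue CPU limits (guessed from the queue names) are assumptions. For draining, the sketch measures the proxy lifetime plus the 30-minute margin back from the downtime _start_: this is one reading of the rule, chosen because it reproduces the draining times in the sample output of the example usage on this page. The stop rule only gives the latest admissible time; the script's own closing times in that output are earlier than this bound.

```python
from datetime import datetime, timedelta

MIN_PROXY_HOURS = 12                   # from the text: a proxy is assumed to last at least 12 hours
PROXY_MARGIN = timedelta(minutes=30)   # from the text: 30 minutes of proxy validity must remain


def latest_stop_time(downtime_start, cpu_hours):
    """Latest moment at which a queue allowing `cpu_hours` of CPU time
    may still start jobs, per the X+1 rule (a bound, not the script's value)."""
    return downtime_start - timedelta(hours=cpu_hours + 1)


def drain_time(downtime_start, cpu_hours):
    """Moment at which the queue stops accepting new jobs.

    ASSUMPTION: the assumed proxy lifetime and the 30-minute margin are
    counted back from the downtime start; the real script may differ.
    """
    proxy_lifetime = timedelta(hours=max(cpu_hours, MIN_PROXY_HOURS))
    return downtime_start - proxy_lifetime - PROXY_MARGIN


if __name__ == "__main__":
    start = datetime(2009, 2, 2, 9, 0)
    # Hypothetical CPU limits in hours, inferred from the queue names only.
    queues = {"egee1h": 1, "egee8h": 8, "egee24h": 24, "egee48h": 48}
    for name, hours in queues.items():
        print(f"{name}: stop no later than {latest_stop_time(start, hours)}, "
              f"drain at {drain_time(start, hours)}")
```

Keeping the two rules as separate functions makes it easy to swap in a different reference point (for example the downtime end) if that turns out to be what the script really uses.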
Example usage:

   * Schedule a downtime at 9:00 on 2009-02-02:
<verbatim>
# downtime --verbose 2009-02-02
Downtime will start at: 09:00 on 2009-02-02
Downtime will end at: 17:00 on 2009-02-02
Draining 'egee8h' at 20:30 2009-02-01...
job 23 at 2009-02-01 20:30
Draining 'egee24h' at 08:30 2009-02-01...
job 24 at 2009-02-01 08:30
Draining 'others' at 08:30 2009-01-31...
job 25 at 2009-01-31 08:30
Draining 'egee48h' at 08:30 2009-01-31...
job 26 at 2009-01-31 08:30
Closing 'egee8h' at 22:29 2009-02-01...
job 27 at 2009-02-01 22:29
Closing 'egee1h' at 06:59 2009-02-02...
job 28 at 2009-02-02 06:59
Closing 'egee24h' at 02:29 2009-02-01...
job 29 at 2009-02-01 02:29
Closing 'others' at 20:29 2009-01-30...
job 30 at 2009-01-30 20:29
Closing 'egee48h' at 20:29 2009-01-30...
job 31 at 2009-01-30 20:29
</verbatim>
   * Schedule a downtime at 10:00 on 2009-03-10, lasting 4:00:
<verbatim>
# downtime 2009-03-10 10:00 --duration 4:00
</verbatim>

Note that the ==downtime== command must be run by a user who has permission to operate on the [[Torque]] queues.

---++ Readers' comments

%COMMENT{type="below"}%
---++ Topic attachments

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| downtime | r1 | 6.5 K | 2009-01-26 - 14:50 | RiccardoMurri | |
Topic revision: r1 - 2009-01-26 - RiccardoMurri
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.