Site Downtime Procedure

This page describes the actions required to place the site into downtime.

Change management

Make a note on http://neggio.cscs.ch/forum/ about the change.

Announcement

An official announcement must be made at least 5 days before the site goes down.

CreamCE

To check the current state of a Cream, first obtain a valid certificate, then run the following:

glite-ce-service-info cream01.lcg.cscs.ch

Interface Version  = [2.1]
Service Version    = [1.16.2 - EMI version: 3.6.0-1.el6]
Description        = [CREAM 2]
Started at         = [Fri Nov  8 17:53:18 2013]
Submission enabled = [NO]
Status             = [RUNNING]
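When checking several Creams, or wiring the check into a script, the "Submission enabled" field can be picked out of the output shown above. The following is a minimal sketch, not an official tool: the function name cream_submission_state is hypothetical, and it assumes the output format stays exactly as in the sample.

```shell
# Print the value of the "Submission enabled" field (YES or NO) read from stdin.
# Prints nothing if the field is absent, e.g. when the query itself failed.
# Assumes the "Key = [VALUE]" layout shown in the sample output above.
cream_submission_state() {
    sed -n 's/^Submission enabled *= *\[\([A-Z]*\)].*$/\1/p'
}

# Usage (requires a valid certificate, as for the manual check):
#   glite-ce-service-info cream01.lcg.cscs.ch | cream_submission_state
```

An empty result should be treated as "unknown" rather than as "disabled".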

To disable submission, enter the following:

glite-ce-disable-submission cream01.lcg.cscs.ch

The Creams use the following check to determine whether they should publish themselves as draining or in production. This file is managed by cfengine:

/var/lib/bdii/gip/plugin/glite-info-dynamic-ce

Arc

To stop ARC from accepting new jobs, change 'allownew' in arc.conf. This file is managed by cfengine, so make the following edit for each ARC.

vim /srv/cfengine/files/arc01/etc/arc.conf

  #allownew=yes
  allownew=no
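If the same edit has to be made in several arc.conf copies, it can be scripted instead of done by hand in vim. This is only a sketch: the function name arc_disable_allownew is hypothetical, and it assumes the option appears on its own line in the unquoted form allownew=yes shown above.

```shell
# Comment out "allownew=yes" and add "allownew=no" below it in the given file.
# Leading indentation is preserved. A line that is already commented out or
# already set to "no" does not match, so re-running the function is harmless.
# A backup of the original file is left next to it with a .bak suffix.
arc_disable_allownew() {
    conf=$1
    sed -i.bak 's/^\([[:space:]]*\)allownew=yes[[:space:]]*$/\1#allownew=yes\
\1allownew=no/' "$conf"
}

# Usage, on the cfengine-managed copy for arc01:
#   arc_disable_allownew /srv/cfengine/files/arc01/etc/arc.conf
```

Review the result with diff against the .bak file before cfengine distributes it, and remove the backup afterwards.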

Slurm

Even with submission disabled, jobs may still find their way into the cluster, so we can also set the partitions to a draining state within Slurm.

To change a partition to drain, do the following for each VO:

scontrol update partitionname=lcgadmin state=drain
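Repeating the command for each VO can be wrapped in a small loop. The sketch below is an illustration only: the function name drain_partitions and the DRY_RUN variable are hypothetical, and lcgadmin is the only partition name taken from this page. By default it just prints the commands so they can be reviewed first.

```shell
# Drain every partition named on the command line.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed;
# set DRY_RUN=0 to actually run scontrol.
drain_partitions() {
    for part in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scontrol update partitionname=$part state=drain"
        else
            scontrol update partitionname="$part" state=drain
        fi
    done
}

# Usage (add the remaining VO partition names after lcgadmin):
#   drain_partitions lcgadmin
#   DRY_RUN=0 drain_partitions lcgadmin
```

The result can be confirmed afterwards with "scontrol show partition lcgadmin", which reports the partition state.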

Monitoring and Logging

If you use the "at" command to schedule an action, such as changing the partition state, use its mail option (the -m flag) and write to a log to preserve a historical record. Also post on the Change Management Tool running on neggio, so that there is an official log and a reference for other sysadmins.

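For example, to set a Slurm partition to drain at 7 AM five days from now (note that "at" reads the job from standard input, not from its arguments):

echo 'scontrol update partitionname=lcgadmin state=drain && echo "Set slurm partition lcgadmin to drain" | logger -t AT' | at -m 7am + 5 days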

TODO:

  • Nagios checks for partition state - DONE GB 20/11/2013
  • Nagios checks Cream submission state - Need to confirm LDAP output
  • Nagios checks ARC submission state - DONE GB 20/11/2013

-- GeorgeBrown - 2013-11-20

Topic revision: r5 - 2013-11-20 - GeorgeBrown
 