<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Site Downtime Procedure

This page describes the actions required to place the site into downtime.

%TOC%

---++ Change management

Make a note on http://neggio.cscs.ch/forum/ about the change.

---++ Announcement

An official announcement must be made at least 5 days before the site goes down.

---++ CreamCE

To check the current state of a Cream you need a valid certificate. Then execute the following:

<verbatim>
glite-ce-service-info cream01.lcg.cscs.ch
Interface Version = [2.1]
Service Version = [1.16.2 - EMI version: 3.6.0-1.el6]
Description = [CREAM 2]
Started at = [Fri Nov 8 17:53:18 2013]
Submission enabled = [NO]
Status = [RUNNING]
</verbatim>

To disable submission, enter the following:

<verbatim>
glite-ce-disable-submission cream01.lcg.cscs.ch
</verbatim>

The Creams use the following check to determine whether they should publish their state as draining or production. This file is managed by cfengine.

<verbatim>
/var/lib/bdii/gip/plugin/glite-info-dynamic-ce
</verbatim>

---++ Arc

To disable submission in ARC, the 'allownew' option in arc.conf needs to be set to no. The file is managed by cfengine, so make the following edit for each ARC host.

<verbatim>
vim /srv/cfengine/files/arc01/etc/arc.conf
#allownew=yes
allownew=no
</verbatim>

---++ Slurm

Even with submission disabled, jobs may still find their way into the cluster, so we can also set the partitions to a draining state within Slurm. To set a partition to drain, do the following for each VO.
<verbatim>
scontrol update partitionname=lcgadmin state=drain
</verbatim>

---++ Monitoring and Logging

If you use the "at" command to schedule an action such as changing the partition state, please use the mail functionality (-m flag) and write to a log to preserve historical data. Please also post on the Change Management Tool running on =neggio= to maintain an official log and a reference for other sysadmins.

For example, to set a Slurm partition to drain ("at" reads the job from standard input, so the command is piped in):

<verbatim>
echo 'scontrol update partitionname=lcgadmin state=drain && echo "Set slurm partition lcgadmin to drain" | logger -t AT' | at -m 7 AM + 5 days
</verbatim>

TODO:
   * Nagios checks for partition state - DONE GB 20/11/2013
   * Nagios checks Cream submission state - Need to confirm LDAP output
   * Nagios checks ARC submission state - DONE GB 20/11/2013

-- Main.GeorgeBrown - 2013-11-20
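
The Slurm step above asks for the same =scontrol= call once per VO partition. A minimal sketch of that loop follows. Only =lcgadmin= is named on this page, so the other partition names (=atlas=, =cms=, =lhcb=) are placeholders to be replaced with the site's real partition names, and the =DRY_RUN= switch is a hypothetical convenience of this sketch, not a Slurm feature.

```shell
#!/bin/sh
# Sketch: drain one Slurm partition per VO.
# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually run them.

drain_partitions() {
    for partition in "$@"; do
        cmd="scontrol update partitionname=${partition} state=drain"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Print the command instead of running it.
            echo "$cmd"
        else
            # Run the command and log success, reusing the log line
            # from the "at" example on this page.
            $cmd && echo "Set slurm partition ${partition} to drain" | logger -t AT
        fi
    done
}

# Partition names other than lcgadmin are placeholders (assumption).
drain_partitions lcgadmin atlas cms lhcb
```

Running it first in dry-run mode gives a chance to review the exact commands. Rerunning with =DRY_RUN=0= on a host where =scontrol= has admin rights would then apply them.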
Topic revision: r5 - 2013-11-20 - GeorgeBrown