KeyWords: SysAdmin, Maintenance

Scheduled Downtime on 2010-11-08

Our Phoenix cluster needs some maintenance operations that require downtime. We will use the next maintenance window (the first Monday of the month) to carry them out. In this case, the first working Monday is the 8th.

It will start on Monday at 9:30 and last until Monday at 17:00.

The queues will be drained 20 hours beforehand. We will reserve a scheduled downtime until Tuesday at 17:00 for safety, but will reopen the queues and end the downtime as soon as we have checked that all systems work properly.
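
As a quick check of the timeline, the times given in this announcement can be worked out with a short Python sketch. The variable names are only illustrative, and since no time zone is stated here, the sketch assumes plain local times.

```python
from datetime import datetime, timedelta

# Times taken from this announcement (assumed to be local time; no zone is stated).
downtime_start = datetime(2010, 11, 8, 9, 30)   # Monday 9:30
planned_end = datetime(2010, 11, 8, 17, 0)      # Monday 17:00
reserved_end = datetime(2010, 11, 9, 17, 0)     # Tuesday 17:00, reserved for safety
drain_lead = timedelta(hours=20)                # queues drained 20 hours before

# Moment at which the queues start draining.
drain_start = downtime_start - drain_lead

# Extra margin between the planned end and the end of the reserved downtime.
safety_margin = reserved_end - planned_end

print("Drain starts:  ", drain_start.isoformat(sep=" "))
print("Planned window:", planned_end - downtime_start)
print("Safety margin: ", safety_margin)
```

With these values the drain starts on Sunday 2010-11-07 at 13:30, the planned window lasts 7.5 hours, and the reservation leaves a further 24 hours of margin.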

Summary of interventions

Lustre Filesystem check (JT)

We have to run an fsck on Lustre to ensure filesystem consistency.

Upgrade glibc libraries (PF)

To fix a security hole, glibc will be upgraded on the remaining nodes where the update has not been applied yet. Then we will restart all machines.

Upgrade torque client on Arc01/02 (PO)

The currently installed version does not work with a failed-over LRMS.

Add a couple of roles to /ops mapping (PO)

In the end, they told us it should look like this:

"/ops/Role=NULL/Capability=NULL" .ops
"/ops/Role=lcgadmin" opssgm
"/ops/Role=lcgadmin/Capability=NULL" opssgm
"/ops/Role=pilot" .ops
"/ops/Role=pilot/Capability=NULL" .ops
"/ops/*" .ops   (also matches /ops/NGI/Germany)

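To illustrate how such a mapping might resolve a given FQAN, here is a minimal Python sketch. It rests on two assumptions that this page does not confirm: that entries are evaluated top to bottom with the first match winning, and that `*` behaves like a shell-style wildcard. The function name `map_fqan` is hypothetical and is not part of any grid middleware.

```python
from fnmatch import fnmatchcase

# Entries copied from the mapping above, in file order.
# By the usual mapfile convention (an assumption here), a leading dot
# such as ".ops" denotes a pool account and "opssgm" a single account.
OPS_MAPPING = [
    ("/ops/Role=NULL/Capability=NULL", ".ops"),
    ("/ops/Role=lcgadmin", "opssgm"),
    ("/ops/Role=lcgadmin/Capability=NULL", "opssgm"),
    ("/ops/Role=pilot", ".ops"),
    ("/ops/Role=pilot/Capability=NULL", ".ops"),
    ("/ops/*", ".ops"),
]


def map_fqan(fqan, mapping=OPS_MAPPING):
    """Return the account of the first matching pattern, or None.

    Hypothetical helper: first-match-wins and shell-style globbing are
    assumptions, not confirmed behaviour of the real mapping code.
    """
    for pattern, account in mapping:
        # fnmatchcase lets "*" match any characters, including "/".
        if fnmatchcase(fqan, pattern):
            return account
    return None


for fqan in ("/ops/Role=lcgadmin", "/ops/Role=pilot", "/ops/NGI/Germany"):
    print(fqan, "->", map_fqan(fqan))
```

Under these assumptions, /ops/Role=lcgadmin resolves to opssgm, while /ops/Role=pilot and the subgroup /ops/NGI/Germany both resolve to .ops, the latter through the final wildcard entry.
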
-- PabloFernandez - 2010-10-18

MaintenanceForm
When 2010-11-08
Downtime required yes
Done no
Topic revision: r5 - 2010-11-04 - PabloFernandez
 