KeyWords: SysAdmin, Maintenance

Scheduled Downtime on 2010-09-06

Our Phoenix cluster needs some maintenance operations that require downtime. We will use the next maintenance window (the first Monday of the month) to carry them out.

The downtime will start on Monday 6 September at 9:30 and last until 17:00 the same day.

All queues will be drained three days beforehand. For safety, we will reserve a scheduled downtime until Tuesday 7th at 17:00, but we will open the queues and exit the downtime as soon as we have checked that all systems work properly.
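As a quick cross-check of the dates above, the following minimal sketch derives the first Monday of the month, the drain date three days earlier, and the reserved end date one day later. The function name `first_monday` and the variable names are illustrative only and are not part of any site tooling.

```python
from datetime import date, timedelta

def first_monday(year, month):
    """Return the date of the first Monday of the given month."""
    first_day = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the offset to the next Monday
    # (0 if the month already starts on a Monday).
    offset = (7 - first_day.weekday()) % 7
    return first_day + timedelta(days=offset)

# Dates for the September 2010 window described above.
window = first_monday(2010, 9)
drain_start = window - timedelta(days=3)      # queues drained three days before
reserved_until = window + timedelta(days=1)   # downtime reserved until the next day

print(window, drain_start, reserved_until)
```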

Summary of interventions

Upgrade dCache version to 1.9.5-22 (FG, PF)

dCache is still moving forward with the Golden Release, which will be around for a long time, and we want to stay up to date. The new version fixes a number of bugs, some of which may already have affected us.

We will therefore upgrade from version 1.9.5-19 to 1.9.5-22. This requires shutting down the service, but we will try to bring it back up as soon as possible so that transfers from and to other sites still work even though we are in downtime.

Upgrade of gLite to revision 18 (PF, FG)

A new release of gLite is out, and we want to upgrade all grid machines to the latest possible version.

Upgrade to Lustre 1.8.4 (JT)

A new version of Lustre is out. It solves a known bug in the MDS that has affected us.

Upgrade the driver on the OSSs to access the JBODs in Lustre (JT)

There is also a new release of the driver for the disk controllers in the OSSs. We think it may solve some stability issues with the disks.

Update some host certificates (PF)

The following certificates are about to expire, and we will update them all: storage01-02, se01-28, *vobox, mon, ce01, arc01.
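One way to confirm how close a host certificate is to expiry is to read its end date and compare it with the current time. The sketch below assumes the end date is available as the line printed by `openssl x509 -enddate -noout` (of the form `notAfter=...`); the helper name `days_until_expiry` and the sample date are hypothetical and not taken from the actual certificates.

```python
import ssl
from datetime import datetime, timezone

def days_until_expiry(enddate_line, now):
    """Days left before expiry, given a 'notAfter=...' line and a UTC 'now'."""
    # Keep only the timestamp after 'notAfter=' (e.g. 'Sep 13 09:30:00 2010 GMT').
    not_after = enddate_line.strip().split("=", 1)[1]
    # ssl.cert_time_to_seconds parses this timestamp format into epoch seconds.
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expiry - now).total_seconds() / 86400

# Hypothetical example: a certificate ending one week after the downtime starts.
sample_now = datetime(2010, 9, 6, 9, 30, tzinfo=timezone.utc)
sample_days = days_until_expiry("notAfter=Sep 13 09:30:00 2010 GMT", sample_now)
print(sample_days)
```

A negative result would mean the certificate has already expired.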

Enable Hyper-Threading on all WNs (ALL)

This also requires some work in Torque and Moab.
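Part of the Torque work may involve the `np=` value in the server's nodes file, since Hyper-Threading doubles the number of logical cores each WN reports. The sketch below assumes, purely for illustration, that the site wants one job slot per logical core and therefore doubles `np`; the host name `wn01` and the property `lcgpro` are made-up examples, and the real policy (and any matching Moab changes) may differ.

```python
import re

def double_np(nodes_line):
    """Return a Torque nodes-file line with its np= value doubled.

    Lines without an np= attribute are returned unchanged.
    """
    return re.sub(
        r"\bnp=(\d+)\b",
        lambda match: f"np={int(match.group(1)) * 2}",
        nodes_line,
    )

# Hypothetical nodes-file entry: host name, slot count, node property.
print(double_np("wn01 np=8 lcgpro"))
```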

Create poolaccounts for /cms/integration (PO)

We need to create 10 high-priority slots (not reservations) for this new CMS group. They say there could be fewer than 10 users, so we will probably create 15 or 20 poolaccounts for them.


-- PabloFernandez - 2010-08-30

When: Monday 6 September 2010
Downtime required: yes
Done: no

Topic revision: r4 - 2010-09-07 - JasonTemple