KeyWords: SysAdmin, Maintenance

Scheduled Downtime on 2010-10-04

Our Phoenix cluster needs some maintenance operations that require downtime. We will use the next maintenance window (the first Monday of the month) to carry them out.

The downtime will start on Monday, 2010-10-04, at 9:30 and is expected to last until 17:00 the same day.

All queues will be drained three days before. We will reserve a scheduled downtime until Tuesday at 17:00 for safety, but will reopen the queues and end the downtime as soon as we have checked that all systems work properly.
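
For reference, the dates above can be derived from the "first Monday of the month" rule. The following is only a minimal sketch: the function names `first_monday` and `downtime_schedule` are hypothetical helpers for illustration, not part of any tool we run, and the times are simply the ones stated in this announcement.

```python
from datetime import date, datetime, time, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the number of
    # days to add to reach the next Monday (0 if the 1st is a Monday).
    offset = (7 - first.weekday()) % 7
    return first + timedelta(days=offset)


def downtime_schedule(year: int, month: int) -> dict:
    """Compute the key dates of a downtime, using the rules in this announcement."""
    day = first_monday(year, month)
    return {
        # Queues are drained three days before the downtime.
        "drain_start": day - timedelta(days=3),
        # Planned window: 9:30 to 17:00 on the maintenance Monday.
        "start": datetime.combine(day, time(9, 30)),
        "planned_end": datetime.combine(day, time(17, 0)),
        # Safety margin: reserved until Tuesday at 17:00.
        "reserved_end": datetime.combine(day + timedelta(days=1), time(17, 0)),
    }


if __name__ == "__main__":
    for name, value in downtime_schedule(2010, 10).items():
        print(f"{name}: {value}")
```

For October 2010 this gives a drain starting on Friday 2010-10-01 and a reservation ending on Tuesday 2010-10-05 at 17:00.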

Summary of interventions

Lustre Stonith replacement (JT)

We are going to replace the STONITH mechanism (used between Lustre servers so that one can kill its companion and take over) with MMP (multiple mount protection) to increase the stability of Lustre.

Upgrade Torque server and clients (PO)

The current Torque version (2.3) no longer scales. We are going to replace it with version 2.5, which also supports High Availability, and we will set that up as well.

Enable firewall on CreamCE and Thors (PF)

We will enable some firewall rules on those machines. We want to do it during a downtime to avoid touching production.

-- PabloFernandez - 2010-09-27

When: 2010-10-04
Downtime required: yes
Done: no
Topic revision: r4 - 2011-01-13 - PabloFernandez