Keywords: SysAdmin, Maintenance
Scheduled Downtime on 2010-10-04
Our Phoenix Cluster needs several maintenance operations that require downtime. We will use the next maintenance window (the first Monday of the month) to carry them out.
The downtime will start on Monday 2010-10-04 at 9:30 and last until 17:00 the same day.
All queues will be drained starting three days beforehand. For safety, we will reserve a scheduled downtime until Tuesday at 17:00, but we will reopen the queues and end the downtime as soon as we have verified that all systems are working properly.
Summary of interventions
Lustre STONITH replacement (JT)
We are going to replace the STONITH mechanism (used between Lustre servers so that one server can kill its companion in order to take over) with MMP (multiple mount protection) to improve Lustre stability.
Upgrade Torque server and clients (PO)
The current Torque version (2.3) no longer scales well enough. We are going to replace it with version 2.5, which also supports high availability, and we will set up high availability as well.
Enable firewall on CreamCE and Thors (PF)
We will enable firewall rules on those machines. We prefer to do this during a downtime to avoid disrupting production.
--
PabloFernandez - 2010-09-27