Upgrade the Phoenix Cluster to LCG 2.7
The cluster upgrade is scheduled for March 22nd 14:00 to March 24th 18:00. If all goes well, we will not need
the full length of this downtime.
The intervention also needs to upgrade the kernel; see KernelUpdate.
Documentation
Details on how the upgrade has to be performed can be found at
http://lcg.web.cern.ch/LCG/Sites/releases.html
The following documents are relevant:
- Release Notes (plain text file)
- LCG2 Manual Upgrade Instructions (pdf, html)
- LCG2 Manual Installation Instructions (pdf, html)
- LCG2 Testing of a site (pdf, html)
- LCG2 User Guide (pdf, html) gives detailed information on how to use the middleware and also offers some insight into site configuration
- Upgrading a Classical SE to a DPM - this is exactly what we need to do with our SE
Other interesting documentation
Upgrade Plan
The upgrade has to start with the KernelUpdate on all nodes. The remaining tasks are, in sequence:
- Define the nodes to be used for VO Boxes. New nodes may need to be allocated.
- Define nodes for the Grid Services
- Perform upgrade according to the instructions on all nodes
- Perform upgrade of specific services on Grid nodes
- Perform DPM installation according to documentation
- Re-Check site configuration
- Execute Test suite
Upgrade Log
The intervention was planned to run from Wednesday 22/03/2006 14:00
until Friday 24/03/2006 18:00 at the latest. It actually lasted until Monday 27/03/2006
18:00. Exception: the DPM installation was only completed by March 31st.
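The size of the overrun can be worked out from the dates above. The following is a minimal sketch, taking the actual end as Monday 27/03/2006 18:00 and using plain wall-clock times (no time zone or daylight-saving handling); the variable and function names are illustrative only.

```python
from datetime import datetime

# Times taken from the log above (wall-clock, no timezone handling).
planned_start = datetime(2006, 3, 22, 14, 0)
planned_end = datetime(2006, 3, 24, 18, 0)
actual_end = datetime(2006, 3, 27, 18, 0)


def hours(delta):
    """Convert a timedelta to whole hours."""
    return int(delta.total_seconds() // 3600)


planned = planned_end - planned_start   # scheduled downtime window
actual = actual_end - planned_start     # downtime actually used
overrun = actual_end - planned_end      # time beyond the scheduled end

print("planned:", hours(planned), "h")
print("actual: ", hours(actual), "h")
print("overrun:", hours(overrun), "h")
```

The later DPM installation (March 31st) is not included in these figures.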
Short event log
- Attempted to upgrade the kernel to 2.6. This failed because hardware drivers were unavailable. Almost all problems are understood now, but considerably more time has to be invested before such an upgrade can be attempted again.
- Attempted to upgrade the kernel to the latest available 2.4 version. This succeeded after some unforeseen difficulties with some of the hardware drivers. However, this upgrade and the analysis of the 2.6 issues took a lot of time; the kernel upgrade intervention ended around Friday lunchtime.
- Upgrade of LCG services to 2.7 succeeded
- Upgrade of classic SE to DPM succeeded.
Steps taken during the scheduled downtime
- Entered scheduled downtime in the GOC using the downtime broadcast tool at https://cic.in2p3.fr/index.php?id=rc&subid=rc_publish&js_status=2 ; it was visible in the corresponding list for a while.
- All nodes have been backed up.
- Started kernel update tests for the 2.6 kernel. Problems encountered with the Fibre Channel driver. Contacted DALCO.
- Kernel update now only to the last version of the 2.4 kernel available from SCL: 2.4.21.40.EL.cernsmp
- Answer from DALCO received: the updated 2.4 kernel version can be installed
- LCG upgrade performed and nodes reconfigured based on YAIM 2_7_0.3
Status as of Friday evening March 24th
All the nodes involved have therefore been reconfigured based on YAIM 2_7_0.3. The site-info.def global configuration file is installed on the install-lcg machine under /export/ks/LCG/config. This directory has been unexported from the entire subnet and the file has been protected for security reasons.
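The log does not say how the file was protected. Assuming it was done through file permissions, a check along the following lines could confirm that site-info.def is not accessible to group or others. This is only a sketch: the helper name is_private is hypothetical, and the full path is assembled from the directory and file name given above.

```python
import os
import stat


def is_private(path):
    """Return True if the file grants no permissions to group or others."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Assumed location: directory and file name as given in the text above.
path = "/export/ks/LCG/config/site-info.def"
if os.path.exists(path):
    state = "private" if is_private(path) else "accessible to group/others"
    print(path, state)
else:
    print(path, "not found on this machine")
```

Checking that the directory is no longer exported has to be done separately on the install-lcg machine; it is not covered here.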
The following nodes are configured:
- CE_torque => ce01-lcg
- SE_classic => se01-lcg
- UI+MON => ui-lcg
- WN_torque => wn[01..15]-lcg
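The node list above uses the shorthand wn[01..15]-lcg for the worker nodes. As an illustration, the following sketch expands that shorthand into individual host names and counts the hosts per node type. The expand helper and the NODES dictionary are hypothetical; only the node types and name patterns come from the list above.

```python
import re


def expand(pattern):
    """Expand 'wn[01..15]-lcg' into ['wn01-lcg', ..., 'wn15-lcg'].

    Patterns without a [a..b] range are returned unchanged in a one-item list.
    """
    match = re.fullmatch(r"(.*)\[(\d+)\.\.(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, first, last, suffix = match.groups()
    width = len(first)  # keep the zero padding of the lower bound
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(first), int(last) + 1)
    ]


# Node type -> host name pattern, copied from the list above.
NODES = {
    "CE_torque": "ce01-lcg",
    "SE_classic": "se01-lcg",
    "UI+MON": "ui-lcg",
    "WN_torque": "wn[01..15]-lcg",
}

hosts = {node_type: expand(pattern) for node_type, pattern in NODES.items()}
for node_type, names in hosts.items():
    print(f"{node_type}: {len(names)} host(s), {names[0]} .. {names[-1]}")
```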
- On the CE_torque, all the queues have been restarted and reopened, and the OpenPBS/Torque server, MAUI scheduler and the moms are running.
- The SE_classic is still providing the /storage directory hierarchy for the whole on-line disk space and the RFIO service is still running.
- The GRIDICE service is not running yet.
- The UI+MON is also running the RGMA service.
- RB, PX and BDII services are still taken from CERN:
  - RB from lxn1177.cern.ch
  - BDII from lcg-bdii.cern.ch
  - PX from myproxy.cern.ch
- Up to now, neither LFC nor DPM is configured on the PHOENIX cluster.
- On the GOCDB, the monitoring has been re-enabled and can be checked at https://goc.grid-support.ac.uk/gridsite/gocdb2/index.php?siteSelect=12 . It is available for CE, SE and UI(MON).
--
PeterKunszt - 3 Apr 2006