Upgrade the Phoenix Cluster to LCG 2.7

The cluster upgrade is scheduled from March 22nd, 14:00, to March 24th, 18:00. If all goes well, we will not need the full length of this downtime.

The intervention also includes a kernel upgrade; see KernelUpdate.

Documentation

Details on how the upgrade is to be performed can be found at http://lcg.web.cern.ch/LCG/Sites/releases.html

The following documents are relevant:

  • Release Notes (plain text file)
  • LCG2 Manual Upgrade Instructions (pdf, html)
  • LCG2 Manual Installation Instructions (pdf, html)
  • LCG2 Testing of a site (pdf, html)
  • LCG2 User Guide (pdf, html) - gives detailed information on how to use the middleware and also has some insights into site configuration
  • Upgrading a Classical SE to a DPM - this is exactly what we need to do with our SE

Other interesting documentation

Upgrade Plan

The upgrade has to start with the KernelUpdate on all nodes. The remaining tasks are, in sequence:

  • Define the nodes to be used for VO Boxes. New nodes may need to be allocated.
  • Define nodes for the Grid Services
  • Perform upgrade according to the instructions on all nodes
  • Perform upgrade of specific services on Grid nodes
  • Perform DPM installation according to documentation
  • Re-Check site configuration
  • Execute Test suite

Upgrade Log

The intervention was planned to run from Wednesday 22/03/2006, 14:00, until Friday 24/03/2006, 18:00, at the latest. It actually lasted until Monday 27/03/2006, 18:00. Exception: the DPM was not installed until March 31st.

Short event log

  • Attempted to upgrade the kernel to 2.6. This failed because hardware drivers were unavailable. Almost all problems are now understood, but considerably more time has to be invested before such an upgrade can be attempted again.
  • Attempted to upgrade the kernel to the latest available 2.4 version. This succeeded after some unforeseen difficulties with some of the hardware drivers. However, a lot of time was spent on this upgrade and on analysing the issues related to 2.6, so the kernel upgrade intervention ended only around Friday lunchtime.
  • Upgrade of LCG services to 2.7 succeeded.
  • Upgrade of classic SE to DPM succeeded.

Steps taken during the scheduled downtime

  • Entered the scheduled downtime in the GOC using the downtime broadcast tool at https://cic.in2p3.fr/index.php?id=rc&subid=rc_publish&js_status=2 ; it was visible in the corresponding list for a while.
  • All nodes have been backed up.
  • Started kernel update tests for the 2.6 kernel. Encountered problems with the Fibre Channel driver. Contacted DALCO.
  • Kernel update now limited to the latest 2.4 kernel version available from SCL: 2.4.21.40.EL.cernsmp
  • Answer from DALCO received: the updated 2.4 kernel version can be installed.
  • LCG upgrade performed and nodes reconfigured based on YAIM 2_7_0.3
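After a kernel update like the one logged above, it can be useful to confirm that a node really booted a 2.4-series kernel. The following is only a minimal sketch, not the procedure used during the intervention: the helper names kernel_series and is_expected_series are hypothetical, and it assumes the release string begins with "major.minor" as in the version quoted above.

```python
import platform
import re


def kernel_series(release):
    """Return (major, minor) parsed from the start of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % (release,))
    return int(match.group(1)), int(match.group(2))


def is_expected_series(release, expected=(2, 4)):
    """True if the release string belongs to the expected kernel series."""
    return kernel_series(release) == expected


if __name__ == "__main__":
    # platform.release() reports the release of the kernel currently running.
    running = platform.release()
    status = "OK" if is_expected_series(running) else "unexpected series"
    print(running, status)
```

Run on each node, the script prints the running kernel release and whether it is in the 2.4 series.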

Status as of Friday evening March 24th

All the nodes involved have therefore been reconfigured based on YAIM 2_7_0.3. The site-info.def global configuration file is installed on the install-lcg machine under /export/ks/LCG/config. This directory has been unexported from the entire subnet and the file has been protected for obvious security reasons.

The following nodes are configured:

  • CE_torque => ce01-lcg
  • SE_classic => se01-lcg
  • UI+MON => ui-lcg
  • WN_torque => wn[01..15]-lcg
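The mapping above uses the shorthand wn[01..15]-lcg for the fifteen worker nodes. As a hedged illustration, the sketch below expands that shorthand into individual host names so the list can be looped over per node type. The names NODE_MAP, expand_hosts and hosts_by_type are hypothetical, and the zero-padded numbering is assumed from the notation.

```python
import re

# Node type -> host pattern, copied from the list above.
NODE_MAP = {
    "CE_torque": "ce01-lcg",
    "SE_classic": "se01-lcg",
    "UI+MON": "ui-lcg",
    "WN_torque": "wn[01..15]-lcg",
}

# Matches a numeric range written as [first..last].
_RANGE = re.compile(r"\[(\d+)\.\.(\d+)\]")


def expand_hosts(pattern):
    """Expand a single [first..last] range into a list of host names."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    first, last = match.group(1), match.group(2)
    width = len(first)  # keep the zero padding used in the pattern
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        prefix + str(number).zfill(width) + suffix
        for number in range(int(first), int(last) + 1)
    ]


def hosts_by_type(node_map):
    """Return a dict mapping each node type to its expanded host list."""
    return {
        node_type: expand_hosts(pattern)
        for node_type, pattern in node_map.items()
    }


if __name__ == "__main__":
    for node_type, hosts in hosts_by_type(NODE_MAP).items():
        print(node_type, " ".join(hosts))
```

Patterns without a range, such as ui-lcg, come back unchanged as a one-element list.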

On the CE_torque, all the queues have been restarted and reopened, and the OpenPBS/Torque server, MAUI scheduler and the moms are running.

  • The SE_classic is still providing the /storage directory hierarchy for the whole on-line disk space, and the RFIO service is still running.
  • The GRIDICE service is not running yet.
  • The UI+MON is also running the RGMA service.
  • RB, PX and BDII services are still provided by CERN:
    • RB from lxn1177.cern.ch
    • BDII from lcg-bdii.cern.ch
    • PX from myproxy.cern.ch
  • Up to now, neither LFC nor DPM is configured on the PHOENIX cluster.
  • On the GOCDB, the monitoring has been re-enabled and can be checked at https://goc.grid-support.ac.uk/gridsite/gocdb2/index.php?siteSelect=12 . This is available for CE, SE and UI(MON).

-- PeterKunszt - 3 Apr 2006

Topic revision: r5 - 2006-05-05 - PeterKunszt
 