<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
   * #Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Upgrade the Phoenix Cluster to LCG 2.7

The cluster upgrade is scheduled for March 22nd 14:00 to March 24th 18:00. If all goes well, we will not need the full length of this downtime. The intervention also needs to upgrade the kernel, see KernelUpdate.

---++ Documentation

Details on how the upgrade has to be performed can be found at http://lcg.web.cern.ch/LCG/Sites/releases.html

The following documents are relevant:
   * [[http://grid-deployment.web.cern.ch/grid-deployment/releaseNoteLCG-2.7.0.txt][Release Notes]] (plain text file)
   * LCG2 Manual *Upgrade* Instructions ([[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Upgrade.pdf][pdf]], [[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Upgrade][html]])
   * LCG2 Manual *Installation* Instructions ([[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install.pdf][pdf]], [[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install][html]])
   * LCG2 *Testing* of a site ([[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Site-Testing.pdf][pdf]], [[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Site-Testing/][html]])
   * LCG2 User Guide ([[https://edms.cern.ch/file/454439//LCG-2-UserGuide.pdf][pdf]], [[https://edms.cern.ch/file/454439//LCG-2-UserGuide.html][html]]) gives detailed information on how to use the middleware and also has some insights into site configuration
   * [[https://uimon.cern.ch/twiki/bin/view/LCG/ClassicSeToDpm][Upgrading a Classical SE to a DPM]] - this is exactly what we need to do with our SE

Other interesting documentation:
   * [[http://grid-deployment.web.cern.ch/grid-deployment/documentation/Maui-Cookbook.pdf][Maui Cookbook]]
   * First-time site configuration description ([[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Site-Setup.pdf][pdf]], [[http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Site-Setup][html]])

---++ Upgrade Plan

The upgrade has to start with the KernelUpdate on all nodes. The remaining tasks are, in sequence:
   * Define the nodes to be used for VO Boxes. New nodes may need to be allocated.
   * Define the nodes for the Grid Services
   * Perform the upgrade according to the instructions on all nodes
   * Perform the upgrade of specific services on the Grid nodes
   * Perform the DPM installation according to the documentation
   * Re-check the site configuration
   * Execute the test suite

---++ Upgrade Log

The intervention was planned to run from Wednesday 22/03/2006 14:00 until Friday 24/03/2006 18:00 at the latest. It actually lasted until Monday 27/03/2006 18:00. Exception: the DPM was only installed by March 31st.

---+++ Short event log

   * Attempted to upgrade the kernel to 2.6. This failed because hardware drivers were unavailable. Almost all problems are understood now, but considerably more time has to be invested before such an upgrade can be attempted again.
   * Attempted to upgrade the kernel to the latest available version of 2.4. This succeeded after some unforeseen difficulties with some of the hardware drivers. However, a lot of time was spent on this upgrade and on the analysis of the issues related to 2.6 - the kernel upgrade intervention ended around Friday lunchtime.
   * Upgrade of LCG services to 2.7 succeeded.
   * Upgrade of classic SE to DPM succeeded.

---+++ Steps taken during the scheduled downtime

   * Entered the scheduled downtime in the GOC using the downtime broadcast tool at https://cic.in2p3.fr/index.php?id=rc&subid=rc_publish&js_status=2 . It was visible in the [[http://goc.grid-support.ac.uk/gridsite/operations/downtimes.php][corresponding list]] for a while.
   * All nodes have been backed up.
   * Started kernel update tests for the 2.6 kernel. Encountered problems with the Fibre Channel driver. Contacted DALCO.
   * Restricted the kernel update to the last version of the 2.4 kernel available from SCL: ==2.4.21.40.EL.cernsmp==
   * Answer from DALCO received: the updated 2.4 kernel version can be installed.
   * Upgrade performed for LCG and reconfigured based on YAIM 2_7_0.3

---+++ Status as of Friday evening March 24th

All the involved nodes have therefore been reconfigured based on YAIM 2_7_0.3. The =site-info.def= global configuration file is installed on the =install-lcg= machine under =/export/ks/LCG/config=. This directory has been unexported from the entire subnet and the file has been protected for evident security reasons.

The following nodes are configured:
   * CE_torque => ce01-lcg
   * SE_classic => se01-lcg
   * UI+MON => ui-lcg
   * WN_torque => wn[01..15]-lcg

Service status:
   * On the CE_torque, all the queues have been restarted and reopened, and the OpenPBS/Torque server, the MAUI scheduler and the moms are running.
   * The SE_classic is still providing the /storage directory hierarchy for the whole on-line disk space and the RFIO service is still running.
   * The GRIDICE service is not running yet.
   * The UI+MON is also running the RGMA service.
   * RB, PX and BDII services are still taken from CERN:
      * RB from lxn1177.cern.ch
      * BDII from lcg-bdii.cern.ch
      * PX from myproxy.cern.ch
   * Up to now, neither LFC nor DPM is configured on the PHOENIX cluster.
   * On the GOCDB, the monitoring has been re-enabled and can be checked at https://goc.grid-support.ac.uk/gridsite/gocdb2/index.php?siteSelect=12 . This is available for CE, SE and UI(MON).

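
The node list in the Friday status uses the shorthand =wn[01..15]-lcg= for the fifteen worker nodes. As a minimal sketch of how that shorthand could be expanded into individual host names (for example to loop over nodes when checking them), the snippet below may help. The names =NODE_ROLES=, =expand_hosts= and =all_hosts= are hypothetical helpers introduced here for illustration; they are not part of YAIM or of the LCG tooling.

```python
import re

# Role-to-host mapping copied from the status list above.
NODE_ROLES = {
    "CE_torque": "ce01-lcg",
    "SE_classic": "se01-lcg",
    "UI+MON": "ui-lcg",
    "WN_torque": "wn[01..15]-lcg",
}

# Matches a single numeric range such as [01..15].
_RANGE = re.compile(r"\[(\d+)\.\.(\d+)\]")


def expand_hosts(pattern):
    """Expand one 'prefix[NN..MM]suffix' range into a list of host names.

    Patterns without a range are returned unchanged as a one-element list.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    first, last = match.group(1), match.group(2)
    width = len(first)  # preserve the zero padding of the first bound
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(first), int(last) + 1)
    ]


def all_hosts(roles=NODE_ROLES):
    """Return a dict mapping each role to its expanded list of host names."""
    return {role: expand_hosts(pattern) for role, pattern in roles.items()}
```

Calling =all_hosts()= returns one list per role, so a per-node check can iterate over =all_hosts()["WN_torque"]= without spelling out every worker node by hand.
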
-- Main.PeterKunszt - 3 Apr 2006
Topic revision: r6 - 2011-01-21 - PabloFernandez