<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+!! Scheduled Maintenance on 2011-06-08

Next Wednesday we will go into Scheduled Downtime. It is planned to last from 9:00 to 18:00, but we will return to operation as soon as we finish. As usual, the CMS and ATLAS queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before it.

%TOC%

---++!! Summary of interventions

We will perform the following operations on the cluster:
   * Torque upgrade to 2.4.13
   * Upgrade WN
   * NFS DRBD fix
   * Torque/Thumper firmware update
   * Xen14/15 cfengine integration
   * Enable glexec capability on CreamCE
   * Installation of EMI1 CREAM-CE in cream02
   * Apply Argus policies script *from-groupmap-to-policy.sh*

---

---++ %ICON{done}% Torque upgrade to 2.4.13
   * Description: Torque has been running without HA for almost two months because of instabilities in our current version (2.4.11). The Torque support team has been working on this, and the only way to move the ticket forward is to upgrade to 2.4.13. This may not solve the issue, but the only way to know is to try.
   * Affected nodes: All computing nodes: Cream, Arc, Lrms, WNs

---++ %ICON{done}% Upgrade WN
   * Description: A set of WNs [141-195] has not yet been upgraded to SL5.5. We will do that now.
   * Affected nodes: wn141-wn195
<verbatim>
# Install the SL 5.5 yum configuration package
dsh -w wn[141-195] 'rpm -U http://ftp.scientificlinux.org/linux/scientific/55/x86_64/SL/yum-conf-55-1.SL.noarch.rpm'
# Remove the srptools packages before updating
dsh -w wn[141-195] 'rpm -e srptools-debuginfo-0.0.4-1.ofed1.4.1.1.1.2 srptools-0.0.4-6.el5'
# Refresh the yum metadata and update all packages
dsh -w wn[141-195] 'yum clean all'
dsh -w wn[141-195] 'yum update -y'
dsh -w wn[141-195] 'rpm -e perl-XML-LibXML-1.70-2.el5.rf.x86_64'
# Bring the nodes in line with the WN package list, then compare across nodes
dsh -w wn[141-195] 'rpm_clone -r -f /opt/cscs/etc/pgklist/pkglist_WN_sun --enablerepo=cscs,glite*,dag,atlas* -y'
dsh -w wn[141-195] 'rpm_clone -d -f /opt/cscs/etc/pgklist/pkglist_WN_sun --ignore-exclude' | dshbak -c
dsh -w wn[141-195] 'rpm_clone -r -f /opt/cscs/etc/pgklist/pkglist_WN_sun --ignore-exclude -y'
dsh -g WN 'rpm_clone -d -f /opt/cscs/etc/pgklist/pkglist_WN_sun --ignore-exclude' | dshbak -c
</verbatim>
   * Side note: During the maintenance, a new gLite release came out for WNs. We have also installed it.

---++ %ICON{done}% NFS DRBD fix
   * Description: There was a mistake in the DRBD configuration on NFS that could prevent the Secondary from becoming Primary in a failover situation.
   * Affected nodes: nfs01/nfs02, and all computing nodes: Cream, Arc, Lrms, WNs

---++ %ICON{done}% Torque/Thumper firmware update
   * Description: The disk controller on Thors and Thumpers suffers from sporadic freeze-ups. This affects a node every month. A firmware upgrade will hopefully solve it, although that is not certain.
   * Affected nodes: All dCache nodes. The dCache service will be interrupted.
   * Side note: During the maintenance, we also upgraded dCache to the latest bugfix version.

---++ %ICON{done}% Xen14/15 cfengine integration
   * Description: xen14 and xen15 are not controlled by cfengine. This is a long-standing problem that we will try to solve now.
   * Affected nodes: xen14, xen15, and all VMs inside: Argus, Lrms, Cream, Arc.

---++ %ICON{done}% Installation of EMI1 CREAM-CE in cream02
   * Description: Since we are early adopters of EMI CREAM-CE, we need to upgrade one of the cream servers to use the new EMI1 release.
   * Affected nodes: cream02
   * Process:
      1 Shut down =cream02= and make a backup of it: <verbatim>dd if=/dev/vg_root/cream02_root bs=1M | gzip | ssh miguelgi@ui64 "dd of=./bck.cream02.before.EMI1.img.gzip bs=1M"</verbatim>
      1 *cfengine* Files to modify in CREAM directory *%RED%make sure permissions of copied files and dirs are correct%ENDCOLOR%*:
         * /etc/yum.repos.d/glite-CREAM.repo --> move from CREAM to ppcream01
         * /etc/yum.repos.d/glite-TORQUE_utils.repo --> move from CREAM to ppcream01
         * /usr/share/tomcat5/webapps/ce-cream/WEB-INF/jobwrapper.tpl --> move from CREAM to ppcream01
         * /etc/sudoers --> move from CREAM to ppcream01
         * /etc/profile.d/CSCS.sh --> (make diff with the one in regular machine) and move from CREAM to ppcream01
      1 *cfengine* Files to copy from cfengine of ppcream02 to CREAM directory *%RED%make sure permissions of copied files and dirs are correct%ENDCOLOR%*:
         * /etc/sudoers
         * /etc/profile.d/CSCS.sh
         * /var/lib/tomcat5/webapps/ce-cream/WEB-INF/jobwrapper.tpl
      1 *cfengine* Files/directories to copy from PPCREAM_CE to CREAM_CE directory *%RED%make sure permissions of copied files and dirs are correct%ENDCOLOR%*:
         * /etc/grid-security/gridmapdir
         * /etc/pki/rpm-gpg
         * /lustre/scratch
         * /opt/edg/var/info/
         * /var/lib/tomcat5/webapps
         * /var/log/cream
         * /var/log/tomcat5
         * /var/spool/pbs/server_priv/accounting
         * /var/spool/pbs/server_priv/server_name MODIFY ACCORDING TO PRODUCTION CREAM!!!
      1 *cfengine* Files to be *MERGED* from PPS directory to ANY directory:
         * /srv/cfengine/files/PPS/opt/cscs/siteinfo/nodes/ppcream02.lcg.cscs.ch
         * /srv/cfengine/files/ANY/opt/cscs/siteinfo/nodes/cream02.lcg.cscs.ch
      1 *cfengine, inputs* Files to be *MERGED*:
         * cf.CREAM_CE
         * cf.PPCREAM_CE
      1 Install the machine following the instructions on https://wiki.chipp.ch/twiki/bin/view/LCGTier2/XenSampleImageReplication and https://wiki.chipp.ch/twiki/bin/view/LCGTier2/ServiceConfiguration
      1 Once the machine is installed, install the CREAMCE service: https://wiki.chipp.ch/twiki/bin/view/LCGTier2/ServiceCreamCE

---++ %ICON{done}% Enable glexec capability on CreamCE

*%RED%THIS STEP DEPENDS ON PREVIOUS STEPS%ENDCOLOR%*: [[https://wiki.chipp.ch/twiki/bin/view/LCGTier2/SiteMaintenance20110608#Installation_of_EMI1_CREAM_CE_in][Installation of EMI1 CREAM-CE]]
   * Description: To be able to use glExec from outside, this capability has to be announced through CreamCE.
   * Also, a few RPMs must be installed and YAIM must be run on the remaining WNs (wn141-wn195)
   * Affected nodes: cream01, cream02, WNs (wn141-wn195)
   * Process:
      1 To install glExec on the WNs, follow the instructions in [[https://wiki.chipp.ch/twiki/bin/view/LCGTier2/ServiceArgus#Installing_GLExec_in_the_Worker][the TWiki]]
      1 To enable the glexec capability on the CREAMs, modify =/srv/cfengine/files/ANY/opt/cscs/siteinfo/site-info.def= according to PPS: <verbatim>CE_CAPABILITY="CPUScalingReferenceSI00=2500 Share=atlas:40 Share=cms:40 Share=lhcb:20 glexec"</verbatim> Some extra information: https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables#site_info_def
      1 Run =cfengine= and =yaim= on the affected CREAMs.

---++ %ICON{done}% Apply Argus policies script *from-groupmap-to-policy.sh*
   * Description: There is a script that automatically matches the policies in the groupmap file to Argus policies, so it needs to be deployed.
   * Affected nodes: argus01, argus02
   * Process:
      1 Apply the policies script (=from-groupmap-to-policy.sh=) to =argus01=, making a diff with the local policies
      1 Apply the same script to =argus02=
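
The "making diff with local policies" step on =argus01= could be sketched roughly as below. This is only an illustration under stated assumptions: the function name =review_policy_changes= and both file arguments are hypothetical, and the way =from-groupmap-to-policy.sh= is invoked or where it writes its output is not documented on this page. The sketch therefore only compares two policy files that already exist on disk.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original procedure): compare the
# policies currently in use on an Argus node with the policies generated
# from the groupmap file. Both arguments are plain-text files; producing
# them is left out because the script invocation is not documented here.
review_policy_changes() {
    current="$1"
    generated="$2"

    # Refuse to continue if either file is missing or unreadable.
    for f in "$current" "$generated"; do
        if [ ! -r "$f" ]; then
            echo "cannot read $f" >&2
            return 2
        fi
    done

    # diff exits 0 when the files match and 1 when they differ.
    if diff -u "$current" "$generated"; then
        echo "no policy changes"
        return 0
    fi
    echo "policy changes found, review before applying"
    return 1
}
```

A return code of 1 means there are differences to review by hand before anything is applied. The same check could be repeated on =argus02= once =argus01= looks correct.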
Topic revision: r10 - 2011-06-08 - PabloFernandez
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.