<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Scheduled Maintenance on 2013-04-09

On 9 April 2013 we will go into Scheduled Downtime. It is planned from 8:00 to 18:00, but we will return to operation as soon as we finish. As usual, the CMS and ATLAS queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before the maintenance.

---++!! Summary of interventions

We will perform the following operations on the cluster:

%TOC%

---

---++ Upgrade of Compute Nodes to UMD-2
   * *Description*: All compute nodes need to be upgraded to UMD-2
   * *Affected nodes*: wn[01-79]
   * *Notes*: This would be a normal upgrade (SL5); the OS upgrade will come at a later stage.

Note: Dalco will come and exchange all the power supplies of the compute nodes on that date.

Nodes migrated to UMD-2, cfengine updated. The files for the UMD-1 repos are empty; this was done in cfengine to remove the possibility of accidentally picking up old repos.

wn74-78 still had installation issues, suspected to be due to the fake RAID setup. The investigation will continue once the Dalco engineer has replaced the power supplies.

---++ Restart of se13
   * *Description*: There is a Nagios check that complains about the blocksize of the RAIDs
   * *Affected nodes*: se13
   * *Notes*: All parameters are actually OK; a reboot alone is enough to make the check stop complaining.

Restarted the dcs3700_tunning service.

---++ Set MTU=4000 on all ethernet nodes
   * *Description*: We need to set the MTU to 4000 on all ethernet nodes
   * *Affected nodes*: All VMs (guests and hosts), both 4036E bridges, and the router.
   * *Notes*: This affects only public, production IPs. All private interfaces should stay on 1500, for simplicity (we may need to adjust the Nagios check).
      * Virtual guests
      * Virtual hosts
      * Bridges and router %ICON{done}%
      * Nagios checks

Note: The limit of 4096 (probably 4092) comes from Infiniband, which has an MTU of 65k only in Connected Mode, and that mode is not available in Ethernet.

---++ %ICON{done}% Upgrade to latest CVMFS/squid.
   * *Description*: Requested by WLCG, before the end of April
   * *Affected nodes*: WNs.
   * *Notes*: On clients

---++ %ICON{done}% Upgrade Java on all dCache nodes
   * *Description*: The Java version installed on the new pools (and other nodes) is too old. We need to homogenize the version we use.
   * *Affected nodes*: se[01-14], storage[01-02]
   * *Notes*: Search for the newest version and apply it everywhere.
      * storage01-02 %ICON{done}%

---++ %ICON{done}% Upgrade dCache on storage[01,02]
   * *Description*: There is a newer version, and since we are having problems with SRM, it may be a good idea to upgrade it there.
   * *Affected nodes*: storage[01-02]
   * *Notes*: This was decided at a late stage

---++ %ICON{done}% Restart IB Switch
   * *Description*: swib8 needs to be restarted to pick up the right name
   * *Affected nodes*: all
   * *Notes*:

---++ Check IB cables
   * *Description*: Some links are causing trouble
   * *Affected nodes*: check
   * *Notes*: There are a couple of nodes with problems on the cables. Also, change the IB card on oss42 (check)

---++ Firmware upgrade on DS3500 controllers
   * *Description*: Need to upgrade the controllers to version 07.83
   * *Affected nodes*: se[01-08]
   * *Notes*: Done together with the disk FW, so all IO needs to be stopped (WARNING!) (there is a problem with the disk FW and a ticket has been opened with IBM. IO tests are good, so it seems to affect only the FW upgrade itself)
      * Storage-1 %ICON{done}%
      * Storage-2 %ICON{done}%
      * Storage-3
      * Storage-4 %ICON{done}%
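
The MTU intervention above notes that the Nagios check may need adjusting once public interfaces run at 4000 while private ones stay at 1500. The following is only a minimal sketch of what such a check could look like, not the plugin actually used on the cluster. The interface names =eth0= and =eth1= and the =EXPECTED_MTU= mapping are hypothetical placeholders; the MTU is read from the standard Linux sysfs path =/sys/class/net/&lt;iface&gt;/mtu=.

```python
"""Nagios-style MTU check (illustrative sketch, not the site's real plugin)."""
from pathlib import Path

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes

# Hypothetical mapping of interface name -> expected MTU:
# 4000 on the public production interface, 1500 on the private one.
EXPECTED_MTU = {"eth0": 4000, "eth1": 1500}


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU reported by Linux sysfs for iface, or None if unreadable."""
    path = Path(sysfs_root) / iface / "mtu"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def check_mtu(expected, actual):
    """Compare expected and actual MTUs and return (exit_code, message)."""
    problems = []
    for iface, want in sorted(expected.items()):
        got = actual.get(iface)
        if got != want:
            problems.append(f"{iface}: expected {want}, found {got}")
    if problems:
        return CRITICAL, "MTU CRITICAL - " + "; ".join(problems)
    summary = ", ".join(f"{i}={m}" for i, m in sorted(expected.items()))
    return OK, "MTU OK - " + summary


def main():
    """Read the live MTUs, print the Nagios status line, return the exit code."""
    actual = {iface: read_mtu(iface) for iface in EXPECTED_MTU}
    code, message = check_mtu(EXPECTED_MTU, actual)
    print(message)
    return code
```

To use this as a plugin, one would call =sys.exit(main())= from a small wrapper script. Keeping the expected values in a per-interface mapping means the same check can cover both the 4000 and the 1500 interfaces on a node.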
Topic revision: r11 - 2013-04-09 - PabloFernandez