<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+!! Scheduled Maintenance on 2012-07-04

On the first working Wednesday of the month we will go into Scheduled Downtime. The downtime is scheduled from 9:00 to 18:00, but we will return to operation as soon as the work is finished. As usual, the CMS and ATLAS queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before it.

---++!! Summary of interventions

We will perform the following operations on the cluster:

%TOC%

---

---++ Upgrade kernel on SL6 nodes %ICON{done}%
   * *Description*: A security issue affects RHEL6 kernels, so we need to upgrade them
   * *Affected nodes*: kvm01, cvmfs
   * *Notes*:

---++ BIOS/ILOM upgrade on GPFS nodes %ICON{done}%
   * *Description*: The PCIe bus on all X4270 nodes has a known bug that may cause problems with disks and InfiniBand
   * *Affected nodes*: oss[11-42], mds[1-2]
   * *Notes*:
<verbatim>
version
load -source http://192.168.64.20/ILOM-3_0_16_15_r69954-Sun_Fire_X4170.pkg
version
</verbatim>

---++ Torque upgrade to 2.4.17 %ICON{done}%
   * *Description*: This release contains two bug fixes that affect us
   * *Affected nodes*: lrms[01-02], wn[01-46], cream[01-02], arc[01-02]
   * *Notes*:
<verbatim>
# Upgrade the torque packages on the worker nodes and CEs
dsh -g WN -g CREAM_CE -g ARC_CE 'rpm -Uvh http://repo/torque-2.4.17-1.cri.x86_64.rpm http://repo/torque-client-2.4.17-1.cri.x86_64.rpm'
# Upgrade the torque packages (including server and devel) on the LRMS nodes
dsh -w lrms[01-02] 'rpm -Uvh http://repo/torque-2.4.17-1.cri.x86_64.rpm http://repo/torque-client-2.4.17-1.cri.x86_64.rpm http://repo/torque-server-2.4.17-1.cri.x86_64.rpm http://repo/torque-devel-2.4.17-1.cri.x86_64.rpm'
# Check the installed torque packages on every node
dsh -g WN -g CREAM_CE -g ARC_CE -g LRMS 'rpm -qa | grep ^torque | sort' | dshbak -c
# Restart the services
ssh lrms01 'grid-service stop'
ssh lrms02 'grid-service restart'
dsh -g WN -g CREAM_CE -g ARC_CE 'grid-service restart'
ssh lrms01 'grid-service restart'
</verbatim>

---++ Kernel upgrade on all Thors %ICON{done}%
   * *Description*: The Thors are hitting spontaneous 'soft lockup' BUGs that seem to be kernel-related. We need to upgrade to the latest kernel and hope for the best.
   * *Affected nodes*: se[30-39]
   * *Notes*: In the end we decided to apply the latest distro, kernel and security updates to all the dCache servers, not just the Thors
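
For the Torque upgrade, the package check (=rpm -qa | grep ^torque=) has to be read by eye. The sketch below shows one way it could be filtered so that only hosts still carrying a different torque version are printed. It is only an illustration: the function name =torque_mismatches= is hypothetical, and it assumes each input line has the form =host: package=, which is the prefix =dshbak= expects but which should be checked against the actual =dsh= output on the cluster.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original procedure.
# Reads "host: package" lines on stdin (assumed dsh output format) and prints,
# one per line and sorted, every host that reports a torque package whose
# name does not contain "-<expected version>-".
torque_mismatches() {
    expected="$1"
    awk -v want="$expected" '
        {
            host = $1
            sub(/:$/, "", host)          # strip the trailing colon from the host prefix
            pkg = $2
            if (pkg !~ /^torque/) next   # ignore anything that is not a torque package
            if (index(pkg, "-" want "-") == 0) bad[host] = 1
        }
        END { for (h in bad) print h }
    ' | sort
}

# Possible usage on the admin node (assumes the same dsh groups as in the notes above):
#   dsh -g WN -g CREAM_CE -g ARC_CE -g LRMS 'rpm -qa | grep ^torque' | torque_mismatches 2.4.17
```

An empty result would mean that every torque package reported by the nodes is at the expected version; nodes that did not answer at all are not detected by this filter.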
Topic revision: r4 - 2012-07-04 - PabloFernandez
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.