<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+!! Scheduled Maintenance on 2013-07-03

On the next first working Wednesday of the month (2013-07-03) we will go into Scheduled Downtime. It is scheduled from 9:00 to 18:00, but we will return to operation as soon as we finish. As usual, the CMS and ATLAS queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before it.

__REMOVE: REMEMBER TO ADD DOWNTIME IN GOCDB__

Queues will be closed according to this schedule:

<verbatim>
Jun 14 12:07 [root@lrms02:~]# echo "qdisable atlas" | at -m 9am 2.07.13
job 100 at 2013-07-02 09:00
Jun 14 12:07 [root@lrms02:~]# echo "qdisable atlashimem" | at -m 9am 2.07.13
job 101 at 2013-07-02 09:00
Jun 14 12:07 [root@lrms02:~]# echo "qdisable cms" | at -m 9am 2.07.13
job 102 at 2013-07-02 09:00
Jun 14 12:07 [root@lrms02:~]# echo "qdisable other" | at -m 9am 2.07.13
job 103 at 2013-07-02 09:00
Jun 14 12:07 [root@lrms02:~]# echo "qdisable lhcb" | at -m 9am 1.07.13
job 104 at 2013-07-01 09:00
Jun 14 12:07 [root@lrms02:~]# echo "qdisable lcgadmin" | at -m 8:30am 3.07.13
job 105 at 2013-07-03 08:30
Jun 14 12:07 [root@lrms02:~]# atq
100 2013-07-02 09:00 a root
102 2013-07-02 09:00 a root
105 2013-07-03 08:30 a root
103 2013-07-02 09:00 a root
101 2013-07-02 09:00 a root
104 2013-07-01 09:00 a root
</verbatim>

---++!! Summary of interventions

We will perform the following operations on the cluster:

%TOC%

---

---++ %ICON{done}% Restrict squid
   * *Description*: Restrict squid so that only RAL servers can be accessed through the squid proxy
   * *Affected nodes*: =cvmfs1=, =cvmfs=, =wn[01-78]=
   * *Notes*: Add the following to =squid.conf=
<verbatim>
acl ral dst cernvmfs.gridpp.rl.ac.uk
acl ral dst cvmfs.racf.bnl.gov
acl cvmfs dst cvmfs-stratum-one.cern.ch
acl cvmfs dst cernvmfs.gridpp.rl.ac.uk
acl cvmfs dst cvmfs.racf.bnl.gov
acl cvmfs dst cvmfs02.grid.sinica.edu.tw
acl cvmfs dst cvmfs.fnal.gov
acl cvmfs dst cvmfs-atlas-nightlies.cern.ch
</verbatim>
And update the http access rules for localnet:
<verbatim>
http_access allow localnet ral
http_access allow localnet cvmfs
</verbatim>

---++ Update worker nodes to SL6
   * *Description*: Worker nodes will be updated to SL6
   * *Affected nodes*: All worker nodes
   * *Notes*: With this update we will be able to use the OFED stack bundled in SL6 and remove Mellanox OFED from the install process. The install process will also be refined to use internal repos, and reboots during provisioning will be kept to a minimum. Also install =mcelog= to monitor for memory errors. Restart the BDII services to ensure we are publishing the correct information.

---++ Update cvmfs
   * *Description*: In SL6, cvmfs needs to be updated to 2.1
   * *Affected nodes*: All worker nodes
   * *Notes*: We also have to mount cvmfs in RW mode. Consult web-rt ticket #13573.

---++ Restart pbs and dcache services
   * *Description*: After the DNS change we need to restart the services that are still querying the old systems.
   * *Affected nodes*: =se[01-14]=, =storage0[1,2]= and =lrms0[1-2]=
   * *Notes*: Check ticket #13546

---++ %ICON{done}% Decommission KVM01
   * *Description*: The remaining VMs are to be moved off this host so that KVM01 can be decommissioned
   * *Affected nodes*: Pub, UI64, ppcvmfs
   * *Notes*:
      1 %ICON{done}% =pub= is still at 5.4, reinstall with 6.4
      1 %ICON{done}% =ui= has been installed on KVM03, this will replace =ui64=
      1 %ICON{done}% =ppcvmfs= to be moved to the pre-production KVM host.

---++ %ICON{done}% Decommission old voboxes
   * *Description*: Old voboxes need to be decommissioned.
   * *Affected nodes*: =cmsvobox= and =atlasvobox=
   * *Notes*: =atlasvobox= can be shut down but NOT =cmsvobox= (it has been moved to a KVM VM until CMS is ready). The =atlasvobox= VM disks have been moved to =/kvm02/=

---++ %ICON{done}% Migrate lrms02 to kvm
   * *Description*: Right now =lrms02= is still a Xen VM that needs to be migrated to KVM.
   * *Affected nodes*: =lrms02=
   * *Notes*: Check the process followed in the previous maintenance.

---++ Update kernels of SL6 machines
   * *Description*: CVE-2013-2094 allows privilege escalation from standard user to root
   * *Affected nodes*:
      * %ICON{done}% =ui=
      * %ICON{done}% =logstash=
      * (NO) Storage01, Storage02
      * (NO) Cream01, Cream02, Cream03
      * SBDII01, SBDII02, SBDII03
      * APEL
      * (NO) KVM02, KVM03
   * *Notes*: Machines are not user facing

---++ Update CREAM-CE to last release
   * *Description*: Update all CREAM-CEs to the latest UMD-2 release.
   * *Affected nodes*: =cream01=, %ICON{done}% =cream02=, %ICON{done}% =cream03=
   * *Notes*: YAIM also needs to be run

---++ %ICON{done}% Update ntp servers
   * *Description*: time1.cscs.ch and time2.cscs.ch are the only ntp servers to be used, as detailed here: https://wiki.cscs.ch/mediawiki/index.php/Maintenance-reports:July_03_2013#Important
   * *Affected nodes*: All machines
   * *Notes*: Currently time1.cscs.ch, time2.cscs.ch and insone.admin.cscs.ch / 148.187.12.21 are used.
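
The ntp change above can be checked per machine with a small script. The following is a minimal sketch, assuming the servers are declared with =server= lines in an =ntp.conf=-style file; the variable =ALLOWED_NTP= and the function name =unexpected_ntp_servers= are illustrative and not part of the site tooling.

```shell
#!/bin/sh
# Sketch: list NTP servers in a config file that are NOT in the allowed set.
# ALLOWED_NTP and the function name are illustrative assumptions.
ALLOWED_NTP="time1.cscs.ch time2.cscs.ch"

unexpected_ntp_servers() {
    # $1: path to an ntp.conf-style file
    awk -v allowed="$ALLOWED_NTP" '
        BEGIN {
            # Build a lookup table of the allowed server names
            n = split(allowed, a, " ")
            for (i = 1; i <= n; i++) ok[a[i]] = 1
        }
        # Only active "server" directives are inspected; commented lines are skipped
        $1 == "server" && !($2 in ok) { print $2 }
    ' "$1"
}
```

Running =unexpected_ntp_servers /etc/ntp.conf= prints one line per server that should be removed and nothing when only the two allowed servers are configured. Note that a local clock entry such as =server 127.127.1.0= would also be reported, so the allowed list may need extending.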
---++ Expand dCache monitoring
   * *Description*: Add monitoring tools to gain better awareness of what is happening within dCache
   * *Affected nodes*: storage01.lcg.cscs.ch
   * *Notes*: Enable the dCache statistics and install srmwatch

---++ Details for enabling statistics
   * %ICON{done}% http://www.dcache.org/manuals/Book-2.2/config/cf-statistics-basic-fhs.shtml
   * %ICON{done}% http://www.dcache.org/manuals/Book-2.2/config/cf-statistics-webPage-fhs.shtml
   * SRM watch: http://www.dcache.org/manuals/Book-1.9.5/config/cf-srm-monitor.shtml
   * Example running at FNAL: http://cmsdcam3.fnal.gov:8081/srmwatch/

---++ %ICON{done}% Fix errors found in dCache
   * *Description*: There is an incorrect path in the LinkGroupAuthorization file and the dCache servers require fetch-crl
   * *Affected nodes*: storage01.lcg.cscs.ch, storage02.lcg.cscs.ch and all se machines
   * *Notes*: Whilst troubleshooting dCache issues some errors have been found. The LinkGroupAuthorization.conf is in /etc/dcache, not /opt/d-cache/config/
<verbatim>
Jun 27 14:31 [root@nfs02:DCACHE22]# grep opt dcache.* | grep -v port
dcache.conf:# Refer to /usr/share/dcache/defaults/dcache.properties for further options
dcache.conf.pools.sepools3_22:# Source: /opt/d-cache//config/dCacheSetup
dcache.conf.pools.sepools3_22:SpaceManagerLinkGroupAuthorizationFileName=/opt/d-cache/etc/LinkGroupAuthorization.conf
dcache.conf.pools.sepools4_22:# Source: /opt/d-cache//config/dCacheSetup
dcache.conf.pools.sepools4_22:SpaceManagerLinkGroupAuthorizationFileName=/opt/d-cache/etc/LinkGroupAuthorization.conf
</verbatim>
Machines need fetch-crl installed and its cron job enabled, as there is currently no vomsdir under /etc/grid-security/
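
The LinkGroupAuthorization path fix noted above could be previewed before touching the real configuration. This is a minimal sketch, assuming the property =SpaceManagerLinkGroupAuthorizationFileName= (taken from the grep output) should point at =/etc/dcache/LinkGroupAuthorization.conf=; the variable =NEW_LGA_PATH= and the function name =fix_lga_path= are hypothetical.

```shell
#!/bin/sh
# Sketch: point SpaceManagerLinkGroupAuthorizationFileName at /etc/dcache.
# NEW_LGA_PATH follows the note in the text; the function name is hypothetical.
NEW_LGA_PATH="/etc/dcache/LinkGroupAuthorization.conf"

fix_lga_path() {
    # $1: a dcache.conf-style file.
    # Prints the corrected content to stdout and leaves the input file untouched.
    sed "s|^\(SpaceManagerLinkGroupAuthorizationFileName=\).*|\1${NEW_LGA_PATH}|" "$1"
}
```

For example, =fix_lga_path dcache.conf.pools.sepools3_22 | diff dcache.conf.pools.sepools3_22 -= shows the single changed line. Writing to stdout rather than editing in place keeps the original file available until the diff has been reviewed.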
Topic revision: r16 - 2013-07-03 - GianniRicciardi