<!--
keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page to be viewable only by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->
*CMS Tier-3 Upgrade Planning Page*

<!--
   # Use the attached form to define the title and summary of this news item. The details can be filled in directly on this wiki page.
-->
---+!! %FORMFIELD{"Title"}%

%TOC%

---++ Summary
%FORMFIELD{"Summary"}%

---++ Details

---+++ Before downtime: preparation work
   1 Downtime announcement
      * to users through the mailing list and the news section (already done in December)
      * to [[https://gocdb4.esc.rl.ac.uk/portal/index.php?Page_Type=View_Object&object_id=312&grid_id=0][GOCDB]] (we were only able to do this on Jan 5th)
   1 Put the SGE queues into draining mode according to their runtime limits

---+++ Downtime start on 2012-01-06, 15h
   1 Stop the Nagios process on t3nagios
   1 Bar user access and kill all user sessions
      * on all login machines, set =/etc/security/access.conf= to allow admin access only
      * reboot all UIs
   1 ...
   1 *Shortly before 15:30h* switch to the new virtual monitoring node t3mon01
      * 1 DNS alias entry will be changed by Mauro at 15:30h
   1 <strike>Shut down the old virtualization environment</strike> WILL LEAVE THEM ON
      * t3vmmaster01: hosts t3vm03 (test WN) and the obsolete t3vmbdii (off anyhow)
      * t3wn08:
         * t3vobox (active CMS PhEDEx service)
         * t3se02, t3fs12 (dCache testing environment)
         * t3jstart (Solaris JumpStart, off)
         * t3vm04 (obsolete Solaris testing machine, off)
   1 <strike>Shut down the NFS servers t3fs06 and t3fs05</strike> NOT NECESSARY %TWISTY{showlink="Show" hidelink="Hide" showimgleft="%ICONURLPATH{toggleopen-small}%" hideimgleft="%ICONURLPATH{toggleclose-small}%"}%<verbatim>
t3fs06# showmount -a | sort
192.33.123.200:/shome
192.33.123.209:/shome
loghost:/shome
t3ce.psi.ch:/shome
t3ce01.psi.ch:/shome
t3ce02.psi.ch:/shome
t3cmsvobox02.psi.ch:/shome
t3dcachedb01.psi.ch:/shome
t3ldap01.psi.ch:/shome/martinelli_f
t3mon01.psi.ch:/shome
t3nagios.psi.ch:/shome/martinelli_f
t3nfs01.psi.ch:/shome
t3se02.psi.ch:/shome/martinelli_f
t3ui01.psi.ch:/shome
t3ui02.psi.ch:/shome
t3ui03.psi.ch:/shome
t3ui04.psi.ch:/shome
t3ui05.psi.ch:/shome
t3ui06.psi.ch:/shome
t3ui07.psi.ch:/shome
t3vm01.psi.ch:/shome
t3vm03.psi.ch:/shome
t3vmmaster01.psi.ch:/vmshare
t3wn02.psi.ch:/shome
t3wn03.psi.ch:/shome
t3wn04.psi.ch:/shome
t3wn08.psi.ch:/shome
t3wn08.psi.ch:/vmshare
t3wn10.psi.ch:/shome
t3wn11.psi.ch:/shome
t3wn12.psi.ch:/shome
t3wn13.psi.ch:/shome
t3wn14.psi.ch:/shome
t3wn15.psi.ch:/shome
t3wn16.psi.ch:/shome
t3wn17.psi.ch:/shome
t3wn18.psi.ch:/shome
t3wn19.psi.ch:/shome
t3wn20.psi.ch:/shome
t3wn21.psi.ch:/shome
t3wn22.psi.ch:/shome
t3wn23.psi.ch:/shome
t3wn24.psi.ch:/shome
t3wn25.psi.ch:/shome
t3wn26.psi.ch:/shome
t3wn27.psi.ch:/shome
t3wn28.psi.ch:/shome
t3wn29.psi.ch:/shome
</verbatim>%ENDTWISTY%
   1 Power the systems off for the yearly maintenance power break in the computing center
      1 The admin node must stay on!
      1 The file servers (NFS + dCache pools) may stay on! Keeping them running is easier on the disks.
      1 Shut down the worker nodes
      1 Shut down the UIs
      1 Turn off the dCache services (see StartStopDcache215)

---+++ Downtime end: starting up the systems
Fabio proposes to exploit this downtime to:
   1 Migrate LDAP from t3admin01 to t3ldap01, because t3admin01 is out of warranty.
      1 On AFS, =/etc/ldap.conf= has been modified to point to t3ldap01, so a Puppet run will switch the LDAP source on the UIs and WNs.
   1 Migrate Ganglia from t3ce01 to t3mon01; the Ganglia software is already installed on [[http://t3mon01.psi.ch/ganglia/][t3mon01]]
   1 Apply quotas to /tmp and /scratch on the UIs and WNs
      1 The Puppet profile and software are already prepared (see =/afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/quota-fs-users-in-ldap=); again, a Puppet run will make the change.
   1 Convert DHCP IPs into fixed IPs?
      1 In the end we decided to skip this step and apply a default 7-day lease instead, to mitigate the risk of a client losing its IP.
   1 Upgrade kernels
   1 Start the Nagios process

---+++ Upgrades
| *Server* | *Kernel upgraded* | *Puppet run* | */scratch fs* |
| t3ui02 | Y | Y | *xfs* |
| t3ui03 | Y | Y | ext3 |
| t3ui04 | Y | Y | ext3 |
| t3ui05 | Y | Y | ext3 |
| t3ui06 | Y | Y | ext3 |
| t3ui07 | Y | Y | *xfs* |
| t3wn10 | Y | Y | ext3 |
| t3wn11 | Y | Y | ext3 |
| t3wn12 | Y | Y | ext3 |
| t3wn13 | Y | Y | ext3 |
| t3wn14 | Y | Y | ext3 |
| t3wn15 | Y | Y | ext3 |
| t3wn16 | Y | Y | ext3 |
| t3wn17 | Y | Y | ext3 |
| t3wn18 | Y | Y | ext3 |
| t3wn19 | Y | Y | ext3 |
| t3wn20 | Y | Y | ext3 |
| t3wn21 | Y | Y | ext3 |
| t3wn22 | Y | Y | ext3 |
| t3wn23 | Y | Y | ext3 |
| t3wn24 | Y | Y | ext3 |
| t3wn25 | Y | Y | ext3 |
| t3wn26 | Y | Y | ext3 |
| t3wn27 | Y | Y | ext3 |
| t3wn28 | Y | Y | ext3 |
| t3wn29 | Y | Y | ext3 |
| t3mon01 | Y | Y | n.a. |
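Draining the SGE queues "according to their runtime limits" is simple arithmetic: a queue has to stop accepting new jobs at least one runtime limit before the downtime starts, so that every job it has already dispatched can finish in time. The minimal Python sketch below takes only the downtime start (2012-01-06, 15h) from this plan; the queue names and limits are hypothetical placeholders, not the real site configuration.

```python
from datetime import datetime, timedelta

# Downtime start taken from this page: 2012-01-06, 15h.
DOWNTIME_START = datetime(2012, 1, 6, 15, 0)

# Hypothetical queue names and wall-clock runtime limits (not the real site config).
QUEUE_LIMITS = {
    "short.q": timedelta(minutes=90),
    "all.q": timedelta(hours=10),
    "long.q": timedelta(hours=96),
}


def drain_deadlines(start, limits):
    """Latest moment each queue may still accept jobs so that a job started
    just before then, running up to the queue's limit, ends by `start`."""
    return {queue: start - limit for queue, limit in limits.items()}


if __name__ == "__main__":
    deadlines = drain_deadlines(DOWNTIME_START, QUEUE_LIMITS)
    # Print the queues in the order they have to be drained (longest limit first).
    for queue, deadline in sorted(deadlines.items(), key=lambda item: item[1]):
        print(f"disable {queue} no later than {deadline:%Y-%m-%d %H:%M}")
```

A queue can then be disabled at or before its deadline (for example with `qmod -d`), which stops new dispatches while the running jobs continue.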
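Restricting the login machines through =/etc/security/access.conf= relies on pam_access scanning the rules top to bottom and letting the first rule that matches both the user and the origin decide. The Python sketch below models only that first-match logic for the simple `permission : users : origins` form; groups, `EXCEPT`, netgroups and address patterns are left out, and the admin account name `t3admin` is a hypothetical placeholder.

```python
# Simplified model of pam_access first-match evaluation.
# Assumption: only plain user names, plain origins and the ALL keyword are handled;
# "t3admin" is a hypothetical admin account, not taken from the site setup.
RULES = """
# allow root and the admin account from anywhere
+ : root t3admin : ALL
# deny everybody else
- : ALL : ALL
"""


def parse_rules(text):
    """Parse 'permission : users : origins' lines, skipping blanks and comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        permission, users, origins = (field.strip() for field in line.split(":", 2))
        rules.append((permission, users.split(), origins.split()))
    return rules


def login_allowed(rules, user, origin):
    """Return the verdict of the first rule matching both user and origin."""
    for permission, users, origins in rules:
        if ("ALL" in users or user in users) and ("ALL" in origins or origin in origins):
            return permission == "+"
    return True  # this sketch treats "no rule matched" as allowed
```

Because evaluation stops at the first match, the deny-all rule has to come last: placed first, it would lock out the admins as well.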
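The `showmount -a | sort` listing of t3fs06 in the downtime steps is what tells which clients still mount an export before an NFS server is shut down. A small Python sketch that groups such a listing by mounted path; the sample is a handful of lines copied from that output, and the parser assumes hostname or IPv4 clients (an IPv6 address would contain colons).

```python
from collections import defaultdict

# A few lines copied from the `showmount -a | sort` output of t3fs06 on this page.
SAMPLE = """\
192.33.123.200:/shome
t3ldap01.psi.ch:/shome/martinelli_f
t3vmmaster01.psi.ch:/vmshare
t3wn08.psi.ch:/shome
t3wn08.psi.ch:/vmshare
"""


def clients_by_export(listing):
    """Map each mounted path to the sorted list of clients that mount it."""
    exports = defaultdict(set)
    for raw in listing.splitlines():
        client, _, path = raw.strip().partition(":")
        if not path.startswith("/"):
            continue  # skips blank lines and the "All mount points on ...:" header
        exports[path].add(client)
    return {path: sorted(clients) for path, clients in exports.items()}


if __name__ == "__main__":
    for path, clients in sorted(clients_by_export(SAMPLE).items()):
        print(f"{path}: {len(clients)} client(s): {', '.join(clients)}")
```

Applied to the full listing, the only /vmshare clients are t3vmmaster01 and t3wn08, the two hosts of the virtualization environment.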
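The decision to keep DHCP with a default 7-day lease, instead of converting to fixed IPs, can be quantified with the default client timers from RFC 2131 (section 4.4.5): a client tries to renew at T1 = 0.5 * lease and falls back to rebinding at T2 = 0.875 * lease. A Python sketch, assuming the clients use these defaults (a server can override them with DHCP options 58 and 59):

```python
# Default DHCP client timers (RFC 2131, section 4.4.5) for a 7-day lease.
# Assumption: the server does not send explicit T1/T2 values (options 58/59).
LEASE_SECONDS = 7 * 24 * 3600  # 604800 s


def renewal_timers(lease_seconds):
    """Return (T1, T2) in seconds: renew at 50 % and rebind at 87.5 % of the lease."""
    t1 = lease_seconds // 2        # unicast renewal to the server that granted the lease
    t2 = lease_seconds * 7 // 8    # broadcast rebinding to any server
    return t1, t2


def guaranteed_outage_tolerance(lease_seconds):
    """While the server is reachable a client renews every T1 seconds, so when the
    server disappears at least lease - T1 seconds remain on its lease
    (ignoring retry delays and client reboots)."""
    t1, _ = renewal_timers(lease_seconds)
    return lease_seconds - t1


if __name__ == "__main__":
    t1, t2 = renewal_timers(LEASE_SECONDS)
    print(f"T1 = {t1 / 3600:.0f} h, T2 = {t2 / 3600:.0f} h")
    print(f"tolerated DHCP outage: at least {guaranteed_outage_tolerance(LEASE_SECONDS) / 86400} days")
```

For the 7-day lease this gives T1 = 84 h and T2 = 147 h, so a running client keeps its address through a DHCP server outage of at least 3.5 days.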
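The Upgrades table doubles as a checklist. If it ever needs to be checked by script (for example once more hosts are added), a minimal Python sketch can parse the TWiki table rows, strip the bold markup and report hosts that still miss the kernel upgrade or the Puppet run. Only a few rows are copied into the sample, and the column order (server, kernel, Puppet, /scratch fs) is the one assumed from the table.

```python
from collections import Counter

# A few rows in the format of the Upgrades table on this page (TWiki table syntax).
TABLE = """\
| t3ui02 | Y | Y | *xfs* |
| t3ui03 | Y | Y | ext3 |
| t3ui07 | Y | Y | *xfs* |
| t3wn10 | Y | Y | ext3 |
| t3mon01 | Y | Y | n.a. |
"""


def parse_rows(table):
    """Return (server, kernel, puppet, scratch_fs) tuples, dropping *bold* markup
    and skipping the header row (whose first cell is bold)."""
    rows = []
    for raw in table.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if cells[0].startswith("*"):
            continue  # header row
        rows.append(tuple(cell.strip("*") for cell in cells))
    return rows


def pending_hosts(rows):
    """Hosts where the kernel upgrade or the Puppet run is not marked Y."""
    return [server for server, kernel, puppet, _ in rows if kernel != "Y" or puppet != "Y"]


def scratch_fs_counts(rows):
    """Count the /scratch file system types."""
    return Counter(row[3] for row in rows)
```

On the full table every host is marked Y in both columns, and only t3ui02 and t3ui07 use xfs for /scratch.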
UpgradePlanningForm
| *Title* | Scheduled PSI downtime |
| *Summary* | Yearly maintenance work to be done in the PSI computing center |
| *Target Date* | 06. 01. 2012 |
Topic revision: r15 - 2016-06-08 - FabioMartinelli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.