<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page to be viewable only by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

*CMS Tier-3 Upgrade Planning Page*

<!--
# Use the attached form to define title and summary of this news item. The details you can fill in directly on this wiki page.
-->

---+!! %FORMFIELD{"Title"}%

%TOC%

---++ Summary

%FORMFIELD{"Summary"}%

---++ Details

---+++ Before downtime: preparation work
   1 Downtime announcement
      * to users through the mailing list and the news section (already done in December)
      * to [[https://gocdb4.esc.rl.ac.uk/portal/index.php?Page_Type=View_Object&object_id=312&grid_id=0][GOCDB]] (we were only able to do it on Jan 5th)
   1 Put SGE queues in draining mode according to their runtime limits

---+++ Downtime start on 2012-01-06, 15h
   1 Stop the Nagios process on t3nagios
   1 Bar access to users and kill all user sessions
      * set =/etc/security/access.conf= on all login machines to only allow admin access
      * reboot all UIs
   1 ...
   1 *Shortly before 15:30h* switch to the new virtual monitoring node t3mon01
      1 DNS alias entry will be changed by Mauro at 15:30h
   1 <strike>Shut down old virtualization environment</strike> WILL LEAVE THEM ON
      * t3vmmaster01: hosts t3vm03 (test WN) and obsolete t3vmbdii (off anyhow)
      * t3wn08:
         * t3vobox (active CMS PhEDEx service)
         * t3se02, t3fs12 (dCache testing env)
         * t3jstart (Solaris jumpstart, off)
         * t3vm04 (obsolete Solaris testing machine, off)
   1 <strike>Shut down NFS servers t3fs06 and t3fs05</strike> NOT NECESSARY %TWISTY{showlink="Show" hidelink="Hide" showimgleft="%ICONURLPATH{toggleopen-small}%" hideimgleft="%ICONURLPATH{toggleclose-small}%"}%<verbatim>
t3fs06# showmount -a | sort
192.33.123.200:/shome
192.33.123.209:/shome
loghost:/shome
t3ce.psi.ch:/shome
t3ce01.psi.ch:/shome
t3ce02.psi.ch:/shome
t3cmsvobox02.psi.ch:/shome
t3dcachedb01.psi.ch:/shome
t3ldap01.psi.ch:/shome/martinelli_f
t3mon01.psi.ch:/shome
t3nagios.psi.ch:/shome/martinelli_f
t3nfs01.psi.ch:/shome
t3se02.psi.ch:/shome/martinelli_f
t3ui01.psi.ch:/shome
t3ui02.psi.ch:/shome
t3ui03.psi.ch:/shome
t3ui04.psi.ch:/shome
t3ui05.psi.ch:/shome
t3ui06.psi.ch:/shome
t3ui07.psi.ch:/shome
t3vm01.psi.ch:/shome
t3vm03.psi.ch:/shome
t3vmmaster01.psi.ch:/vmshare
t3wn02.psi.ch:/shome
t3wn03.psi.ch:/shome
t3wn04.psi.ch:/shome
t3wn08.psi.ch:/shome
t3wn08.psi.ch:/vmshare
t3wn10.psi.ch:/shome
t3wn11.psi.ch:/shome
t3wn12.psi.ch:/shome
t3wn13.psi.ch:/shome
t3wn14.psi.ch:/shome
t3wn15.psi.ch:/shome
t3wn16.psi.ch:/shome
t3wn17.psi.ch:/shome
t3wn18.psi.ch:/shome
t3wn19.psi.ch:/shome
t3wn20.psi.ch:/shome
t3wn21.psi.ch:/shome
t3wn22.psi.ch:/shome
t3wn23.psi.ch:/shome
t3wn24.psi.ch:/shome
t3wn25.psi.ch:/shome
t3wn26.psi.ch:/shome
t3wn27.psi.ch:/shome
t3wn28.psi.ch:/shome
t3wn29.psi.ch:/shome
</verbatim>%ENDTWISTY%
   1 Power systems off for the yearly maintenance power break in the compute center
      1 The admin node must stay on!
      1 The file servers (NFS + dCache pools) are allowed to stay on! We do that to make it easier on the disks.
      1 Shut down worker nodes
      1 Shut down UIs
      1 Turn off dCache services (q.v. StartStopDcache215)

---+++ Downtime end: starting up the systems
Fabio proposes to use this downtime to:
   1 Migrate LDAP from t3admin01 to t3ldap01, because t3admin01 is out of warranty.
      1 On AFS, =/etc/ldap.conf= has been modified to point to t3ldap01, so a Puppet run will swap the LDAP source on UIs and WNs.
   1 Migrate Ganglia from t3ce01 to t3mon01; the Ganglia software is already installed on [[http://t3mon01.psi.ch/ganglia/][t3mon01]]
   1 Apply quota to /tmp and /scratch on UIs and WNs
      1 Puppet profile + software already prepared, please see =/afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/quota-fs-users-in-ldap=; again, a Puppet run will make the change.
   1 Convert DHCP IPs to fixed IPs?
      1 In the end we decided to apply a default 7-day lease instead, which lets us skip this step and mitigates the case of a client losing its IP.
   1 Upgrade kernels
   1 Start Nagios process

---+++ Upgrades
| *Server* | *Kernel* | *Puppet* | */scratch fs* |
| t3ui02 | Y | Y | *xfs* |
| t3ui03 | Y | Y | ext3 |
| t3ui04 | Y | Y | ext3 |
| t3ui05 | Y | Y | ext3 |
| t3ui06 | Y | Y | ext3 |
| t3ui07 | Y | Y | *xfs* |
| t3wn10 | Y | Y | ext3 |
| t3wn11 | Y | Y | ext3 |
| t3wn12 | Y | Y | ext3 |
| t3wn13 | Y | Y | ext3 |
| t3wn14 | Y | Y | ext3 |
| t3wn15 | Y | Y | ext3 |
| t3wn16 | Y | Y | ext3 |
| t3wn17 | Y | Y | ext3 |
| t3wn18 | Y | Y | ext3 |
| t3wn19 | Y | Y | ext3 |
| t3wn20 | Y | Y | ext3 |
| t3wn21 | Y | Y | ext3 |
| t3wn22 | Y | Y | ext3 |
| t3wn23 | Y | Y | ext3 |
| t3wn24 | Y | Y | ext3 |
| t3wn25 | Y | Y | ext3 |
| t3wn26 | Y | Y | ext3 |
| t3wn27 | Y | Y | ext3 |
| t3wn28 | Y | Y | ext3 |
| t3wn29 | Y | Y | ext3 |
| t3mon01 | Y | Y | n.a. |
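
The upgrade status table above can also be checked mechanically. The following is only a minimal sketch, not a tool that exists on the Tier-3: it assumes the table has been saved as TWiki markup, and the helper names =parse_twiki_table=, =pending_hosts= and =hosts_by_fs= are hypothetical. It lists servers where the kernel or Puppet column is not yet =Y= and groups servers by their /scratch file system.

```python
def parse_twiki_table(text):
    """Parse TWiki table rows ('| a | b |') into dicts keyed by the header row."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip anything that is not a table row
        # Drop the empty pieces before the first and after the last '|',
        # then strip whitespace and TWiki bold markers (*...*).
        cells = [c.strip().strip("*") for c in line.split("|")[1:-1]]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def pending_hosts(rows, columns=("Kernel", "Puppet")):
    """Return the servers for which any of the given columns is not 'Y'."""
    return [r["Server"] for r in rows if any(r.get(c) != "Y" for c in columns)]


def hosts_by_fs(rows, column="/scratch fs"):
    """Group server names by the value of the file system column."""
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r["Server"])
    return groups


# A few rows copied from the table above; the full table parses the same way.
SAMPLE = """
| *Server* | *Kernel* | *Puppet* | */scratch fs* |
| t3ui02 | Y | Y | *xfs* |
| t3ui03 | Y | Y | ext3 |
| t3ui07 | Y | Y | *xfs* |
| t3mon01 | Y | Y | n.a. |
"""

rows = parse_twiki_table(SAMPLE)
print("still pending:", pending_hosts(rows))
print("by /scratch fs:", hosts_by_fs(rows))
```

For the sample rows nothing is pending, and t3ui02 and t3ui07 end up in the =xfs= group.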
| *UpgradePlanningForm* ||
| Title | Scheduled PSI downtime |
| Summary | Yearly maintenance work to be done in the PSI computing center |
| Target Date | 06. 01. 2012 |
Topic revision: r15 - 2016-06-08 - FabioMartinelli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.