<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

*CMS Tier-3 Upgrade Planning Page*

<!--
# Use the attached form to define title and summary of this news item. The details you can fill in directly on this wiki page.
-->

---+ %FORMFIELD{"Title"}%

---++ Summary

%FORMFIELD{"Summary"}%

---+++ dCache upgrade to 1.9.5-16

   1 Stop PhEDEx
   1 UI: Prevent user logins and reboot to get rid of all logged-in users
   1 We may want to kill all running jobs on the nodes (but we can also just let them run and fail)
   1 Stop dCache
   1 Make a backup of the postgres DB <pre>
time pg_dumpall -U postgres > dcachedb01-dbbackup-20100311.bup

real    0m18.158s
user    0m0.614s
sys     0m3.094s
</pre>
   1 Make a backup of the current installation: one for t3se01 and one for a Thumper (t3fs05) <pre>
ssh t3se01 mv /opt/d-cache /opt/d-cache-1.9.2-5
ssh t3dcachedb01 mv /opt/d-cache /opt/d-cache-1.9.2-5
cexec fs: mv /opt/d-cache /opt/d-cache-1.9.2-5
</pre>
   1 t3se01 upgrade
      1 Install the RPM
      1 Put the configuration files in place and check them
         * /opt/d-cache/config/dCacheSetup
         * /opt/d-cache/etc/node_config
      1 *DO NOT FORGET TO RUN install.sh!!!*
   1 t3dcachedb01 upgrade
      1 Install the RPM
      1 Put the configuration files in place and check them
         * /opt/d-cache/config/dCacheSetup
         * /opt/d-cache/etc/node_config
         * /opt/d-cache/etc/dcachesrm-gplazma.policy
         * /etc/grid-security/grid-vorolemap
         * /etc/grid-security/storage-authzdb
         * /opt/d-cache/etc/glue-1.3.xml (Info System)
      1 *DO NOT FORGET TO RUN install.sh!!!*
   1 Check whether the pools are correctly found by the init scripts <pre>
cexec fs: /opt/d-cache/bin/dcache pool ls
</pre>
   1 Fileserver upgrade
      1 Install the Solaris packages on the file servers <pre>
cexec fs: pkgrm -n dCache
ssh t3fs01 pkgadd -n dcache-server-1.9.5-16.pkg # regrettably needs one interactive answer on each server
...
</pre>
      1 Put the fileserver configuration files in place and check them
         * /opt/d-cache/config/dCacheSetup (note that dCacheSetup may require a different Java location on the fileservers)
         * /opt/d-cache/etc/node_config
      1 *DO NOT FORGET TO RUN install.sh!!!*
   1 Confirm that the versions are correct everywhere <pre>
ssh t3se01 /opt/d-cache/bin/dcache version
ssh t3dcachedb01 /opt/d-cache/bin/dcache version
cexec fs: /opt/d-cache/bin/dcache version
</pre>
   1 Start dCache on t3se01 and t3dcachedb01
      * Check whether the cells come up correctly
   1 Start dCache on a single pool
      * Check services using our testing script
   1 Start the remaining pools
   1 Investigate whether the Info system is still running OK: the format of the =/opt/d-cache/etc/glue-1.3.xml= file has changed quite a bit. A new version of the file has been prepared.

---+++ Test on t3fs05 filesystem to find bottleneck

---+++ Pool migration from t3fs05 to a new Thor

---++ List of open tasks

   * Virtual machine infrastructure
      * Install a semi-permanent vmware-server host (t3wn08 has one broken NIC port; we should probably free this machine for repairs)
      * Test running VMs over NFS with the images residing on ZFS on a Thumper (t3fs06?)
      * Migrate all virtual machines to this new installation *DOWNTIME*
   * File servers and dCache
      * Find a solution to the upgrade problem %Y%
      * Prepare new Thors for dCache
         * Make a standard configuration procedure where puppet takes over most of the config. We cannot do a full puppet host install, since there is no coupling between JumpStart and puppet
         * Set up a raidz2 ZFS structure for the pools
         * Install dCache and bring the Thors online with writes disabled
         * Migrate the data to the new pools to free up servers
      * Reinstall the old Thumpers through JumpStart (standard config + puppet), so that we have the same Solaris version and a raidz2 configuration everywhere
      * Migrate dCache to Chimera
   * Home directories
      * Implement daily snapshots of shome on t3fs06 (cron-based script, delete older snapshots) %Y%
      * Implement incremental snapshot transfers to a backup server
   * Services
      * Convert the VM t3ui02 to a real physical machine (let's take t3wn01) %Y%
      * Set up a new VM for the VO-Box (mostly PhEDEx... I think that frontier should stay on a physical host with a local HD)
   * Lesser priority
      * LDAP directory service
         * Should we move it onto a VM? Is the admin host indeed a good place for it? What about backup and failback? This is a critical system
         * Make use of the new extension fields, so that they can be used in automated scripts
      * Attach the system to NAGIOS (maybe a project for a practicum student?)
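
The "daily snapshots of shome" task above (cron-based script, delete older snapshots) could look roughly like the following. This is only a minimal sketch, not the script actually deployed on t3fs06: the dataset name =data01/shome=, the =daily-YYYYMMDD= snapshot naming and the retention of 7 snapshots are all assumptions made for illustration. It also assumes a POSIX shell and an awk that understands =-v= (on Solaris, =nawk= or =/usr/xpg4/bin/awk= rather than the default =awk=).

```shell
#!/bin/sh
# Sketch of a daily ZFS snapshot rotation.
# ASSUMPTIONS (not taken from this page): dataset name, snapshot prefix, retention.
DATASET="${DATASET:-data01/shome}"   # hypothetical dataset holding shome
PREFIX="${PREFIX:-daily}"            # snapshots are named <dataset>@daily-YYYYMMDD
KEEP="${KEEP:-7}"                    # number of snapshots to retain

# Read snapshot names (oldest first) on stdin and print every name
# except the newest $1 entries, i.e. the snapshots that should be destroyed.
snapshots_to_delete() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Take today's snapshot, then destroy all but the newest $KEEP snapshots
# that carry our prefix. Snapshots with other names are left alone.
rotate_snapshots() {
    zfs snapshot "${DATASET}@${PREFIX}-$(date +%Y%m%d)" || return 1
    zfs list -H -t snapshot -o name -s creation -r "$DATASET" \
        | grep "^${DATASET}@${PREFIX}-" \
        | snapshots_to_delete "$KEEP" \
        | while read -r snap; do
              zfs destroy "$snap"
          done
}

# Only touch ZFS when called with --run, so the file can be sourced for testing.
if [ "${1:-}" = "--run" ]; then
    rotate_snapshots
fi
```

A crontab entry would then call the script once per day with =--run=. Keeping the pruning logic in =snapshots_to_delete= separate from the =zfs= calls means the retention rule can be checked on any machine by piping in a list of names.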
| *UpgradePlanningForm* ||
| Title | phase B restructuring |
| Summary | dCache upgrade to 1.9.5-16. Test evacuation of old X4500 pools. Research slow t3fs05 transfer speeds |
| Target Date | 11-12.03.2010 |
Topic revision: r4 - 2010-03-15 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.