CMS Tier-3 Upgrade Planning Page
phase B restructuring
Summary
Upgrade dCache to 1.9.5-16. Test evacuation of the old X4500 pools. Investigate the slow t3fs05 transfer speeds.
dCache upgrade to 1.9.5-16
- stop phedex
- UI: Prevent user logins and reboot to get rid of all logged-in users
- We may want to kill all running jobs on the nodes (but we can also just let them run and fail)
- Stop dcache
- Make a backup of the postgres DB
time pg_dumpall -U postgres > dcachedb01-dbbackup-20100311.bup
real 0m18.158s
user 0m0.614s
sys 0m3.094s
- Make a backup of the current installation: one for t3se01 and one for a Thumper (t3fs05)
ssh t3se01 mv /opt/d-cache /opt/d-cache-1.9.2-5
ssh t3dcachedb01 mv /opt/d-cache /opt/d-cache-1.9.2-5
cexec fs: mv /opt/d-cache /opt/d-cache-1.9.2-5
- t3se01 upgrade
- install the RPM
- Put the configuration files in place and check them
- /opt/d-cache/config/dCacheSetup
- /opt/d-cache/etc/node_config
- DO NOT FORGET TO RUN install.sh!!!
- t3dcachedb01 upgrade
- install the RPM
- Put the configuration files in place and check them
- /opt/d-cache/config/dCacheSetup
- /opt/d-cache/etc/node_config
- /opt/d-cache/etc/dcachesrm-gplazma.policy
- /etc/grid-security/grid-vorolemap
- /etc/grid-security/storage-authzdb
- /opt/d-cache/etc/glue-1.3.xml (Info System)
- DO NOT FORGET TO RUN install.sh!!!
- Check whether the pools are correctly found by the init scripts
cexec fs: /opt/d-cache/bin/dcache pool ls
- Fileserver upgrade
- Install the Solaris packages on the File servers
cexec fs: pkgrm -n dCache
ssh t3fs01 pkgadd -n dcache-server-1.9.5-16.pkg # regrettably needs one interactive answer on each server
...
- Put the fileserver configuration files in place and check them
- /opt/d-cache/config/dCacheSetup (note that dCacheSetup may require a different Java location on the fileservers)
- /opt/d-cache/etc/node_config
- DO NOT FORGET TO RUN install.sh!!!
- Confirm that the versions are correct everywhere
ssh t3se01 /opt/d-cache/bin/dcache version
ssh t3dcachedb01 /opt/d-cache/bin/dcache version
cexec fs: /opt/d-cache/bin/dcache version
- Start dcache on t3se01 and t3dcachedb01
- Check whether the cells come up correctly
- Start dcache on a single pool
- Check services using our testing script
- Start remaining pools
- Investigate whether the Info system is still running OK: the format of the /opt/d-cache/etc/glue-1.3.xml file has changed quite a bit. A new file has been prepared.
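The "Confirm that the versions are correct everywhere" step above could be partly automated. The following is only a minimal sketch: the function name check_versions is hypothetical, and it assumes that the caller has already reduced the output of the version commands to one "host version" pair per line, since the exact output format of "dcache version" is not assumed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of dCache): reads "host version" pairs on
# stdin and reports every host whose version differs from the expected one.
# Returns 0 if all hosts match, 1 otherwise.
check_versions() {
    expected="$1"
    awk -v want="$expected" '
        # Appending "" forces a string comparison, so that version strings
        # are never compared as numbers.
        ($2 "") != (want "") {
            printf "MISMATCH %s: %s (expected %s)\n", $1, $2, want
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

With the pairs collected in a file (for example one line such as "t3se01 1.9.5-16" per host), a call like `check_versions 1.9.5-16 < versions.txt` would list only the hosts that still need attention. The file name versions.txt is also just an illustration.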
Test on t3fs05 filesystem to find bottleneck
Pool migration from t3fs05 to a new Thor
List of open tasks
- Virtual machine infrastructure
- install a semi-permanent vmware-server host (t3wn08 has 1 broken NIC port. Should probably free this machine for repairs)
- test running VMs over NFS with the images residing on ZFS on a thumper (t3fs06?)
- migrate all virtual machines to this new installation (requires DOWNTIME)
- File servers and dCache
- find solution to upgrade problem
- prepare new Thors for dcache
- define a standard configuration procedure in which puppet takes over most of the config. We cannot do a full puppet host install, since there is no coupling between JumpStart and puppet
- setup raidz2 ZFS structure for the pools
- install dCache and bring the Thors online with writes disabled
- migrate the data to the new pools to free up servers
- Reinstall the old thumpers through JumpStart, standard config + puppet, so that we have the same Solaris version and a raidz2 configuration everywhere
- Migrate dcache to Chimera
- Home directories
- implement daily snapshots of shome on t3fs06 (cron based script, delete older snapshots)
- implement incremental snapshot transfers to a backup server
- Services
- convert the VM t3ui02 to a real physical machine (let's take t3wn01)
- Setup a new VM for the VO-Box (mostly phedex... I think that frontier should stay on a phys host with local HD)
- Lesser priority
- LDAP directory service
- should we move that onto a VM? Is the admin host indeed a good place for this? backup and failback? This is a critical system
- make use of the new extension fields, so that they can be used in automated scripts
- Attach system to NAGIOS (maybe a project for a practicum student?)
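For the daily shome snapshots listed under "Home directories" above, the "delete older snapshots" part of the cron script might look roughly like the sketch below. It is only an illustration: the function name snapshots_to_prune and the number of snapshots to keep are assumptions, and the function only selects names, it does not destroy anything itself.

```shell
#!/bin/sh
# Hypothetical helper: reads snapshot names on stdin, oldest first, and
# prints all but the newest $1 names (the candidates for deletion).
snapshots_to_prune() {
    keep="$1"
    awk -v keep="$keep" '
        { name[NR] = $0 }
        # Everything except the last "keep" entries is printed.
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}
```

On t3fs06 the input could come from `zfs list -H -t snapshot -o name -s creation`, which lists snapshots oldest first, restricted to the shome dataset (its name is not given on this page). The resulting names should be reviewed before they are handed to `zfs destroy`.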
Topic revision: r4 - 2010-03-15 - DerekFeichtinger