CMS Tier-3 Upgrade Planning Page
miscellaneous upgrades
Summary
Keeping track of completed upgrades and planning future ones.
Details
Fileserver install status
Fileserver installation status as of 2010-08-27
| t3fs01 | SunOS 5.10 Generic_141445-09 |
| t3fs02 | SunOS 5.10 Generic_141445-09 |
| t3fs03 | SunOS 5.10 Generic_137112-05 fixed! |
| t3fs04 | SunOS 5.10 Generic_137138-09 |
| t3fs05 | SunOS 5.10 Generic_141445-09 |
| t3fs07 | SunOS 5.10 Generic_141445-09 |
| t3fs08 | SunOS 5.10 Generic_142901-03 |
| t3fs09 | SunOS 5.10 Generic_141445-09 |
| t3fs10 | SunOS 5.10 Generic_141445-09 |
| t3fs11 | SunOS 5.10 Generic_141445-09 |
Security kernel updates
to kernel 2.6.18-194.11.4.el5
| t3wn10 | ok |
| t3wn11 | ok |
| t3wn12 | ok |
| t3wn13 | ok |
| t3wn14 | ok |
| t3wn15 | ok |
| t3wn16 | ok |
| t3wn17 | ok |
| t3wn18 | ok |
| t3wn19 | ok |
| t3wn20 | ok |
| t3wn21 | ok |
| t3wn22 | ok |
| t3wn23 | ok |
| t3wn24 | ok |
| t3wn25 | ok |
| t3wn26 | ok |
| t3wn27 | ok |
| t3wn28 | ok |
| t3wn29 | ok |
| t3ui02 | ok |
List of open tasks
- Configuration management and general tasks
  - migrate nodes to SL5.4
  - establish a notification channel for admin messages from scripts (Nagios would be nice, but let's go for email; remember that we are in a special DMZ here)
  - regular automated backup of the main admin server (configs, LDAP server, ...)
- Home directories
  - implement incremental snapshot transfers to a backup server (partly done, but not yet automated)
- Virtual machine infrastructure
  - install a semi-permanent vmware-server host: t3vmmaster01 (t3wn08 has one broken NIC port and should probably be freed for repairs)
  - test running VMs over NFS with the images residing on ZFS on a thumper (on t3fs05)
  - migrate all virtual machines to this new installation (requires a DOWNTIME)
  - implement a snapshot backup for the VMs
  - There is a chance that we could use the PSI standard VMware ESXi solution at some point. This needs help from the networkers (VPN).
- File servers and dCache
  - regular backup of the dCache metadata DB
  - enable LACP mode for those servers that do not yet have it (q.v. SwitchMappings)
  - improve the puppet configuration of the fileservers
  - Migrate dCache to Chimera (http://trac.dcache.org/projects/dcache/wiki/Chimera)
  - SE: still on SL4 with an old gLite 3.1 info system.
  - It seems that we first need to upgrade to the newest dCache release (> v1.9.5-17)
  - Migrate the headnodes to the new SL5 architecture managed by puppet
- User Interfaces
  - Try to better protect these world-open ssh login nodes from attacks (e.g. with fail2ban or DenyHosts; q.v. this discussion)
- Lesser priority
  - LDAP directory service
    - Should we move that onto a VM? Is the admin host indeed a good place for this? What about backup and failback? This is a critical system.
    - make use of the new extension fields (t3 cms user extensions), so that they can be used in automated scripts
  - Attach the system to Nagios (maybe a project for a practicum student?)
  - Find a way to identify jobs overloading the NFS area (mainly on the UIs)
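
For the email notification channel listed among the open tasks, a minimal sketch of a helper that admin scripts could call might look as follows. It is only an illustration: the function names, the addresses in the usage note and the assumption that a mail relay accepts connections on localhost are all hypothetical and would need to be adapted to the actual setup inside the DMZ.

```python
import smtplib
import socket
from email.message import EmailMessage


def build_admin_message(subject, body, sender, recipient, host=None):
    """Build a notification mail; the subject is tagged with the sending host."""
    if host is None:
        host = socket.gethostname()
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[{host}] {subject}"
    msg.set_content(body)
    return msg


def send_admin_message(msg, relay="localhost"):
    """Hand the message to an SMTP relay (assumed here to listen on localhost)."""
    with smtplib.SMTP(relay) as smtp:
        smtp.send_message(msg)
```

A script would call something like `send_admin_message(build_admin_message("backup failed", details, "root@t3admin.example", "t3-admins@example.org"))`, where both addresses are placeholders. Keeping message construction separate from sending makes it possible to check the message without a reachable mail relay.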
List of done tasks
- 2010-11-25 set up NX services on the User Interfaces (introduce additional UIs, maybe by converting the older WNs)
  - if necessary (i.e. if FreeNX does not fulfill our requirements), procure the NX services from NoMachine and test them.
- 2010-12-21 Home directories
  - User quotas on the ZFS shome file system (news item, technical description, Sun documentation); group quotas can also be set
    - bring t3fs06 to an updated OS version that supports these operations
    - this may require enabling LDAP user accounts on Solaris 10 (NOT DONE). On the other hand, one can set quotas based on uid.
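
As an illustration of the uid-based quotas mentioned above, the following sketch assembles the `zfs set` command line for the `userquota@...` and `groupquota@...` properties. The helper names and the dataset name `tank/shome` used in the usage note are hypothetical; the real pool and file system names are not given on this page.

```python
import subprocess


def zfs_quota_command(filesystem, principal, size, kind="user"):
    """Return the argv for setting a ZFS user or group quota.

    principal may be a name or a numeric id, so a plain uid works even
    when LDAP accounts are not resolvable on the file server.
    """
    if kind not in ("user", "group"):
        raise ValueError("kind must be 'user' or 'group'")
    return ["zfs", "set", f"{kind}quota@{principal}={size}", filesystem]


def apply_quota(filesystem, principal, size, kind="user"):
    """Run the command on the file server; raises if zfs reports an error."""
    subprocess.run(zfs_quota_command(filesystem, principal, size, kind), check=True)
```

For example, `zfs_quota_command("tank/shome", 1234, "10G")` yields the equivalent of `zfs set userquota@1234=10G tank/shome`. Building the argument list separately from executing it allows a dry run before anything is changed on the server.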