May-20
- Security updates/measures
- set 'nosuid' flag on shared file system mount points (/t3home, /work and /pg-backup)
- EGI Trust Anchor release 1.105 -> 1.106
- frontier-squid 4.10-1.1 -> 4.11-2.1
- yum/kernel updates on all user-facing machines as a preventive measure
- request to users with passphrase-less ssh keys to add a passphrase - completed for all accounts
- excluded suid/sgid binaries on UIs/CNs, following checks from EGI security
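The 'nosuid' item above can be verified per node by reading the mount table. The following is a minimal sketch, assuming the /proc/mounts layout (device, mount point, fstype, options); the function name and the sample server name `nfs01` are hypothetical and not taken from the T3 setup.

```python
def mounts_missing_nosuid(mounts_text, mount_points):
    """Return the mount points from mount_points that are mounted without 'nosuid'.

    mounts_text uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    Mount points that are not mounted at all are not reported.
    """
    options_by_mount = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        options_by_mount[fields[1]] = set(fields[3].split(","))
    return [
        mp for mp in mount_points
        if mp in options_by_mount and "nosuid" not in options_by_mount[mp]
    ]


# Hypothetical sample; 'nfs01' is an invented server name.
SAMPLE = """\
nfs01:/t3home /t3home nfs4 rw,nosuid,relatime 0 0
nfs01:/work /work nfs4 rw,relatime 0 0
"""
print(mounts_missing_nosuid(SAMPLE, ["/t3home", "/work", "/pg-backup"]))
# prints ['/work']
```

On a real node the first argument would be the content of /proc/mounts, e.g. `open("/proc/mounts").read()`.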
- EOS
- addition of /eos subprojects (/eos/cms, etc) on login nodes
- update of eos-client 4.5.9 -> 4.7.7 (and eos-xrootd 4.11.2 -> 4.11.3) on UIs
- decommissioning of the EOS test partition in Slurm (not used once in ~2.5 months) and return of its idle CPUs to other queues
- Misc
- regular check/cleaning of old ZFS snapshots to release user quota space (needed when the nightly update script fails to do this automatically due to a "cannot destroy snapshot ... dataset is busy" error)
- set up dcache xrootd movers uniformly at up to 1000/pool, works stably (underpinned by 10*2NIC Bonding)
- user management: new account for UniZ student
- return of the temporary t3ui04,07 to batch as t3wn49,50
- re-installation of t3wn48 due to odd (test) partition table and puppet run failure
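The manual ZFS snapshot check listed under Misc could start from a list of snapshots past a given age. The following is a minimal sketch, assuming the input is the tab-separated output of `zfs list -H -p -t snapshot -o name,creation` (creation as Unix epoch seconds); the function name, the dataset names and the 30-day limit are hypothetical. It only selects candidates and leaves the actual destroy step, including handling of the "dataset is busy" case, to the admin.

```python
import time


def snapshots_older_than(zfs_list_output, max_age_days, now=None):
    """Return snapshot names whose creation time is older than max_age_days.

    Expects one snapshot per line in the form "<name>\t<creation epoch>".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        name, creation = line.split("\t")
        if int(creation) < cutoff:
            old.append(name)
    return old


# Hypothetical sample: two snapshots of an invented dataset.
SAMPLE = (
    "data01/t3home@2020-04-01\t1585699200\n"  # 2020-04-01 00:00 UTC
    "data01/t3home@2020-05-20\t1589932800\n"  # 2020-05-20 00:00 UTC
)
NOW = 1590624000  # 2020-05-28 00:00 UTC
print(snapshots_older_than(SAMPLE, 30, now=NOW))
# prints ['data01/t3home@2020-04-01']
```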
April-20
- Slurm
- memory is configured as a consumable resource (default DefMemPerCPU is 2GB/CPU) to prevent out-of-memory situations caused by user jobs
- added LANG environment variables to /etc/locale.conf on client nodes to suppress LC_CTYPE/UTF-8 errors in ssh sessions
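The effect of the DefMemPerCPU default above can be illustrated with simple arithmetic: a job that does not request memory is accounted cpus x 2GB, and a node accepts jobs only while both CPUs and memory remain. The following is a simplified sketch, not the Slurm scheduler itself; the function names and the node size (32 CPUs, 128000 MB) are hypothetical.

```python
DEF_MEM_PER_CPU_MB = 2048  # 2GB/CPU, the default quoted above


def default_job_memory_mb(cpus, mem_per_cpu_mb=None):
    """Memory accounted to a job: the per-CPU request, or the default if none is given."""
    per_cpu = DEF_MEM_PER_CPU_MB if mem_per_cpu_mb is None else mem_per_cpu_mb
    return cpus * per_cpu


def max_concurrent_jobs(node_mem_mb, node_cpus, cpus_per_job, mem_per_cpu_mb=None):
    """Number of identical jobs that fit on one node when both CPU and memory are consumable."""
    job_mem = default_job_memory_mb(cpus_per_job, mem_per_cpu_mb)
    return min(node_cpus // cpus_per_job, node_mem_mb // job_mem)


# Hypothetical node: 32 CPUs, 128000 MB of memory.
print(default_job_memory_mb(4))                  # prints 8192
print(max_concurrent_jobs(128000, 32, 1))        # prints 32 (CPU-bound)
print(max_concurrent_jobs(128000, 32, 1, 8000))  # prints 16 (memory-bound)
```

The last case shows the intent of the change: jobs asking for more memory per CPU occupy fewer slots per node, so the node's memory is not oversubscribed.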
- Miscellaneous:
- CRIC/SRR storage monitoring ticket closed: storage descriptor is configured on t3dcachedb03
- updates of EGI Trust Anchor release 1.105-1
- user request to install python3/root6 locally: not needed, since both are available in /cvmfs/sft.cern.ch/lcg/...
- migration of the puppet filecopy location to a gitlab location shared by all t3admins
- cleaning of user accounts/data (jfernan2, thaarres), creation of new UniZ accounts (sliechti, yverma)
March-20
- dCache Upgrade Follow-ups:
- add CMS TFC config to xrootd door on SE node (https://www.dcache.org/downloads/xrootd4j/index.shtml)
- implementation of Postgres Backup script to copy DB to t3nfs02:/zfs/data01/swshare/postgres
- after the upgrade dcache became too verbose and filled up the /var/log partition; to fix the problem dcache was restarted on Sun Mar 15 (without user activity)
- Storage Cleaning due to almost no free space on dcache:
- deletion of leftover user data took several days. Too many (hundreds of thousands) files in single directories: dcache can't handle it
- overall cleanup brought ~30% free space; the next step is to check and clean ~150TB of mc, data dirs
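Directories with very large numbers of entries, as described in the cleaning notes above, can be located before a deletion campaign. The following is a minimal sketch, assuming the namespace is reachable as a mounted file system path; the function name and the threshold are hypothetical.

```python
import os


def crowded_directories(root, max_entries):
    """Return (path, entry_count) for each directory under root with more than
    max_entries direct entries, largest first."""
    crowded = []
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)  # direct children only
        if count > max_entries:
            crowded.append((dirpath, count))
    return sorted(crowded, key=lambda item: item[1], reverse=True)
```

A call such as `crowded_directories("/some/mounted/path", 100000)` (path and limit are placeholders) returns the directories worth splitting or cleaning first. Walking a very large tree is itself slow, so this is better run outside peak hours.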
- Slurm:
- add QoS (500 cpu/user) to quick partition
- EOS test configuration (enabled on Worker Nodes and UIs): no user feedback since February
- Monitoring:
- all configuration changes saved on hiera/puppet/gitlab
- most of this list was done remotely from home with no drop in efficiency compared to working from the PSI office
Topic revision: r4 - 2020-05-28 - NinaLoktionova