regular checking/cleaning of old ZFS snapshots to release user quota space (when the nightly update script failed to do this automatically due to the "cannot destroy snapshot ... dataset is busy" error)
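A cleanup of this kind can be sketched as a small shell script; the dataset path, retention period, and retry pause below are assumptions, not the site's actual values:

```shell
#!/bin/sh
# Hypothetical snapshot cleanup sketch (dataset and retention are assumptions).
CUTOFF_DAYS=${CUTOFF_DAYS:-14}
now=$(date +%s)

# is_old EPOCH -> true if the snapshot creation time is past the cutoff
is_old() {
    [ $(( (now - $1) / 86400 )) -ge "$CUTOFF_DAYS" ]
}

# -p prints creation time as epoch seconds, -H drops headers
zfs list -H -p -t snapshot -o name,creation -r data01/users 2>/dev/null |
while read -r snap created; do
    if is_old "$created"; then
        # "dataset is busy" usually means an open handle; retry once after a pause
        zfs destroy "$snap" || { sleep 10; zfs destroy "$snap"; }
    fi
done
```

Running this from cron in addition to the nightly update script gives the busy datasets a second chance to be released.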
set up dCache xrootd movers uniformly at up to 1000/pool; works stably (underpinned by 10*2 NIC bonding)
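For reference, on a dCache pool this kind of limit is persisted in the pool's setup file as a mover command; the fragment below is a sketch, and applying it per queue may need extra options:

```
# dCache pool setup fragment (hypothetical): cap concurrent movers at 1000
mover set max active 1000
```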
user management: new account for UniZ student
returned temporary UI nodes t3ui04,07 to batch as t3wn49,50
re-installation of t3wn48 due to an odd (test) partition table and a Puppet run failure
April-20
Slurm:
memory is configured as a consumable resource (default DefMemPerCPU is 2 GB/CPU) to prevent out-of-memory situations caused by user jobs
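The corresponding slurm.conf settings would look roughly like this (a sketch; the choice of the cons_res plugin is an assumption):

```
# slurm.conf fragment: schedule memory as a consumable resource
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
# default memory per allocated CPU, in MB (2 GB)
DefMemPerCPU=2048
```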
added LANG environment variables to /etc/locale.conf on client nodes to suppress LC_CTYPE/UTF-8 errors in ssh sessions
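A minimal /etc/locale.conf along these lines (the en_US.UTF-8 choice is an assumption); ssh clients commonly forward LC_* variables, and the error appears when the forwarded locale is not defined on the server:

```
# /etc/locale.conf: define a UTF-8 locale so forwarded LC_CTYPE values resolve
LANG=en_US.UTF-8
```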
implemented a Postgres backup script that copies the DB to t3nfs02:/zfs/data01/swshare/postgres
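Such a backup script might be sketched as follows; the destination path comes from the note above, while the dump command, filename scheme, and retention are assumptions:

```shell
#!/bin/sh
# Hypothetical Postgres backup sketch (not the site's actual script).
DEST=/zfs/data01/swshare/postgres

# dump_name DATE -> backup filename for that date
dump_name() { printf 'pg_dumpall_%s.sql.gz' "$1"; }

backup() {
    # full-cluster logical dump, compressed, one file per day
    pg_dumpall -U postgres | gzip > "$DEST/$(dump_name "$(date +%F)")"
    # prune dumps older than two weeks (retention period is an assumption)
    find "$DEST" -name 'pg_dumpall_*.sql.gz' -mtime +14 -delete
}
```

pg_dumpall keeps roles and tablespace definitions in the dump, which a per-database pg_dump would miss.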
after the upgrade, dCache logging became too verbose and filled up the /var/log partition; to fix the problem, dCache was restarted on Sun Mar 15 (while there was no user activity)
Storage cleaning due to almost no free space on dCache:
deletion of leftover user data took several days: too many files (hundreds of thousands) in single directories, which dCache cannot handle well
overall cleanup freed ~30% of the space; as a next step, ~150 TB of mc and data dirs still need to be checked and cleaned
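One way to keep such deletions manageable is to remove files in small batches through the namespace mount instead of one huge recursive delete; the helper below is a sketch of that batching idea (its name and the batching approach are assumptions):

```shell
#!/bin/sh
# batch_rm DIR BATCH: delete plain files under DIR in batches of BATCH files,
# so no single namespace operation has to touch an enormous directory at once
batch_rm() {
    find "$1" -type f -print0 | xargs -0 -r -n "$2" rm -f
}
```

For example, `batch_rm /pnfs/.../user/leftovers 1000` issues rm in groups of 1000 files.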
Slurm:
added a QoS limit (500 CPUs/user) to the quick partition
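Assuming the limit is enforced via a Slurm QoS attached to the partition, the setup would look roughly like this (QoS name and exact commands are assumptions):

```
# create a QoS capping each user at 500 CPUs
sacctmgr add qos quick
sacctmgr modify qos quick set MaxTRESPerUser=cpu=500
# attach it to the partition in slurm.conf
PartitionName=quick ... QOS=quick
```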
EOS test configuration (enabled on worker nodes and UIs): no user feedback since February
Monitoring:
manually added the non-standard /work server t3nfs02 to Ganglia
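In gmetad terms this is just an extra data_source line; the cluster label and the default gmond port 8649 below are assumptions:

```
# /etc/ganglia/gmetad.conf fragment: poll t3nfs02 directly
data_source "NFS servers" t3nfs02:8649
```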
solved the SELinux problem (the cause of the HTTP access errors) on the Ganglia server; works stably now
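A typical diagnose-and-fix cycle for such an SELinux denial is sketched below (run as root; the specific paths and boolean are assumptions, and which fix applies depends on what audit2why reports):

```
# find the AVC denials behind the httpd errors
ausearch -m avc -ts recent | audit2why
# restore the expected contexts on the web content
restorecon -Rv /var/www/html
# or, if httpd needs to reach gmetad over the network:
setsebool -P httpd_can_network_connect on
```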