May-20

  • Security updates/measures
    • set 'nosuid' flag on shared file-system mount points (/t3home, /work and /pg-backup)
    • EGI Trust Anchor release 1.105 -> 1.106
    • frontier-squid 4.10-1.1 -> 4.11-2.1
    • yum/kernel updates on all user-facing computers as a preventive measure
    • request to users with passwordless ssh keys to add a passphrase - completed for all accounts
    • excluded suid/sgid binaries on UIs/CNs, as flagged by checks from EGI security
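
The 'nosuid' item above can be verified after a remount. The following is a minimal sketch, assuming the standard /proc/mounts field order (device, mount point, fs type, options); the function name and the sample lines (including the server name t3nfs) are hypothetical and only illustrate the check.

```python
def mounts_missing_nosuid(mounts_text, required=("/t3home", "/work", "/pg-backup")):
    """Return the required mount points that are absent or lack 'nosuid'."""
    options_by_mount = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        # /proc/mounts fields: device, mount point, fs type, options, ...
        options_by_mount[fields[1]] = fields[3].split(",")
    return [m for m in required if "nosuid" not in options_by_mount.get(m, [])]


# Hypothetical sample input, not taken from a real node.
SAMPLE = """\
t3nfs:/t3home /t3home nfs4 rw,nosuid,relatime 0 0
t3nfs:/work /work nfs4 rw,relatime 0 0
"""

print(mounts_missing_nosuid(SAMPLE))
```

On a live node the same function could be fed the real table, e.g. mounts_missing_nosuid(open("/proc/mounts").read()); an empty list would mean all three mount points carry the flag.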

  • EOS
    • addition of /eos subprojects (/eos/cms, etc) on login nodes
    • update of eos-client 4.5.9 -> 4.7.7 (and eos-xrootd 4.11.2 -> 4.11.3) on UIs
    • decommissioning of the EOS test partition in Slurm (not used once in ~2.5 months) and return of idle CPUs to other queues

  • Misc
    • regular check/cleaning of old ZFS snapshots to release user quota space (when the nightly update script failed to do this automatically due to a "cannot destroy snapshot ... dataset is busy" error)
    • set up dcache xrootd movers uniformly at up to 1000/pool, works stably (underpinned by 10*2NIC Bonding)
    • user management: new account for UniZ student
    • return of the temporary t3ui04 and t3ui07 to batch as t3wn49 and t3wn50
    • re-installation of t3wn48 due to odd (test) partition table and puppet run failure
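
The manual snapshot check above can be partly scripted. This is only a sketch, assuming the input comes from `zfs list -H -p -t snapshot -o name,creation` (tab-separated name and creation time in epoch seconds); the function name, the dataset names and the 30-day threshold are hypothetical, and the actual retention policy is not stated in this log.

```python
import time


def snapshots_older_than(zfs_list_output, max_age_days, now=None):
    """Return snapshot names whose creation time is older than max_age_days.

    Expects 'name<TAB>creation-epoch' lines, one snapshot per line.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, creation = line.split("\t")
        if int(creation) < cutoff:
            old.append(name)
    return old


# Hypothetical sample input; dataset and snapshot names are made up.
SAMPLE = "data01/home@old\t1585699200\ndata01/home@new\t1590000000\n"

print(snapshots_older_than(SAMPLE, max_age_days=30, now=1590624000))
```

The function only lists candidates; destroying them (and dealing with a busy dataset) is left to the admin.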

April-20

  • Slurm
    • memory is configured as a consumable resource (default DefMemPerCPU is 2GB/CPU) to prevent out-of-memory situations caused by user jobs
    • added LANG environment variables to /etc/locale.conf on client nodes to suppress LC_CTYPE/UTF-8 errors in ssh sessions
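
As a rough illustration of the DefMemPerCPU setting above: a job that does not request memory explicitly is assigned the per-CPU default times its CPU count. The sketch below is a simplified model under that assumption, not Slurm's actual accounting code; the function and constant names are hypothetical.

```python
DEF_MEM_PER_CPU_MB = 2048  # 2GB/CPU, the default quoted in the item above


def default_job_memory_mb(cpus, requested_mb=None):
    """Memory assigned to a job: the explicit request if given, else CPUs x default."""
    if requested_mb is not None:
        return requested_mb
    return cpus * DEF_MEM_PER_CPU_MB


print(default_job_memory_mb(4))        # 4 CPUs, no explicit memory request
print(default_job_memory_mb(4, 1024))  # explicit request overrides the default
```

With these numbers, a 4-CPU job without a memory request is limited to 8192 MB, so jobs that need more have to ask for it explicitly.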

  • Miscellaneous:
    • CRIC/SRR storage monitoring ticket closed: storage descriptor is configured on t3dcachedb03
    • update to EGI Trust Anchor release 1.105-1
    • user request to install python3/root6 locally: not needed, since both are available in /cvmfs/sft.cern.ch/lcg/...
    • migration of the puppet filecopy location to a gitlab location shared by all t3admins
    • cleaning of user accounts/data (jfernan2, thaarres), creation of new UniZ accounts (sliechti, yverma)

March-20

  • dCache Upgrade Follow-ups:
    • add CMS TFC config to xrootd door on SE node (https://www.dcache.org/downloads/xrootd4j/index.shtml)
    • implementation of Postgres Backup script to copy DB to t3nfs02:/zfs/data01/swshare/postgres
    • after the upgrade dcache became too verbose and filled up the /var/log partition; to fix the problem, dcache was restarted on Sun Mar 15 (when there was no user activity)
  • Storage Cleaning due to almost no free space on dcache:
    • deletion of leftover user data took several days. Too many (hundreds of thousands) files in single directories: dcache can't handle it
    • overall cleanup brought ~30% free space; next step needed: check and clean ~150TB of mc and data dirs
  • Slurm:
    • add QoS (500 cpu/user) to quick partition
    • EOS test configuration (enabled on Worker Nodes and UIs): no user feedback since February
  • Monitoring:
  • all configuration changes saved on hiera/puppet/gitlab
  • most of this list was done remotely from home with no drop in efficiency compared to working from the PSI office
Topic revision: r4 - 2020-05-28 - NinaLoktionova
 