<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
#   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
#   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup,Main.CMSAdminReaderGroup
-->
%TOC%

---+ April-20
   * Slurm
      * memory is configured as a consumable resource (default !DefMemPerCPU is 2GB/CPU) to prevent out-of-memory situations caused by user jobs
      * added LANG environment variables to /etc/locale.conf on the client nodes to suppress LC_CTYPE/UTF-8 errors in ssh sessions
   * Monitoring:
      * added Slurm CPU/GPU metric collection scripts and plots: https://wiki.chipp.ch/twiki/bin/view/CmsTier3/SlurmUtilisation
      * added Admins monitoring list: https://wiki.chipp.ch/twiki/bin/view/CmsTier3/MonitoringList
   * Miscellaneous:
      * CRIC/SRR storage monitoring ticket closed: the storage descriptor is configured on t3dcachedb03
      * update to EGI Trust Anchor release 1.105-1
      * user request to install python3/root6 locally: not needed, since both are available in /cvmfs/sft.cern.ch/lcg/...
      * migration of the puppet filecopy location to a GitLab location shared by all t3admins
      * user accounts/data cleaning (jfernan2, thaarres), creation of new !UniZ accounts (sliechti, yverma)

---+ March-20
   * dCache Upgrade Follow-ups:
      * add CMS TFC config to the xrootd door on the SE node (https://www.dcache.org/downloads/xrootd4j/index.shtml)
      * implementation of a Postgres backup script that copies the DB to t3nfs02:/zfs/data01/swshare/postgres
      * after the upgrade dcache became too verbose and filled the /var/log partition; to fix the problem, dcache was restarted on Sun Mar 15 (with no user activity at the time)
   * Storage Cleaning due to almost no free space on dcache:
      * deletion of leftover user data took several days: too many files (hundreds of thousands) in single directories, which dcache cannot handle
      * the overall cleanup yielded ~30% free space; next step: check and clean ~150TB in the mc and data dirs
   * Slurm:
      * add !QoS (500 cpu/user) to the quick partition
   * !EOS test configuration (enabled on Worker Nodes and UIs): no user feedback since February
   * Monitoring:
      * manually added the non-standard /work server t3nfs02 to ganglia
      * solved the SELinux problem on the ganglia server (the cause of the http access error); it now works stably
      * dcache space monitoring added to t3wiki: https://wiki.chipp.ch/twiki/bin/view/CmsTier3/StoragePlots
   * all configuration changes saved on hiera/puppet/gitlab
   * most of this list was done remotely from home, with no drop in efficiency compared with working from the PSI office
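
The March-20 storage cleaning item reports that single directories holding hundreds of thousands of files slowed the deletion down. A minimal sketch of how such directories could be spotted before a cleanup is given below. It assumes the storage namespace is reachable as an ordinary mounted file system; the function name =oversized_dirs= and the threshold are hypothetical and not part of the site's tooling.

```python
import os
from typing import Iterator, Tuple


def oversized_dirs(root: str, threshold: int) -> Iterator[Tuple[str, int]]:
    """Yield (directory, entry_count) for each directory under `root`
    that directly contains more than `threshold` entries.

    `root` and `threshold` are illustrative parameters (assumptions),
    not values taken from the cluster configuration.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        count = 0
        try:
            # scandir avoids building a full listing in memory,
            # which matters for very large directories.
            with os.scandir(current) as entries:
                for entry in entries:
                    count += 1
                    # Descend into real subdirectories only, not symlinks.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Unreadable or vanished directory: skip it and carry on.
            continue
        if count > threshold:
            yield current, count
```

Calling =list(oversized_dirs(some_path, threshold=100000))= would return the directories worth splitting or cleaning first; the traversal is iterative, so deep trees do not hit Python's recursion limit.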
Topic revision: r3 - 2020-04-28 - NinaLoktionova
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.