<!-- keep this as a security measure: * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup #uncomment this if you want the page only be viewable by the internal people * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup -->

---+ How to check the CSCS Tier-2 status for CMS site contacts / site managers

*This is a short routine that the responsible CMS site contact should perform once a day*. Some of these checks can and should be automated at some point, but the manual check takes little time and will improve your understanding of the system.

All the basic information and links can be found on the main monitoring page: PhoenixMonOverview

   1. Look at the three pie charts for the worker nodes, service nodes, and file servers. <br>The service node and file server pie charts must show no black parts (i.e. nodes that are down). A few worker nodes being down is less critical, but you may still want to contact the site admins.
   1. Check all SAM tests using the links towards the top of the page. The CMS SAM tests are the most important ones for us.
   1. Check the graphs for running and queued jobs. <br> You should only see queued CMS jobs if the cluster is filled with running jobs. If jobs stay in the queue despite free slots on the cluster, something is wrong with the scheduling.
   1. Check the free storage space for CMS, and take note of the trend shown over the last week. <br> You can check how much space is taken up by users and datasets by using the links below the _Storage Element_ section.
   1. Take a look at the graphs for the dCache movers. If you see a large number of queued movers (especially if it is still growing), you may want to notify the admins.
   1. Check !Phedex by looking at the output of the log analyzer (links below _Networking and File Transfers_) and, if necessary, make sure that the !Phedex processes are up. If there are lots of transfer errors, try to analyze them based on what you see in the log analyzer and contact the responsible people (which may be our admins or the admins of the remote site at fault).
   1. Check whether there are any pending data set requests (there is a link to the correct page below the _Storage Element_ section). <br> The decision whether to allow a request must be based on the available space and on policy.

-- Main.DerekFeichtinger - 27 Nov 2008
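
Since some of these checks are meant to be automated eventually, here is a minimal sketch of how three of the rules from the list above (nodes down, jobs queued despite free slots, a growing mover queue) could be written down as code. Everything in it is an assumption made for illustration: the =Snapshot= fields, the function name =daily_warnings=, and the reading of "still growing" as a strictly increasing series of samples are hypothetical and do not correspond to any existing monitoring interface. The numbers would have to be read from the monitoring page by hand or by a collector that does not exist yet.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical daily readings taken from the monitoring page."""
    service_nodes_down: int
    file_servers_down: int
    worker_nodes_down: int
    free_job_slots: int
    queued_cms_jobs: int
    queued_movers: List[int]  # recent samples of the mover queue, oldest first


def daily_warnings(s: Snapshot) -> List[str]:
    """Return one warning per checklist rule that the snapshot violates."""
    warnings = []
    # Service nodes and file servers must not show any node down.
    if s.service_nodes_down > 0 or s.file_servers_down > 0:
        warnings.append("service node or file server down")
    # Worker nodes down are reported, but flagged as less critical.
    if s.worker_nodes_down > 0:
        warnings.append("worker nodes down (less critical)")
    # Queued jobs are only expected when the cluster is full.
    if s.queued_cms_jobs > 0 and s.free_job_slots > 0:
        warnings.append("jobs queued despite free slots")
    # Assumption: "still growing" means every sample exceeds the previous one.
    m = s.queued_movers
    if len(m) >= 2 and all(a < b for a, b in zip(m, m[1:])):
        warnings.append("mover queue is growing")
    return warnings


if __name__ == "__main__":
    snap = Snapshot(service_nodes_down=0, file_servers_down=0,
                    worker_nodes_down=2, free_job_slots=10,
                    queued_cms_jobs=5, queued_movers=[3, 8, 20])
    for warning in daily_warnings(snap):
        print(warning)
```

The sketch deliberately leaves out the SAM tests, the !Phedex log analysis, and the data set requests, because those steps need human judgement rather than a threshold.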
Topic revision: r1 - 2008-11-27 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.