How to check the CSCS Tier-2 status for CMS site contacts / site managers

This is a short routine that the responsible CMS site contact should perform once a day. Some of these checks can and should be automated at some point, but the manual check does not take much time and will improve your understanding of the system.

All the basic information and links can be found on the main monitoring page: PhoenixMonOverview

  1. Look at the three pie charts for the worker nodes, service nodes, and the file servers.
    The service and file server pie charts must show no black parts (i.e. nodes down). A few worker nodes being down is not so critical, but you may still want to contact the site admins.
  2. Check all SAM tests using the links towards the top of the page, the CMS SAM tests being the most important ones for us.
  3. Check the graphs for running and queued jobs.
    You should only see queued CMS jobs if the cluster is filled with running jobs. If jobs stay in the queue despite free slots on the cluster, something is wrong with the scheduling.
  4. Check the free storage space for CMS, and take note of the trend shown over the last week.
    You can check how much space is taken up by users and datasets by using the links below the Storage Element section.
  5. Take a look at the graphs for the dCache movers. If you see a large number of queued movers (especially if it is still growing), you may want to notify the admins.
  6. Check PhEDEx by looking at the output of the log analyzer (links below Networking and File Transfers) and, if necessary, make sure that the PhEDEx processes are up. If there are lots of transfer errors, try to analyze them based on what you see in the log analyzer and contact the responsible people (which may be our admins or the admins of the remote site at fault).
  7. Check whether there are any pending dataset requests (there is a link to the correct page below the Storage Element section).
    The decision whether to approve a request must be based on the available space and the site policy.
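
As a starting point for automating some of these checks, the rules in steps 3, 4 and 5 can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names, the input values and the LARGE_MOVER_QUEUE threshold are hypothetical and do not come from the monitoring pages, and the numbers still have to be read off the graphs (or fetched by a script of your own).

```python
from typing import Sequence

# Hypothetical threshold: this page does not define what counts as a
# "large" number of queued movers, so tune it for your site.
LARGE_MOVER_QUEUE = 100


def scheduling_suspect(queued_cms_jobs: int, free_slots: int) -> bool:
    """Step 3: queued CMS jobs despite free slots hint at a scheduling problem."""
    return queued_cms_jobs > 0 and free_slots > 0


def free_space_change(free_tb_samples: Sequence[float]) -> float:
    """Step 4: change in free space between the oldest and newest sample.

    A negative value means the free space shrank over the sampled period.
    """
    if len(free_tb_samples) < 2:
        return 0.0
    return free_tb_samples[-1] - free_tb_samples[0]


def mover_queue_status(queued_samples: Sequence[int],
                       large: int = LARGE_MOVER_QUEUE) -> str:
    """Step 5: classify queued dCache movers (samples ordered oldest to newest)."""
    if not queued_samples:
        return "no data"
    latest = queued_samples[-1]
    if latest < large:
        return "ok"
    # "Growing" is taken here to mean: newest sample above the oldest one.
    growing = len(queued_samples) >= 2 and latest > queued_samples[0]
    return "notify admins (large and growing)" if growing else "watch (large)"


if __name__ == "__main__":
    # Made-up example values, not real measurements.
    print(scheduling_suspect(queued_cms_jobs=40, free_slots=12))
    print(free_space_change([50.0, 47.5, 44.0]))
    print(mover_queue_status([80, 120, 190]))
```

With the made-up values above, the script flags the scheduling as suspect, reports that free space shrank by 6 TB, and suggests notifying the admins about the movers. The checks are kept as pure functions so that they can later be fed from whatever data source the site provides.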

-- DerekFeichtinger - 27 Nov 2008

Topic revision: r1 - 2008-11-27 - DerekFeichtinger