Monitoring
Batch jobs (queuing system)
Current queue / accounting / CMS Dashboard
Number of running and queued jobs:
Ganglia WN page
Storage
/pnfs dir
/pnfs dir I/O queues
- regular: dcap/gsidcap/LAN xrootd movers (heavy random I/O for internal analysis); at most 100 ACTIVE movers per file server, further movers get QUEUED
- wan: SRM/gridftp movers (whole-file transfers, also from outside); at most 2 ACTIVE movers per file server, further movers get QUEUED
- xrootd: WAN xrootd movers; at most 2 ACTIVE movers per file server, further movers get QUEUED
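The per-file-server limits above can be illustrated with a small shell sketch. `split_movers` is a hypothetical helper, not a T3 or dCache tool: it hard-codes the ACTIVE limits listed for the regular, wan and xrootd queues and shows how many of N mover requests on a single file server would run and how many would wait as QUEUED. It assumes the limits apply per queue and per file server exactly as stated above, and it does not query dCache.

```bash
#!/usr/bin/env bash
# Hypothetical helper: models the per-file-server mover limits listed above.
# It does NOT contact dCache; it only shows how requests beyond a queue's
# ACTIVE limit end up QUEUED.
split_movers() {
  local queue=$1 requested=$2 max
  case "$queue" in
    regular) max=100 ;;  # dcap/gsidcap/LAN xrootd movers
    wan)     max=2 ;;    # SRM/gridftp movers
    xrootd)  max=2 ;;    # WAN xrootd movers
    *) echo "unknown queue: $queue" >&2; return 1 ;;
  esac
  # ACTIVE = min(requested, max); everything else waits in the queue
  local active=$(( requested < max ? requested : max ))
  local queued=$(( requested - active ))
  echo "queue=$queue active=$active queued=$queued"
}

split_movers regular 130   # queue=regular active=100 queued=30
split_movers wan 5         # queue=wan active=2 queued=3
```

The sketch only captures the cap on ACTIVE movers; the order in which the real dCache pool picks QUEUED movers is not modeled.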
[t3ui1*]$ watch -n 1 elinks 'http://t3dcachedb:2288/queueInfo'
Use this command to check all the I/O queues from the command line. For example, if your jobs are not progressing, a file server may have too many queued movers; in that case you can notify the T3 users by email (the T3 admins receive that email too).
ACTIVE movers:
QUEUED movers (the associated I/O queue exceeds the maximum number of allowed ACTIVE movers):
PENDING requests (these are hanging file transfers, almost always an error state if they persist):
OLD /shome and /swshare dirs
/shome space usage
NEW /shome and /swshare dirs
http://t3mon.psi.ch/ganglia/host_gmetrics.php?c=PSI%20Tier3%20services&h=t3nfs01.psi.ch
Networking and File Transfers (+ PhEDEx)
Availability reports
These tests are run by the centralized Grid monitoring services, and they determine whether the T3 or the T2 is considered to be working correctly:
Computer Room Temperatures
private link