Monitoring
Batch jobs (queuing system)
Current queue / accounting
Number of running and queued jobs:
Ganglia WN page
Storage
/pnfs dir
Links:
/pnfs dir I/O queues
- regular I/O queue movers = dcap/gsidcap/LAN xrootd movers (heavy random I/O for internal analysis); MAX 100 ACTIVE movers per file server, others will get QUEUED
- wan I/O queue movers = SRM/gridftp movers (transfers of whole files, also from outside); MAX 2 ACTIVE movers per file server, others will get QUEUED
- xrootd I/O queue movers = WAN xrootd movers; MAX 2 ACTIVE movers per file server, others will get QUEUED
- [t3ui1*]$ watch -n 1 elinks 'http://t3dcachedb:2288/queueInfo'
  checks all the I/O queues from the CLI. If your jobs are not progressing, the cause might be a file server with too many QUEUED movers; in that case you can notify the T3 users by email (the T3 admins will receive that email too).
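As a rough illustration of how to spot an overloaded file server, the sketch below filters a plain-text table of per-queue mover counts and prints only the rows that have QUEUED movers. The five-column input layout (pool, queue, ACTIVE, MAX, QUEUED), the function name flag_queued, and the pool names are assumptions made for this example; they are not the actual layout of the queueInfo page.

```shell
# Hypothetical input, one line per pool and I/O queue (NOT the real queueInfo format):
#   <pool> <queue> <active> <max> <queued>
flag_queued() {
  # Print only the rows whose QUEUED count (5th column) is greater than zero.
  awk '$5 > 0 { printf "%s %s: %d ACTIVE (MAX %d), %d QUEUED\n", $1, $2, $3, $4, $5 }'
}

# Example with made-up pool names and counts.
printf '%s\n' \
  't3fs01_cms regular 100 100 7' \
  't3fs02_cms wan 1 2 0' \
  't3fs03_cms xrootd 2 2 3' | flag_queued
```

With the made-up counts above, only the regular queue on the first pool and the xrootd queue on the third pool are reported, since both have reached their MAX and have movers waiting.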
ACTIVE movers:
QUEUED movers (the associated I/O queue is exceeding the maximum number of allowed ACTIVE movers):
PENDING requests (these are hanging file transfers, almost always an error state if they persist):
/mnt/t3nfs01/data01/{shome,swshare} dirs
User Space Report
Networking and File Transfers (+ PhEDEx)
Links:
Availability reports
These tests are run by the centralized Grid monitoring services; they determine whether the T3 or the T2 is considered to be working correctly:
- CMS Nagios : T3
- German Nagios : T3
- Gstat : T3
Computer Room Temps
private link