Monitoring
Batch jobs (queuing system)
Current queue / accounting
Number of running and queued jobs:
Ganglia WN page
Storage
/pnfs dir
Links:
Show space graphs for
/pnfs dir I/O queues
- regular I/O queue movers = dcap/gsidcap/LAN xrootd movers (heavy random I/O for internal analysis); at most 100 active movers per file server, any further movers are queued
- wan I/O queue movers = SRM/gridftp movers (whole-file transfers, including from outside); at most 2 active movers per file server, any further movers are queued
- xrootd I/O queue movers = WAN xrootd movers; at most 2 active movers per file server, any further movers are queued
- To check the I/O queues from the command line, run the following from a UI:
watch -n 1 -d lynx --dump --width=200 'http://t3dcachedb:2288/queueInfo'
For example, if your jobs are not progressing, the cause might be a file server with too many queued movers; in that case you can notify the T3 users by email (the T3 admins will receive it too).
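As a rough illustration of the per-file-server limits listed above, the shell sketch below splits a number of requested movers into active and queued ones. The function name mover_split is hypothetical and is not part of dCache or of the T3 tooling. The limits (100 for regular, 2 for wan and xrootd) are the ones quoted on this page. The model is deliberately simplistic and ignores everything else the real mover scheduler does.

```shell
#!/bin/sh
# Hypothetical helper (not a dCache command): given a queue name and the
# number of movers requested on ONE file server, report how many would be
# active and how many would be queued under the limits quoted on this page.
mover_split() {
    ms_queue=$1       # regular | wan | xrootd
    ms_requested=$2   # movers requested on a single file server

    case "$ms_queue" in
        regular)    ms_max=100 ;;  # dcap/gsidcap/LAN xrootd movers
        wan|xrootd) ms_max=2 ;;    # SRM/gridftp and WAN xrootd movers
        *) echo "unknown queue: $ms_queue" >&2; return 1 ;;
    esac

    # Active movers are capped at the queue limit; the rest wait in the queue.
    if [ "$ms_requested" -gt "$ms_max" ]; then
        ms_active=$ms_max
    else
        ms_active=$ms_requested
    fi
    ms_queued=$((ms_requested - ms_active))

    echo "active=$ms_active queued=$ms_queued"
}
```

For example, `mover_split regular 130` prints `active=100 queued=30`, which corresponds to the situation where jobs stall because a single file server has reached its limit of active movers.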
/mnt/t3nfs01/data01/{shome,swshare} dirs
User Space Report
Networking and File Transfers (+ PhEDEx)
Links:
Plotting interval:
Availability reports
These tests are run by the centralized Grid monitoring services; they determine whether the T3 or the T2 is considered to be working correctly:
Computer Room Temps
private link