Welcome to test Slurm GPU and CPU nodes -- 10.05.2019 NinaLoktionova
Dear T3 Users,

Please let us know if you are interested in testing the new batch system and software on RHEL7:

https://wiki.chipp.ch/twiki/bin/view/CmsTier3/SlurmUsage


Monitoring

Batch jobs (queuing system)

Current queue / accounting

Number of running and queued jobs:


Ganglia WN page

Storage

/pnfs dir

Links:

Show storage space graphs for

/pnfs dir I/O queues

  • regular I/O queue movers = dcap/gsidcap/LAN xrootd movers (heavy random I/O for internal analysis); at most 100 ACTIVE movers per file server, any further movers are QUEUED
  • wan I/O queue movers = SRM/gridftp movers (whole-file transfers, including transfers from outside); at most 2 ACTIVE movers per file server, any further movers are QUEUED
  • xrootd I/O queue movers = WAN xrootd movers; at most 2 ACTIVE movers per file server, any further movers are QUEUED
  • To check the I/O queues from the command line, run the following from a UI: watch -n 1 -d lynx --dump --width=200 'http://t3dcachedb:2288/queueInfo'
  • If your jobs are not progressing, the cause may be a file server with too many queued movers; in that case you can inform the T3 users by email (the T3 admins will receive it too)
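
The per-file-server limits above can be illustrated with a small sketch. It is only a model of the stated caps (100 / 2 / 2 active movers), not of how dCache actually schedules movers; the names MAX_ACTIVE_MOVERS and split_movers are hypothetical and do not correspond to any T3 tool.

```python
# Per-file-server limits on ACTIVE movers, taken from the list above.
# The dictionary and function names are illustrative assumptions.
MAX_ACTIVE_MOVERS = {
    "regular": 100,  # dcap/gsidcap/LAN xrootd movers
    "wan": 2,        # SRM/gridftp movers
    "xrootd": 2,     # WAN xrootd movers
}


def split_movers(queue, requested):
    """Return (active, queued) for `requested` movers on one file server.

    Simplified model: movers beyond the queue's limit are counted as queued.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    limit = MAX_ACTIVE_MOVERS[queue]
    active = min(requested, limit)
    return active, requested - active


if __name__ == "__main__":
    for queue, requested in [("regular", 130), ("wan", 5), ("xrootd", 1)]:
        active, queued = split_movers(queue, requested)
        print(f"{queue}: {active} active, {queued} queued")
```

In this model, a file server asked for 130 regular movers would run 100 and queue 30, which is the situation in which jobs reading from that server may appear stalled.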


/mnt/t3nfs01/data01/{shome,swshare} dirs

User Space Report

Networking and File Transfers (+ PhEDEx)

Links:

Plotting interval:


Availability reports

These tests are run by the centralized Grid monitoring services; they determine whether the T3 or the T2 is considered to be working correctly:

Computer Room Temps

private link
Topic revision: r114 - 2018-05-04 - NinaLoktionova
 