16. 10. 2007 Comparison of dashboard monitoring output and CSCS local monitoring and logs

The Ganglia graphs show a high level of read access to our SE between 0h and 1h last night. Since these rates have been under discussion and the CMS Dashboard showed different numbers for some test cases, I am following this up more thoroughly here:

All graphs cover the time from 2007-10-15 noon to 2007-10-16 noon. The interesting section is from midnight to about 1 o'clock.

fs-network.gif Fileserver I/O

wn-network.gif Worker Node I/O

The total "network-input" of all worker nodes goes up to 300 MB/s.

The following graph shows the number of dCache movers for LAN access per VO. The load was generated by CMS accesses. The picture is complicated by other CMS jobs that set in after 1 o'clock, mainly cmsprod ones, which also use direct dcap access to the dCache. However, the initial rise matches well the rise in the "running jobs" graph further below.

storage01_movers_default.gif

I can see 45 jobs from Stefano ending between 2007-10-16 00:18 and 00:38 in the PBS accounting logs. Only five other CMS jobs ran between 0h and 1h (see the bottom of this page for details).

This is consistent with the Ganglia graph of the running jobs (although it is not easy to extract details from that graph):

running_mod.gif running jobs

Example for a single node: wn29-lcg.projects.cscs.ch, with a load of 2-3, shows a read rate of about 30 MB/s.

wn29-load.gif wn29-network.gif

The PBS logs indeed show that three of Stefano's jobs were running in parallel around that time:

10/16/2007 00:24:57 cms012 wn29-lcg Exit_status=0 cput=00:04:53 mem=344728kb vmem=559388kb walltime=00:06:59
10/16/2007 00:31:02 cms012 wn29-lcg Exit_status=0 cput=00:04:50 mem=326240kb vmem=539596kb walltime=00:07:33
10/16/2007 00:32:32 cms012 wn29-lcg Exit_status=0 cput=00:04:40 mem=358196kb vmem=572424kb walltime=00:07:34
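To see when each of these jobs occupied the node, the start time can be estimated as the logged timestamp minus the walltime. The following is only a rough sketch: it assumes that the timestamp is the job end time (the lines carry an Exit_status), and it parses the condensed excerpt format shown on this page, not the raw PBS accounting record format. The names LINE_RE, job_interval and WN29_LINES are made up for this illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the condensed excerpt lines shown on this page
# (assumption: this is NOT the raw PBS accounting record layout).
LINE_RE = re.compile(
    r"(?P<end>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) "
    r"(?P<account>\S+) (?P<node>\S+) .*walltime=(?P<wall>\d+:\d{2}:\d{2})"
)


def job_interval(line):
    """Return (estimated_start, end) for one excerpt line.

    Assumes the timestamp marks the job end, so start = end - walltime.
    """
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    end = datetime.strptime(m["end"], "%m/%d/%Y %H:%M:%S")
    hours, minutes, seconds = (int(x) for x in m["wall"].split(":"))
    wall = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return end - wall, end


# The three wn29-lcg lines quoted above.
WN29_LINES = [
    "10/16/2007 00:24:57 cms012 wn29-lcg Exit_status=0 cput=00:04:53 mem=344728kb vmem=559388kb walltime=00:06:59",
    "10/16/2007 00:31:02 cms012 wn29-lcg Exit_status=0 cput=00:04:50 mem=326240kb vmem=539596kb walltime=00:07:33",
    "10/16/2007 00:32:32 cms012 wn29-lcg Exit_status=0 cput=00:04:40 mem=358196kb vmem=572424kb walltime=00:07:34",
]

intervals = [job_interval(line) for line in WN29_LINES]
for start, end in intervals:
    print(start.time(), "->", end.time())
```

The estimated intervals are only as precise as the one-second resolution of the log and the way PBS measures walltime, so they should be read as approximate when comparing against the Ganglia graphs.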

In the Dashboard I find only six analysis jobs from 0h to 1:30. Even if I extend the time window, no more appear. The jobs coming in later are production jobs.

The Dashboard lists these six jobs as belonging to Christina Eggel, but the cluster logs show that she was not running at that time.

Dashboard link

Dashboard.jpg


Detailed log excerpts from our PBS:

Mappings:

  • cms008 ceggel
  • cms012 belforte

From PBS accounting log: 45 jobs for Stefano

10/16/2007 00:18:36 cms012 wn33-lcg Exit_status=0 cput=00:04:43 mem=313096kb vmem=515560kb walltime=00:07:02
10/16/2007 00:19:04 cms012 wn24-lcg Exit_status=0 cput=00:04:43 mem=337692kb vmem=552152kb walltime=00:06:56
10/16/2007 00:19:04 cms012 wn26-lcg Exit_status=0 cput=00:04:17 mem=330264kb vmem=543708kb walltime=00:06:00
10/16/2007 00:19:33 cms012 wn23-lcg Exit_status=0 cput=00:05:18 mem=321496kb vmem=526176kb walltime=00:07:30
10/16/2007 00:21:21 cms012 wn34-lcg Exit_status=0 cput=00:04:22 mem=354508kb vmem=569924kb walltime=00:06:29
10/16/2007 00:21:49 cms012 wn14-lcg Exit_status=0 cput=00:06:33 mem=305408kb vmem=520760kb walltime=00:09:42
10/16/2007 00:21:58 cms012 wn24-lcg Exit_status=0 cput=00:04:55 mem=340100kb vmem=554260kb walltime=00:07:00
10/16/2007 00:22:16 cms012 wn30-lcg Exit_status=0 cput=00:04:17 mem=326636kb vmem=539820kb walltime=00:06:38
10/16/2007 00:24:57 cms012 wn29-lcg Exit_status=0 cput=00:04:53 mem=344728kb vmem=559388kb walltime=00:06:59
10/16/2007 00:25:16 cms012 wn17-lcg Exit_status=0 cput=00:04:34 mem=377512kb vmem=590132kb walltime=00:06:57
10/16/2007 00:26:06 cms012 wn26-lcg Exit_status=0 cput=00:04:59 mem=381692kb vmem=589832kb walltime=00:07:00
10/16/2007 00:26:11 cms012 wn23-lcg Exit_status=0 cput=00:04:48 mem=345924kb vmem=558472kb walltime=00:06:36
10/16/2007 00:26:35 cms012 wn24-lcg Exit_status=0 cput=00:04:49 mem=342872kb vmem=566672kb walltime=00:07:29
10/16/2007 00:26:51 cms012 wn35-lcg Exit_status=0 cput=00:04:41 mem=325796kb vmem=539760kb walltime=00:06:30
10/16/2007 00:27:51 cms012 wn34-lcg Exit_status=0 cput=00:04:37 mem=360372kb vmem=588180kb walltime=00:06:27
10/16/2007 00:28:18 cms012 wn26-lcg Exit_status=0 cput=00:04:44 mem=372688kb vmem=595912kb walltime=00:07:29
10/16/2007 00:28:34 cms012 wn24-lcg Exit_status=0 cput=00:04:25 mem=332364kb vmem=561816kb walltime=00:06:33
10/16/2007 00:29:20 cms012 wn30-lcg Exit_status=0 cput=00:04:39 mem=409668kb vmem=643228kb walltime=00:07:02
10/16/2007 00:29:43 cms012 wn37-lcg Exit_status=0 cput=00:04:54 mem=338020kb vmem=548032kb walltime=00:07:58
10/16/2007 00:30:58 cms012 wn08-lcg Exit_status=0 cput=00:03:59 mem=288692kb vmem=500112kb walltime=00:05:34
10/16/2007 00:31:02 cms012 wn29-lcg Exit_status=0 cput=00:04:50 mem=326240kb vmem=539596kb walltime=00:07:33
10/16/2007 00:31:29 cms012 wn39-lcg Exit_status=0 cput=00:04:36 mem=321776kb vmem=528484kb walltime=00:06:30
10/16/2007 00:31:35 cms012 wn36-lcg Exit_status=0 cput=00:04:32 mem=335248kb vmem=538840kb walltime=00:07:30
10/16/2007 00:32:18 cms012 wn17-lcg Exit_status=0 cput=00:04:40 mem=392936kb vmem=624740kb walltime=00:07:01
10/16/2007 00:32:32 cms012 wn29-lcg Exit_status=0 cput=00:04:40 mem=358196kb vmem=572424kb walltime=00:07:34
10/16/2007 00:34:11 cms012 wn26-lcg Exit_status=0 cput=00:05:12 mem=346536kb vmem=552036kb walltime=00:08:03
10/16/2007 00:34:13 cms012 wn24-lcg Exit_status=0 cput=00:04:48 mem=331984kb vmem=543256kb walltime=00:07:32
10/16/2007 00:34:15 cms012 wn17-lcg Exit_status=0 cput=00:04:53 mem=386824kb vmem=608548kb walltime=00:08:33
10/16/2007 00:34:16 cms012 wn23-lcg Exit_status=0 cput=00:04:51 mem=344800kb vmem=557516kb walltime=00:08:03
10/16/2007 00:34:24 cms012 wn35-lcg Exit_status=0 cput=00:04:47 mem=388464kb vmem=611836kb walltime=00:07:29
10/16/2007 00:34:24 cms012 wn37-lcg Exit_status=0 cput=00:05:14 mem=344976kb vmem=552660kb walltime=00:07:31
10/16/2007 00:34:37 cms012 wn25-lcg Exit_status=0 cput=00:04:59 mem=364492kb vmem=574996kb walltime=00:08:33
10/16/2007 00:35:37 cms012 wn24-lcg Exit_status=0 cput=00:04:34 mem=356456kb vmem=565896kb walltime=00:07:01
10/16/2007 00:35:38 cms012 wn20-lcg Exit_status=0 cput=00:04:25 mem=333200kb vmem=542564kb walltime=00:08:03
10/16/2007 00:35:47 cms012 wn21-lcg Exit_status=0 cput=00:04:43 mem=403924kb vmem=661196kb walltime=00:07:30
10/16/2007 00:35:55 cms012 wn17-lcg Exit_status=0 cput=00:05:06 mem=334204kb vmem=545228kb walltime=00:09:03
10/16/2007 00:36:19 cms012 wn34-lcg Exit_status=0 cput=00:04:57 mem=328180kb vmem=544592kb walltime=00:08:26
10/16/2007 00:36:22 cms012 wn30-lcg Exit_status=0 cput=00:04:53 mem=319240kb vmem=535068kb walltime=00:06:59
10/16/2007 00:36:48 cms012 wn26-lcg Exit_status=0 cput=00:05:03 mem=329196kb vmem=532016kb walltime=00:08:28
10/16/2007 00:36:52 cms012 wn33-lcg Exit_status=0 cput=00:05:06 mem=326684kb vmem=538564kb walltime=00:07:59
10/16/2007 00:37:22 cms012 wn37-lcg Exit_status=0 cput=00:04:41 mem=339256kb vmem=552724kb walltime=00:07:33
10/16/2007 00:37:32 cms012 wn28-lcg Exit_status=0 cput=00:04:29 mem=344888kb vmem=566512kb walltime=00:07:04
10/16/2007 00:37:38 cms012 wn21-lcg Exit_status=0 cput=00:04:46 mem=328276kb vmem=540560kb walltime=00:07:59
10/16/2007 00:37:59 cms012 wn38-lcg Exit_status=0 cput=00:04:34 mem=330128kb vmem=543836kb walltime=00:07:32
10/16/2007 00:38:02 cms012 wn23-lcg Exit_status=0 cput=00:04:40 mem=341288kb vmem=549720kb walltime=00:07:03

The only other CMS jobs between 0h and 1h:

10/16/2007 00:12:07 cmsprd Exit_status=0 cput=00:00:15 mem=25816kb vmem=76628kb walltime=00:01:13
10/16/2007 00:12:38 cmssgm Exit_status=0 cput=00:00:36 mem=100100kb vmem=269368kb walltime=00:01:02
10/16/2007 00:22:39 cmsprd Exit_status=0 cput=00:00:55 mem=103820kb vmem=401104kb walltime=00:16:06
10/16/2007 00:44:43 cms008 Exit_status=0 cput=00:06:40 mem=800136kb vmem=1220020kb walltime=00:12:23
10/16/2007 00:51:40 cms008 Exit_status=0 cput=00:04:20 mem=679404kb vmem=994012kb walltime=00:06:56
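The per-user tally of these five jobs can be reproduced with a few lines of Python. This is a minimal sketch, assuming the condensed excerpt format used on this page, where the account is the third whitespace-separated field. The mapping is the one listed under "Mappings" above; the names ACCOUNT_OWNER, jobs_per_owner and OTHER_CMS_JOBS are hypothetical.

```python
from collections import Counter

# Pool-account mappings listed on this page; unmapped accounts are kept as-is.
ACCOUNT_OWNER = {"cms008": "ceggel", "cms012": "belforte"}


def jobs_per_owner(lines):
    """Count excerpt lines per owner (account = third whitespace-separated field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        account = fields[2]
        counts[ACCOUNT_OWNER.get(account, account)] += 1
    return counts


# The five "other CMS jobs" lines quoted above.
OTHER_CMS_JOBS = """\
10/16/2007 00:12:07 cmsprd Exit_status=0 cput=00:00:15 mem=25816kb vmem=76628kb walltime=00:01:13
10/16/2007 00:12:38 cmssgm Exit_status=0 cput=00:00:36 mem=100100kb vmem=269368kb walltime=00:01:02
10/16/2007 00:22:39 cmsprd Exit_status=0 cput=00:00:55 mem=103820kb vmem=401104kb walltime=00:16:06
10/16/2007 00:44:43 cms008 Exit_status=0 cput=00:06:40 mem=800136kb vmem=1220020kb walltime=00:12:23
10/16/2007 00:51:40 cms008 Exit_status=0 cput=00:04:20 mem=679404kb vmem=994012kb walltime=00:06:56
""".splitlines()

counts = jobs_per_owner(OTHER_CMS_JOBS)
for owner, n in sorted(counts.items()):
    print(owner, n)
```

The same function can be applied to the list of 45 cms012 lines, since the account stays in the third field whether or not a node name follows it.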

-- DerekFeichtinger - 16 Oct 2007

PHOENIX-fileservers-bytes_out.gif

Topic revision: r1 - 2007-10-16 - DerekFeichtinger
 