CCRC08

General impression

Most of CCRC08 went quite well for T2_CH_CSCS. No major technical problems were observed, with the exception of the first week of June, when a middleware update caused many jobs to abort.

CSCS serves three experiments: ATLAS, CMS and LHCb. For most of this month only CMS exercised the system significantly. This is particularly true for storage, where only CMS jobs accessed data via the dcap protocol. The total load generated was not enough to really stress our resources, and the usage patterns are surely not representative of what we will see in a few months.

Jobs

Running jobs per experiment:

Job quality was rather good throughout May:

Overview of the datasets used:

During the first week of June, the middleware update caused many problems:

PhEDEx

PhEDEx downloads were of much better quality than in previous months (probably because everybody was paying much closer attention), but the system was also under much lower stress, since the Debug instance's LoadTest transfers had been turned off. This is why the plots below are mostly empty.

PhEDEx Transfer Quality:

PhEDEx Transfer Rate:

PhEDEx Transfer Volume:

2. 6. - 5. 6. 2008 Analysis Latency Tests

Each exercise consists of subscribing to a dataset and running a sample CRAB analysis job over all of its files.

On Monday and Tuesday CSCS had a number of problems due to a middleware upgrade of the CE and the nodes which, contrary to expectations, caused many jobs to abort. Nevertheless, I was able to download a dataset from PIC at ~90 MB/s.

Tuesday evening and night went OK, but left time for only one cycle, using the following dataset:

Dataset name: /Njet_5j_400_5600-alpgen/CMSSW_1_6_7-CSA07-1206675842/RECO from T1_PIC

Time	Time delta	Comment
15:00	0	Dataset subscription approved
15:25	0:25	download started
20:45	5:45	dataset completely on site (281 files from T1_ES_PIC, 1.4 TB, 0 errors, avg. 69.1 MB/s)
21:40	6:40	DBS still shows only 41 files at CSCS
21:43	6:43	DBS shows the whole dataset on site
21:45	6:45	crab -submit
21:49	6:49	all 29 jobs running at T2_CH_CSCS
04:34	13:34	last job finished, next morning (no errors)
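The Time delta column can be reproduced from the clock times, as long as the rollover past midnight before the last entry is handled. The sketch below does that and also gives a rough average download rate. It assumes that "1.4TB" means 1.4e12 bytes and that the whole volume moved between 15:25 and 20:45. The helper names `elapsed_minutes`, `fmt` and `average_rate_mb_s` are hypothetical and exist only for this illustration.

```python
# Timeline of the Tuesday latency test, copied from the table above.
# Clock times only; the last entry falls on the following day.
TIMELINE = [
    ("15:00", "subscription approved"),
    ("15:25", "download started"),
    ("20:45", "dataset complete on site"),
    ("21:40", "DBS shows 41 files"),
    ("21:43", "DBS shows whole dataset"),
    ("21:45", "crab -submit"),
    ("21:49", "all 29 jobs running"),
    ("04:34", "last job finished"),
]


def elapsed_minutes(timeline):
    """Return (label, minutes since first entry) pairs.

    Adds 24 h whenever a clock time is earlier than the previous one,
    i.e. whenever the timeline crosses midnight.
    """
    result = []
    start = prev = None
    day = 0
    for hhmm, label in timeline:
        hours, minutes = map(int, hhmm.split(":"))
        t = day * 1440 + hours * 60 + minutes
        if prev is not None and t < prev:
            day += 1
            t += 1440
        if start is None:
            start = t
        result.append((label, t - start))
        prev = t
    return result


def fmt(minutes):
    """Format a minute count as H:MM, like the Time delta column."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def average_rate_mb_s(volume_bytes, start_min, end_min):
    """Average rate in MB/s (1 MB = 10**6 bytes) between two elapsed-minute marks."""
    return volume_bytes / ((end_min - start_min) * 60) / 1e6


if __name__ == "__main__":
    deltas = elapsed_minutes(TIMELINE)
    for label, minutes in deltas:
        print(f"{fmt(minutes):>6}  {label}")
    marks = dict(deltas)
    # Assumption: "1.4TB" = 1.4e12 bytes, moved between download start and completion.
    rate = average_rate_mb_s(1.4e12, marks["download started"], marks["dataset complete on site"])
    print(f"approx. average rate: {rate:.1f} MB/s")
```

Running it prints the elapsed times 0:00 through 13:34, matching the table, and an estimated rate of about 72.9 MB/s. That is close to, but not identical with, the 69.1 MB/s reported by PhEDEx. The gap presumably comes from the rounded 1.4 TB volume and from the time window PhEDEx averages over.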

On Wednesday we were unlucky again: FZK's network had severe problems, causing all transfers of /WW_incl/CMSSW_1_6_7-CSA07-1196178448/RECO to fail. FZK recovered sometime in the afternoon, but the CSCS PhEDEx download agent got stuck because a glite-transfer-query and a glite-transfer-submit request blocked. I only noticed this on Thursday morning. PhEDEx transfers from FZK began to trickle in by 13:30, but the cluster is now completely filled with user jobs, which will add considerable latency to the eventual CRAB run.
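A blocked client call can stall an agent until someone notices. One generic mitigation is to wrap each external client invocation in a hard timeout, so that a hung call fails quickly instead of blocking. This is not how the PhEDEx download agent is actually implemented. It is only a sketch of the idea using Python's `subprocess.run` timeout. The helper name `run_with_timeout` is hypothetical, and a short Python child process stands in for the gLite client so the example runs anywhere.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of argv strings).

    Returns (returncode, stdout), or (None, "") if the command had not
    finished after timeout_s seconds.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising TimeoutExpired,
        # so no hung process is left behind.
        return None, ""
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Stand-in for a hung client call: a Python process that sleeps for 10 s.
    hung = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(hung, timeout_s=1))    # (None, '')
    # Stand-in for a client call that returns promptly.
    quick = [sys.executable, "-c", "print('ok')"]
    print(run_with_timeout(quick, timeout_s=10))  # (0, 'ok\n')
```

In a real agent the stand-in argv would be replaced by the actual client invocation, and a `None` return code would be logged and retried rather than silently ignored. The timeout value is a site-specific choice.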


-- DerekFeichtinger - 04 Jun 2008

Attachments

Attachment	Size	Date	Who
Dashboard-activity-June.jpg	47.9 K	2008-06-05 19:49	DerekFeichtinger
Dashboard-activity-May.jpg	56.2 K	2008-06-05 19:48	DerekFeichtinger
Dashboard-datasets-May.jpg	56.0 K	2008-06-05 19:49	DerekFeichtinger
Dashboard-users-May1.jpg	82.7 K	2008-06-05 19:49	DerekFeichtinger
PhEDExTransferQuality.jpg	48.2 K	2008-06-05 19:47	DerekFeichtinger
PhEDExTransferQualityPerSite.jpg	52.9 K	2008-06-05 19:58	DerekFeichtinger
PhEDExTransferRate.jpg	54.9 K	2008-06-05 19:47	DerekFeichtinger
PhEDExTransferRatePerSite.jpg	63.7 K	2008-06-05 19:58	DerekFeichtinger
PhEDExTransferVolume.jpg	55.6 K	2008-06-05 19:48	DerekFeichtinger
PhEDExTransferVolumePerSite.jpg	57.8 K	2008-06-05 19:58	DerekFeichtinger
fileservers-IO.jpg	19.9 K	2008-06-05 20:08	DerekFeichtinger
storage_free_cms.jpg	11.1 K	2008-06-05 20:07	DerekFeichtinger
running-month.gif	12.4 K	2008-06-05 20:07	DerekFeichtinger