25. 05. 2009 STEP09
Analysis exercises
Table entries taken from the central T2 STEP09 analysis page.
Note: The low values for Percentage of Analysis Pledge Used are mainly due to too few jobs being sent. However, we should watch out for conditions where a VO cannot get its assigned resources within a reasonable time.
During week 22 we noticed two conditions disturbing the cluster:
- There were ATLAS jobs which did heavy prestaging of files from the SE to the WN using dccp. Many nodes ended up with massive I/O wait because the many dccp processes were fighting for bandwidth and for access to the local scratch space. ATLAS then shifted away from dccp and afterwards seemed to use direct dcap. To me this still looks like a strange, rather theoretical payload: the ratio of I/O to CPU is excessive.
- An ATLAS user had reserved 16 CPUs for his jobs without actually making use of them. This blocked the queue and explains the queued jobs in the above graphs even though only a subset of our cores was in use.
Data transfers
Evaluation of the exercise
This is based on the monitoring, below.
First day: Very good, high-quality throughput from FNAL. In the first 12 h I saw an average speed of about 70 MB/s. Problems with IN2P3 (their SE had an error that made half of the transfers fail). No files at all from our dedicated T1 at FZK?
Second day: Still good transfers from FNAL, albeit somewhat slower. One big set finished. Some files from FZK, but at slow speed (~5 MB/s on average). Still about 50% failures from IN2P3, a big fraction of them still with the "gsiftp protocol not supported" error.
Ordered datasets and simple latency observations
I approved the following two requests on 2009-06-10, 19:45:
Request 51292
| Data Level | Data Item | SRC | Files | Size | 06/10 15:25h | 06/11 9:20h |
| Dataset | /Wc-madgraph/Fall08_IDEAL_V9_v1/GEN-SIM-RAW | T1_DE_FZK | 3254 | 3.13 TB | 0 | 294 (289.1 GB) |
| Dataset | /Wc-madgraph/Fall08_IDEAL_V11_redigi_v1/GEN-SIM-RAW | T1_DE_FZK | 3156 | 3.01 TB | 0 | 0 |
Request 51297
| Data Level | Data Item | SRC | Files | Size | 06/10 9:40h | 06/10 15:25h | 06/11 9:20h |
| Dataset | /QCD100to250-madgraph/Fall08_IDEAL_V11_redigi_v1/GEN-SIM-RECO | T1_US_FNAL | 4099 | 4.26 TB | 2505 (2.6 TB) | 3718 (3.9 TB) | 4099 (4.3 TB) |
| Dataset | /InclusivePPmuX/Summer08_IDEAL_V11_redigi_v1/GEN-SIM-RECO | T1_ES_PIC, T1_FR_CCIN2P3 | 1702 | 1.47 TB | 878 (0.7 TB) | 1255 (1.1 TB) | 1326 (1.1 TB) |
- Note: The SRC column lists the sites that DBS marks as hosting the dataset and that have a direct link to T2_CH_CSCS.
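The snapshots in the Request 51297 table can be turned into rough per-interval rates. The following is only a sketch: the year 2009, the decimal unit conversion (1 TB = 1e6 MB) and the helper name `interval_rates` are assumptions, and the volumes are the rounded values from the table for the /QCD100to250-madgraph dataset.

```python
from datetime import datetime

# Snapshots for /QCD100to250-madgraph/... copied from the Request 51297 table:
# (observation time, files at destination, volume in TB). Year 2009 is assumed.
snapshots = [
    (datetime(2009, 6, 10, 9, 40), 2505, 2.6),
    (datetime(2009, 6, 10, 15, 25), 3718, 3.9),
    (datetime(2009, 6, 11, 9, 20), 4099, 4.3),
]

def interval_rates(points):
    """Return (seconds, files, MB/s) for each pair of consecutive snapshots."""
    rates = []
    for (t0, f0, v0), (t1, f1, v1) in zip(points, points[1:]):
        seconds = (t1 - t0).total_seconds()
        # Assumption: decimal units, 1 TB = 1e6 MB.
        mb_per_s = (v1 - v0) * 1e6 / seconds
        rates.append((seconds, f1 - f0, mb_per_s))
    return rates

for seconds, files, rate in interval_rates(snapshots):
    print(f"{seconds / 3600:5.2f} h: {files:5d} files, {rate:5.1f} MB/s")
```

Because the table volumes are rounded to 0.1 TB, the resulting rates are coarse. The dataset was complete at the last snapshot, so it finished at some unknown point inside the second interval, and the figure for that interval is only a lower bound on the actual transfer speed.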
Phedex central monitoring
Phedex central monitoring for last day (taken on 2009-06-11 9:25h):
My own local phedex monitoring:
Error message statistics per site:
===================================
*** ERRORS from T1_ES_PIC_Buffer:***
6 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] globus_ftp_client: the server responded with an error 426 Transfer aborted (Transfer failed: Connection timed out)
*** ERRORS from T1_FR_CCIN2P3_Buffer:***
480 SOURCE error during TRANSFER_PREPARATION phase: [GENERAL_FAILURE] Protocol(s) specified not supported: [ gsiftp ]
8 SOURCE error during TRANSFER_PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds
2 SOURCE error during TRANSFER phase: [GRIDFTP_ERROR] an end-of-file was reached globus_xio: An end of file occurred (possibly the destination disk is full)
2 SOURCE error during TRANSFER phase: [TRANSFER_TIMEOUT] globus_ftp_client_size: Connection timed out
*** ERRORS from T1_US_FNAL_Buffer:***
9 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] globus_ftp_client: the server responded with an error 426 Transfer aborted (Unexpected Exception : org.dcache.ftp.FTPException: Stream ended before EOD/Data channel was closed before EOD marker)
3 SOURCE error during TRANSFER phase: [GRIDFTP_ERROR] globus_xio: Unable to connect to cmsstor68.fnal.gov:2811 globus_xio: System error in connect: Connection refused globus_xio: A system call failed: Connection refused
2 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] an end-of-file was reached globus_xio: An end of file occurred (possibly the destination disk is full)
1 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] globus_ftp_client: the server responded with an error 426 Transfer aborted (Transfer failed: Connection reset by peer)
SITE STATISTICS:
==================
first entry: 2009-06-09 07:37:07 last entry: 2009-06-10 07:36:12
T1_ES_PIC_Buffer (OK: 190 Err: 6 Exp: 0 Canc: 0 Lost: 0) succ.: 96.9 % total: 180.5 GB ( 2.1 MB/s)
T1_FR_CCIN2P3_Buffer (OK: 688 Err: 492 Exp: 0 Canc: 0 Lost: 0) succ.: 58.3 % total: 652.5 GB ( 7.6 MB/s)
T1_US_FNAL_Buffer (OK: 2525 Err: 15 Exp: 0 Canc: 0 Lost: 0) succ.: 99.4 % total: 2895.0 GB (33.5 MB/s)
TOTAL SUMMARY:
==================
first entry: 2009-06-09 07:37:07 last entry: 2009-06-10 07:36:12
total transferred: 3472.0 GB in 24.0 hours
avg. total rate: 41.2 MB/s = 329.4 Mb/s = 3474.2 GB/day
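For reference, the three figures in the "avg. total rate" line can be recomputed from the total volume and the first/last timestamps. This is a sketch, not the monitoring script itself: the function name `summary_rates` is hypothetical, and the conversion 1 GB = 1024 MB is an assumption chosen because it matches the printed numbers.

```python
from datetime import datetime

def summary_rates(total_gb, first, last):
    """Recompute MB/s, Mb/s and GB/day from a total volume and a time window."""
    fmt = "%Y-%m-%d %H:%M:%S"
    window = datetime.strptime(last, fmt) - datetime.strptime(first, fmt)
    seconds = window.total_seconds()
    # Assumption: 1 GB = 1024 MB; this matches the values in the summary.
    mbyte_s = total_gb * 1024 / seconds
    mbit_s = mbyte_s * 8
    gb_day = mbyte_s * 86400 / 1024
    return mbyte_s, mbit_s, gb_day

mbyte_s, mbit_s, gb_day = summary_rates(
    3472.0, "2009-06-09 07:37:07", "2009-06-10 07:36:12"
)
print(f"{mbyte_s:.1f} MB/s = {mbit_s:.1f} Mb/s = {gb_day:.1f} GB/day")
```

The window between first and last entry is slightly shorter than 24 hours, which is why the extrapolated GB/day value is a little higher than the total actually transferred.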
Error message statistics per site:
===================================
*** ERRORS from T1_DE_FZK_Buffer:***
2 SOURCE error during TRANSFER phase: [TRANSFER_TIMEOUT] globus_ftp_client_size: Connection timed out
1 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] globus_xio: System error in read: Connection reset by peer globus_xio: A system call failed: Connection reset by peer
*** ERRORS from T1_ES_PIC_Buffer:***
1 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] globus_ftp_client: the server responded with an error 426 Transfer aborted (Transfer failed: Connection timed out)
*** ERRORS from T1_FR_CCIN2P3_Buffer:***
35 SOURCE error during TRANSFER phase: [GRIDFTP_ERROR] an end-of-file was reached globus_xio: An end of file occurred (possibly the destination disk is full)
20 SOURCE error during TRANSFER_PREPARATION phase: [GENERAL_FAILURE] Protocol(s) specified not supported: [ gsiftp ]
5 SOURCE error during TRANSFER_PREPARATION phase: [REQUEST_TIMEOUT] failed to prepare source file in 180 seconds
*** ERRORS from T1_US_FNAL_Buffer:***
9 error during phase: []
7 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] an end-of-file was reached globus_xio: An end of file occurred (possibly the destination disk is full)
5 TRANSFER error during TRANSFER phase: [GRIDFTP_ERROR] globus_ftp_client: the server responded with an error 426 Transfer aborted (Unexpected Exception : org.dcache.ftp.FTPException: Stream ended before EOD/Data channel was closed before EOD marker)
SITE STATISTICS:
==================
first entry: 2009-06-10 07:35:51 last entry: 2009-06-11 07:35:22
T1_DE_FZK_Buffer (OK: 295 Err: 3 Exp: 0 Canc: 0 Lost: 0) succ.: 99.0 % total: 311.5 GB ( 3.6 MB/s)
T1_ES_PIC_Buffer (OK: 348 Err: 1 Exp: 0 Canc: 0 Lost: 0) succ.: 99.7 % total: 331.3 GB ( 3.8 MB/s)
T1_FR_CCIN2P3_Buffer (OK: 100 Err: 60 Exp: 0 Canc: 0 Lost: 0) succ.: 62.5 % total: 94.9 GB ( 1.1 MB/s)
T1_US_FNAL_Buffer (OK: 1594 Err: 21 Exp: 0 Canc: 0 Lost: 0) succ.: 98.7 % total: 1814.1 GB (21.0 MB/s)
TOTAL SUMMARY:
==================
first entry: 2009-06-10 07:35:51 last entry: 2009-06-11 07:35:22
total transferred: 2376.5 GB in 24.0 hours
avg. total rate: 28.2 MB/s = 225.4 Mb/s = 2377.3 GB/day
Note: Measurement for the first 12 hours, showing high rates and top quality from FNAL.
SITE STATISTICS:
==================
first entry: 2009-06-09 19:00:32 last entry: 2009-06-10 07:00:32
T1_CH_CERN_Buffer (OK: 8 Err: 0 Exp: 0 Canc: 0 Lost: 0) succ.: 100.0 % total: 21.5 GB ( 0.5 MB/s)
T1_ES_PIC_Buffer (OK: 105 Err: 2 Exp: 0 Canc: 0 Lost: 0) succ.: 98.1 % total: 113.5 GB ( 2.6 MB/s)
T1_FR_CCIN2P3_Buffer (OK: 671 Err: 422 Exp: 0 Canc: 0 Lost: 0) succ.: 61.4 % total: 645.7 GB (14.9 MB/s)
T1_IT_CNAF_Buffer (OK: 6 Err: 0 Exp: 0 Canc: 0 Lost: 0) succ.: 100.0 % total: 16.1 GB ( 0.4 MB/s)
T1_TW_ASGC_Buffer (OK: 8 Err: 2 Exp: 0 Canc: 0 Lost: 0) succ.: 80.0 % total: 22.5 GB ( 0.5 MB/s)
T1_UK_RAL_Buffer (OK: 8 Err: 0 Exp: 0 Canc: 0 Lost: 0) succ.: 100.0 % total: 22.3 GB ( 0.5 MB/s)
T1_US_FNAL_Buffer (OK: 2412 Err: 15 Exp: 0 Canc: 0 Lost: 0) succ.: 99.4 % total: 2803.3 GB (64.9 MB/s)
T2_DE_RWTH (OK: 0 Err: 0 Exp: 37 Canc: 0 Lost: 0) total: 0.0 GB ( 0.0 MB/s)
TOTAL SUMMARY:
==================
first entry: 2009-06-09 19:00:32 last entry: 2009-06-10 07:00:32
total transferred: 3394.5 GB in 12.0 hours
avg. total rate: 80.5 MB/s = 643.7 Mb/s = 6789.1 GB/day
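The "succ." percentages in the SITE STATISTICS lines are consistent with OK / (OK + Err), with expired and cancelled transfers left out. A minimal sketch for recomputing them from such a line follows; the regular expression is inferred from the output above and the function name `success_rate` is hypothetical, so this is not the original monitoring code.

```python
import re

# Pattern for one SITE STATISTICS line (format inferred from the log output).
LINE_RE = re.compile(
    r"^(?P<site>\S+)\s+\(OK:\s*(?P<ok>\d+)\s+Err:\s*(?P<err>\d+)"
    r"\s+Exp:\s*(?P<exp>\d+)\s+Canc:\s*(?P<canc>\d+)\s+Lost:\s*(?P<lost>\d+)\)"
)

def success_rate(line):
    """Return (site, success percentage or None) for one statistics line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    ok, err = int(m["ok"]), int(m["err"])
    attempts = ok + err
    # Expired and cancelled transfers are not counted as attempts here.
    pct = 100.0 * ok / attempts if attempts else None
    return m["site"], pct

lines = [
    "T1_FR_CCIN2P3_Buffer (OK: 671 Err: 422 Exp: 0 Canc: 0 Lost: 0) succ.: 61.4 % total: 645.7 GB (14.9 MB/s)",
    "T1_TW_ASGC_Buffer (OK: 8 Err: 2 Exp: 0 Canc: 0 Lost: 0) succ.: 80.0 % total: 22.5 GB ( 0.5 MB/s)",
    "T2_DE_RWTH (OK: 0 Err: 0 Exp: 37 Canc: 0 Lost: 0) total: 0.0 GB ( 0.0 MB/s)",
]
for line in lines:
    site, pct = success_rate(line)
    print(site, "n/a" if pct is None else f"{pct:.1f} %")
```

A site with only expired transfers, such as T2_DE_RWTH in the 12-hour window, has no attempts to divide by, so the sketch returns None, mirroring the missing "succ." field in the output.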
--
DerekFeichtinger - 25 May 2009