Go to
previous page /
next page of CMS site log
17. 09. 2009 lcg-cp stageout problems from CRAB jobs
NOTE: This problem was reported in a hypernews item and is tracked in a Savannah support request. It was also submitted to the dcache support list on 2009-09-18 as tracker item #5109.
Andrea Rizzi and Andreas Schaetti reported stageout failures from their CRAB jobs.
The relevant part of the CRAB log output is:
########## contents of SE interaction
2009-09-17 15:15:12.751466:
Executed: lcg-ls -b -D srmv2 -t 2400 --verbose srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root
Done with exit code: 256
and output:
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root: [SE][Ls][SRM_INVALID_PATH] could not get storage info by path : CacheException(rc=10001;msg=path /pnfs/fs/usr/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root not found ( .(id)(hwbbar115_10TeV_GEN_MC_2.root) ))
SE type: SRMv2
2009-09-17 15:15:13.890772:
Executed: lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///home/egee/cms074/globus-tmp.wn36.5872.0/https_3a_2f_2fwms213.cern.ch_3a9000_2fvgkrUkMs0YPpBCGY4QTPjg/CMSSW_3_1_2/hwbbar115_10TeV_GEN_MC_2.root srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root
Done with exit code: 256
and output:
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
[SE][Mkdir][SRM_INVALID_PATH] srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root: srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b : parent path or a component of the parent path does not exist
lcg_cp: No such file or directory
Andrea Rizzi's user directory exists, but none of the subdirectories exist. It seems that lcg-cp does not automatically create all the required subdirectories for a request. The jobs seem to run fine at T2_IT_Pisa.
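To compare failures like the two above across many job logs, the bracketed tag in the error line can be pulled out mechanically. This is a minimal sketch, assuming the tag always has the form [SE][Operation][Status] seen in these logs; the function name srm_error_tags is made up for this example and is not part of lcg-util or CRAB.

```shell
#!/bin/sh
# Hypothetical helper: read lcg-* output on stdin and print
# "Operation Status" for every line carrying an [SE][Operation][Status] tag.
# Assumes at most one such tag per line, as in the logs above.
srm_error_tags() {
    sed -n 's/.*\[SE]\[\([A-Za-z]*\)]\[\(SRM_[A-Z_]*\)].*/\1 \2/p'
}
```

Piping the captured output of a failed command through it (for example `lcg-cp ... 2>&1 | srm_error_tags`) should leave only lines such as `Mkdir SRM_INVALID_PATH`, which are easy to count or group.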
lcg-cp refuses to create more than one subdirectory layer at T2_CH_CSCS - this seems intentional!
lcg-cp (executed from CSCS-UI) with implicit creation of one subdirectory works, while implicit creation of two directories fails. This behavior seems to be intentional: dcache responds with a specific error message saying that it cannot create the nested directory because the parent directory does not exist.
I was able to confirm the path creation behavior in a few tests. Note that our site was running dcache-1.9.3-3 at the time of these tests.
- First I confirm that the path /pnfs/lcg.cscs.ch/cms/local_tests exists
lcg-ls -b -D srmv2 --srm-timeout 2400 --verbose srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests
SE type: SRMv2
/pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2b
/pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20081207-1239-8889-gftp
...
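- Now I try to copy a file into two nested, not yet existing subdirectories of this directory. This fails with essentially the same error (the parent path is not found), although the status here is SRM_FAILURE rather than SRM_INVALID_PATH.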
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1352-3942/srcfile srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/derekdir2/lcg-cp-derek1
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
[SE][Mkdir][SRM_FAILURE] srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/derekdir2/lcg-cp-derek1: srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/derekdir2 Failed to create, got error return code from pnfs: path /pnfs/fs/usr/cms/local_tests/derekdir1/derekdir2 not found ( .(id)(derekdir2) )
lcg_cp: Invalid argument
- Now I try the same copy, but with only one subdirectory in the request, and this succeeds
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1352-3942/srcfile srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/lcg-cp-derek1
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2136239017
Source URL: file:/tmp/dcachetest-20090917-1352-3942/srcfile
File size: 51200
Source URL for copy: file:/tmp/dcachetest-20090917-1352-3942/srcfile
Destination URL: gsiftp://se16.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/lcg-cp-derek1
# streams: 1
51200 bytes 49.72 KB/sec avg 49.72 KB/sec inst
Transfer took 2020 ms
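Since creation of a single directory level succeeds, one conceivable workaround is to create the missing parent directories one level at a time before calling lcg-cp. The following is only a sketch and was not part of the tests in this log: the function names missing_parents and precreate_parents and the MKDIR variable are hypothetical, and the only external command assumed is srmmkdir in the form used further down for the Pisa test directory. By default it just prints the commands it would run.

```shell
#!/bin/sh
# SRM endpoint prefix copied from the commands in this log.
SRM_PREFIX='srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN='

# Hypothetical helper: print every directory between BASE (assumed to exist)
# and the parent directory of TARGET, shallowest first, one per line.
missing_parents() {
    base=${1%/}
    target=$2
    # Refuse targets that are not below BASE.
    case "$target" in
        "$base"/*) ;;
        *) return 1 ;;
    esac
    rel=${target#"$base"/}
    dir=$base
    # The last path component is the file name, so stop before it.
    while [ "${rel#*/}" != "$rel" ]; do
        dir=$dir/${rel%%/*}
        rel=${rel#*/}
        printf '%s\n' "$dir"
    done
}

# Dry run by default: prints the srmmkdir commands.
# Set MKDIR=srmmkdir to actually execute them.
precreate_parents() {
    missing_parents "$1" "$2" | while read -r d; do
        ${MKDIR:-echo srmmkdir} "${SRM_PREFIX}${d}"
    done
}
```

For the failing request above, `precreate_parents /pnfs/lcg.cscs.ch/cms/local_tests /pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/derekdir2/lcg-cp-derek1` would print one srmmkdir command for derekdir1 and one for derekdir1/derekdir2. The sketch does not check whether a directory already exists or whether srmmkdir succeeded.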
lcg-cp correctly creates multiple subdirectory layers at T2_IT_Pisa
Here I can confirm that the creation of two layers of subdirectories works at T2_IT_Pisa. lcg-cp is again executed from CSCS-UI, so any differences observed must be attributed to the SE.
- Creation of a test user directory for my username
srmmkdir srm://cmsdcache.pi.infn.it:8443/srm/managerv2?SFN=/pnfs/pi.infn.it/data/cms/store/user/dfeichti
- Transfer of a simple file
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1205-24206/srcfile srm://cmsdcache.pi.infn.it:8443/srm/managerv2?SFN=/pnfs/pi.infn.it/data/cms/store/user/dfeichti/lcg-cp-derek5
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2141283420
Source URL: file:/tmp/dcachetest-20090917-1205-24206/srcfile
File size: 51200
Source URL for copy: file:/tmp/dcachetest-20090917-1205-24206/srcfile
Destination URL: gsiftp://cmsdcache10.pi.infn.it:2811//pnfs/pi.infn.it/data/cms/store/user/dfeichti/lcg-cp-derek5
# streams: 1
51200 bytes 42.96 KB/sec avg 42.96 KB/sec inst
Transfer took 2060 ms
- Transfer of a file with creation of two directory layers
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1205-24206/srcfile srm://cmsdcache.pi.infn.it:8443/srm/managerv2?SFN=/pnfs/pi.infn.it/data/cms/store/user/dfeichti/subdir1/subdir2/lcg-cp-derek5
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2141283364
Source URL: file:/tmp/dcachetest-20090917-1205-24206/srcfile
File size: 51200
Source URL for copy: file:/tmp/dcachetest-20090917-1205-24206/srcfile
Destination URL: gsiftp://cmsdcache7.pi.infn.it:2811//pnfs/pi.infn.it/data/cms/store/user/dfeichti/subdir1/subdir2/lcg-cp-derek5
# streams: 1
51200 bytes 44.34 KB/sec avg 44.34 KB/sec inst
Transfer took 2060 ms
srmcp succeeds in creating nested subdirectories at CSCS
Unlike lcg-cp, srmcp has no problem creating the two implicit subdirectories.
Executing from CSCS UI:
srmcp --debug -2 file:////tmp/dcachetest-20090917-1205-24206/srcfile srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/dfsub1/dfsub2/df1
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/opt/d-cache/srm
Storage Resource Manager (SRM) Client version 2.1.2
Copyright (c) 2002-2008 Fermi National Accelerator Laboratory
SRM Configuration:
default_port=8443
debug=true
...
...
execution of CopyJob, source = file:////tmp/dcachetest-20090917-1205-24206/srcfile destination = gsiftp://se25.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/dfsub1/dfsub2/df1 completed
SRMClientV2 : srmPutDone , contacting service httpg://storage01.lcg.cscs.ch:8443/srm/managerv2
srmPutDone status code=SRM_SUCCESS
copy_jobs is empty
stopping copier
Differences between T2_CH_CSCS and T2_IT_PISA
As noted above, the lcg-cp tests were all executed from CSCS-UI, so a difference in lcg-cp version cannot be responsible for the different behavior. My guess is that the cause is either the dcache version or the dcache configuration.
| | T2_CH_CSCS | T2_IT_PISA |
| Storage Manager | dcache-1.9.3 | dcache-1.8.0-15p5 |
| namespace | pnfs | pnfs |
| lcg-util version | 1.7.6-1 | 1.7.4-1 |
| GFAL-client | 1.11.8-1 | 1.11.6-2 |
dcache configuration?
On the T2_CH_CSCS dcache, the recursive directory creation is correctly enabled:
# ---- Enable automatic creation of directories.
#
# Allow automatic creation of directories via SRM
#
# allow=true, disallow=false
#
RecursiveDirectoryCreation=true
A look at the srm.batch file, which sets the property defaults, confirms this:
set context -c RecursiveDirectoryCreation true
The behavior at CSCS is inconsistent for lcg-cp (but not for srmcp)
It turns out that 2-layer directory creation sometimes succeeds at CSCS.
Therefore I used a small script to run a larger number of tests against each of a few SEs.
All tests were run from the CSCS UI.
lcg-cp 2-layer implicit directory creation
| SE | dcache version | Failures |
| CSCS | 1.9.3-3 | 9/20 |
| PSI | 1.9.2-4 | 0/20 |
| Pisa | 1.8.0-p15 | 0/20 |
1-layer directory creation always succeeds
Running the tests with srmcp against CSCS always succeeds
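The script used for these runs is not reproduced here. A repeated test of this kind could look roughly like the sketch below, which is an assumption about the approach and not the original script. The function name count_failures, the COPY_CMD variable and the destination naming scheme are hypothetical; the default lcg-cp options are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical sketch: copy SRC to N fresh two-level destinations below
# DEST_BASE and print "failures/N" in the style of the table above.
# COPY_CMD can be overridden, e.g. to run the same test with another client.
count_failures() {
    n=$1
    src=$2
    dest_base=$3
    fails=0
    i=1
    while [ "$i" -le "$n" ]; do
        # Two directory levels that do not exist yet; names are made unique
        # with the shell PID and the loop counter.
        dest="$dest_base/t$$-$i-a/t$$-$i-b/testfile"
        if ! ${COPY_CMD:-lcg-cp --vo=cms -b -D srmv2 -t 2400} "$src" "$dest" >/dev/null 2>&1; then
            fails=$((fails + 1))
        fi
        i=$((i + 1))
    done
    printf '%s/%s\n' "$fails" "$n"
}
```

A call for one SE would then be, for example, `count_failures 20 file:///tmp/srcfile 'srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests'`, where the source file path is a placeholder. The test directories are left behind and would need to be cleaned up separately.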
--
DerekFeichtinger - 2009-09-17