
17. 09. 2009 lcg-cp stageout problems from CRAB jobs

NOTE: This problem was reported on this hypernews item and is tracked on this Savannah support request. It was also submitted to the dcache support list on 2009-09-18 as tracker item #5109.

Andrea Rizzi and Andreas Schaetti reported stageout failures from their CRAB jobs.

The relevant part of the CRAB log output is:

########## contents of SE interaction
2009-09-17 15:15:12.751466:
Executed:       lcg-ls -b -D srmv2  -t 2400 --verbose srm://
Done with exit code:    256
and output:
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
_MC_2.root: [SE][Ls][SRM_INVALID_PATH] could not get storage info by path : CacheException(rc=10001;msg=path /pnfs/fs/usr/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root not found ( .(id)(hwbbar115_10TeV_GEN_MC_2.root) ))
SE type: SRMv2

2009-09-17 15:15:13.890772:
Executed:        lcg-cp  --verbose  --vo=cms  -b -D srmv2  -t 2400 --verbose file:///home/egee/cms074/globus-tmp.wn36.5872.0/
3a9000_2fvgkrUkMs0YPpBCGY4QTPjg/CMSSW_3_1_2/hwbbar115_10TeV_GEN_MC_2.root srm://
Done with exit code:    256
and output:
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
[SE][Mkdir][SRM_INVALID_PATH] srm://
2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root: srm:// : parent path or a component of the parent path does not exist
lcg_cp: No such file or directory

Andrea Rizzi's user directory exists, but none of the subdirectories exist. It seems that lcg-cp does not automatically create all the subdirectories required for a request. The jobs seem to run fine at T2_IT_Pisa.
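To see at a glance which SRM operation failed in output like the above, the bracketed tags can be pulled out with a regular expression. This is only a minimal sketch: the function name srm_errors is made up for illustration, and the tag layout [SE][operation][status] is assumed from the log lines shown here, not taken from any lcg-util documentation.

```python
import re

# Tag layout as it appears in the log above, e.g. [SE][Mkdir][SRM_INVALID_PATH]
# (assumed from these samples only)
TAG_RE = re.compile(r"\[SE\]\[(\w+)\]\[(SRM_\w+)\]")

def srm_errors(output):
    """Return a list of (operation, status) pairs found in lcg-* output."""
    return TAG_RE.findall(output)

# Shortened sample taken from the lcg-cp output above
sample = """Destination SE type: SRMv2
[SE][Mkdir][SRM_INVALID_PATH] srm://
lcg_cp: No such file or directory"""

print(srm_errors(sample))
```

For the two commands in the log this gives an Ls failure for lcg-ls (expected, since the file does not exist yet) and a Mkdir failure for lcg-cp, which is the actual problem.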

lcg-cp refuses to create more than one subdirectory layer at T2_CH_CSCS - this seems intentional!

lcg-cp (executed from CSCS-UI) with implicit creation of one subdirectory works, while implicit creation of two directories fails. This behavior seems to be intentional: dcache responds with a specific error message saying that it cannot create the nested directory because the parent directory does not exist.

I was able to confirm the path creation behavior in a few tests. Note that our site was running dcache-1.9.3-3 at the time of these tests.

  • DONE First I confirm that the path /pnfs/ exists
    lcg-ls -b -D srmv2 --srm-timeout 2400 --verbose srm://
    SE type: SRMv2
  • FAILED Now I try to copy a file nested in two subdirectories to this directory, and this also fails in the Mkdir step.
    lcg-cp  --verbose  --vo=cms  -b -D srmv2  -t 2400 --verbose file:///tmp/dcachetest-20090917-1352-3942/srcfile srm://
    Warning: -t,--timeout is deprecated! Use --timeout-* options instead
    Using grid catalog type: UNKNOWN
    Using grid catalog : (null)
    VO name: cms
    Checksum type: None
    Destination SE type: SRMv2
    [SE][Mkdir][SRM_FAILURE] srm:// srm:// Failed to create, got error return code from pnfs: path /pnfs/fs/usr/cms/local_tests/derekdir1/derekdir2 not found ( .(id)(derekdir2) )
    lcg_cp: Invalid argument
  • DONE Now I try the same copy, but with only one subdirectory in the request, and this succeeds
    lcg-cp  --verbose  --vo=cms  -b -D srmv2  -t 2400 --verbose file:///tmp/dcachetest-20090917-1352-3942/srcfile srm://
    Warning: -t,--timeout is deprecated! Use --timeout-* options instead
    Using grid catalog type: UNKNOWN
    Using grid catalog : (null)
    VO name: cms
    Checksum type: None
    Destination SE type: SRMv2
    Destination SRM Request Token: -2136239017
    Source URL: file:/tmp/dcachetest-20090917-1352-3942/srcfile
    File size: 51200
    Source URL for copy: file:/tmp/dcachetest-20090917-1352-3942/srcfile
    Destination URL: gsi
    # streams: 1
            51200 bytes     49.72 KB/sec avg     49.72 KB/sec inst
    Transfer took 2020 ms
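The test steps above can be sketched as a small helper that builds a destination with a chosen number of not-yet-existing directory layers and the matching lcg-cp argument list. This is an illustration only: the function names, the "testdir-" prefix and the BASE endpoint are assumptions, and the lcg-cp options are simply the ones used in the tests above (including the deprecated -t).

```python
import uuid

def nested_destination(base_url, depth, filename="srcfile"):
    """Append `depth` fresh directory names and a file name to base_url."""
    dirs = ["testdir-" + uuid.uuid4().hex[:8] for _ in range(depth)]
    return "/".join([base_url.rstrip("/")] + dirs + [filename])

def lcg_cp_command(source_url, dest_url):
    """Argument list for an lcg-cp call with the options used in the tests."""
    return ["lcg-cp", "--verbose", "--vo=cms", "-b", "-D", "srmv2",
            "-t", "2400", source_url, dest_url]

# BASE is a placeholder (assumption); substitute the SURL of an existing directory.
BASE = "srm://se.example.org:8443/srm/managerv2?SFN=/pnfs/example/local_tests"

# depth=1 corresponds to the succeeding test, depth=2 to the failing one
dest = nested_destination(BASE, depth=2)
print(" ".join(lcg_cp_command("file:///tmp/srcfile", dest)))
```

Random directory names keep repeated runs from hitting directories left over from an earlier try, so every run really exercises the implicit creation.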

lcg-cp correctly creates multiple subdirectory layers at T2_IT_Pisa

Here I can confirm that the creation of two layers of subdirectories works at T2_IT_Pisa. lcg-cp is again executed from CSCS-UI, so any differences observed must be attributed to the SE.

  • DONE Creation of a test user directory for my username
    srmmkdir srm://
  • DONE Transfer of a simple file
    lcg-cp --verbose --vo=cms -b -D srmv2  -t 2400 --verbose file:///tmp/dcachetest-20090917-1205-24206/srcfile   srm://
    Warning: -t,--timeout is deprecated! Use --timeout-* options instead
    Using grid catalog type: UNKNOWN
    Using grid catalog : (null)
    VO name: cms
    Checksum type: None
    Destination SE type: SRMv2
    Destination SRM Request Token: -2141283420
    Source URL: file:/tmp/dcachetest-20090917-1205-24206/srcfile
    File size: 51200
    Source URL for copy: file:/tmp/dcachetest-20090917-1205-24206/srcfile
    Destination URL: gsi
    # streams: 1
            51200 bytes     42.96 KB/sec avg     42.96 KB/sec inst
    Transfer took 2060 ms
  • DONE Transfer of a file with creation of two directory layers
    lcg-cp --verbose --vo=cms -b -D srmv2  -t 2400 --verbose file:///tmp/dcachetest-20090917-1205-24206/srcfile   srm://
    Warning: -t,--timeout is deprecated! Use --timeout-* options instead
    Using grid catalog type: UNKNOWN
    Using grid catalog : (null)
    VO name: cms
    Checksum type: None
    Destination SE type: SRMv2
    Destination SRM Request Token: -2141283364
    Source URL: file:/tmp/dcachetest-20090917-1205-24206/srcfile
    File size: 51200
    Source URL for copy: file:/tmp/dcachetest-20090917-1205-24206/srcfile
    Destination URL: gsi
    # streams: 1
            51200 bytes     44.34 KB/sec avg     44.34 KB/sec inst
    Transfer took 2060 ms

srmcp succeeds in creating nested subdirectories at CSCS

Unlike lcg-cp, srmcp has no problem creating the two implicit subdirectories.

Executing from CSCS UI:

srmcp --debug -2 file:////tmp/dcachetest-20090917-1205-24206/srcfile srm://

WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/opt/d-cache/srm
Storage Resource Manager (SRM) Client version 2.1.2
Copyright (c) 2002-2008 Fermi National Accelerator Laboratory

SRM Configuration:
execution of CopyJob, source = file:////tmp/dcachetest-20090917-1205-24206/srcfile destination = gsi completed
SRMClientV2 : srmPutDone , contacting service httpg://
srmPutDone status code=SRM_SUCCESS
copy_jobs is empty
stopping copier

Differences between T2_CH_CSCS and T2_IT_Pisa

As noted above, the lcg-cp tests were all executed from CSCS-UI, so a difference in lcg-cp version cannot be responsible for the different behavior. My guess is either dcache version or dcache configuration.

                  T2_CH_CSCS     T2_IT_Pisa
Storage Manager   dcache-1.9.3   dcache-1.8.0-15p5
namespace         pnfs           pnfs
lcg-util version  1.7.6-1        1.7.4-1
GFAL-client       1.11.8-1       1.11.6-2

dcache configuration?

On the T2_CH_CSCS dcache, the recursive directory creation is correctly enabled:

#  ---- Enable automatic creation of directories.
#  Allow automatic creation of directories via SRM
#  allow=true, disallow=false

A look at the srm.batch file, which sets the property defaults, confirms this:

set context -c RecursiveDirectoryCreation  true
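When comparing several sites, this check can be done without reading the file by eye. The following is a minimal sketch that only understands lines of exactly the form shown above ("set context -c NAME VALUE"); the function name context_value is made up for illustration, and other syntax that may occur in a batch file is ignored.

```python
def context_value(batch_text, name):
    """Return the value set by 'set context -c NAME VALUE', or None if absent."""
    for line in batch_text.splitlines():
        words = line.split()
        # Only the 'set context -c NAME VALUE' form quoted above is recognised
        if words[:3] == ["set", "context", "-c"] and len(words) >= 5 and words[3] == name:
            return words[4]
    return None

# The line quoted above from srm.batch
sample = "set context -c RecursiveDirectoryCreation  true\n"
print(context_value(sample, "RecursiveDirectoryCreation"))
```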

The behavior at CSCS is inconsistent for lcg-cp (but not for srmcp)

It turns out that 2-layer directory creation sometimes succeeds at CSCS. Therefore I used a small script to run a larger number of tests against each of a few SEs.

All tests ran from the CSCS UI
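The script itself is not reproduced in this log. A minimal sketch of such a test loop could look as follows; it is not the script that produced the numbers below. The function names, the runNNN directory naming and the base endpoint are assumptions, and the lcg-cp options are copied from the manual tests above.

```python
import subprocess

def run_ok(argv):
    """Run one command; True if it exits with status 0."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return proc.returncode == 0

def tally(make_argv, tries, runner=run_ok):
    """Run make_argv(0..tries-1) through runner; return 'failures/tries'."""
    failures = sum(1 for i in range(tries) if not runner(make_argv(i)))
    return "%d/%d" % (failures, tries)

def two_layer_copy(i, base="srm://se.example.org:8443/srm/managerv2?SFN=/pnfs/example/local_tests"):
    # base is a placeholder (assumption). Each try uses its own pair of
    # directory names, which must not exist on the SE yet, so that both
    # layers have to be created implicitly.
    dest = "%s/run%03d-a/run%03d-b/srcfile" % (base, i, i)
    return ["lcg-cp", "--vo=cms", "-b", "-D", "srmv2", "-t", "2400",
            "file:///tmp/srcfile", dest]

# Dry run with a stand-in runner that always reports success (no grid access)
print(tally(two_layer_copy, 4, runner=lambda argv: True))
```

Passing the runner as a parameter makes it possible to try out the counting logic without a grid UI; with the default runner the loop calls the real lcg-cp.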

lcg-cp 2-layer implicit directory creation
SE       dcache version  namespace  Failures/Total tries
CSCS     1.9.3-3         pnfs       9/20
Estonia  1.9.3-3         pnfs       0/20
PSI      1.9.2-4         pnfs       0/20
Pisa     1.8.0-p15       pnfs       0/20

Estonia runs the exact same dcache version as we do, and they also still have pnfs. All tests I did on their site succeeded, so this points to some local problem at CSCS. I mostly suspect the pnfs namespace. Still, the fact that lcg-cp and srmcp show such different behavior on our site is a bit unsettling.
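A rough plausibility check for the "local problem" reading: if every site failed independently at the rate observed at CSCS, 20 clean runs in a row elsewhere would be very unlikely. The independence of the individual tries is an assumption made only for this estimate.

```python
failures, tries = 9, 20          # CSCS row of the table above
p_fail = failures / tries        # observed failure rate at CSCS

# Chance of 20 clean runs in a row at another site if each try failed
# independently with the same probability as at CSCS (assumption)
p_all_ok = (1 - p_fail) ** tries

print("observed failure rate: %.2f" % p_fail)
print("P(0/20 failures at that rate): %.1e" % p_all_ok)
```

The resulting probability is below 1e-5 for a single site, so three sites with 0/20 each are hard to explain by chance.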

1-layer directory creation always succeeds

lcg-cp 1-layer implicit directory creation
SE    dcache version  Failures/Total tries
CSCS  1.9.3-3         0/20

Running the tests with srmcp against CSCS always succeeds

srmcp 2-layer implicit directory creation
SE    dcache version  Failures/Total tries
CSCS  1.9.3-3         0/20

04. 02. 2010 Problem solved after updates to dcache 1.9.x

Running the tests against the newer dcache versions at CSCS now always succeeds.

-- DerekFeichtinger - 2009-09-17


Topic revision: r9 - 2010-02-04 - DerekFeichtinger