<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSSiteLogXX][previous page]] / [[CMSSiteLogXX][next page]] of CMS site log %M%

---+ 17. 09. 2009 lcg-cp stageout problems from CRAB jobs

*NOTE*: This problem was reported in [[https://hypernews.cern.ch/HyperNews/CMS/get/crabFeedback/2431.html][this hypernews item]]. The problem is tracked in [[https://savannah.cern.ch/support/index.php?109984][this Savannah support request]]. It was also submitted to the dcache support list on 2009-09-18 as tracker item #5109.

Andrea Rizzi and Andreas Schaetti reported stageout failures from their CRAB jobs. The relevant part of the CRAB log output is

<pre %FILESTYLE%>
########## contents of SE interaction
2009-09-17 15:15:12.751466: Executed: lcg-ls -b -D srmv2 -t 2400 --verbose srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root
Done with exit code: 256 and output:
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root: [SE][Ls][SRM_INVALID_PATH] could not get storage info by path : CacheException(rc=10001;msg=path /pnfs/fs/usr/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root not found ( .(id)(hwbbar115_10TeV_GEN_MC_2.root) ))
SE type: SRMv2
2009-09-17 15:15:13.890772: Executed: lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///home/egee/cms074/globus-tmp.wn36.5872.0/https_3a_2f_2fwms213.cern.ch_3a9000_2fvgkrUkMs0YPpBCGY4QTPjg/CMSSW_3_1_2/hwbbar115_10TeV_GEN_MC_2.root srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root
Done with exit code: 256 and output:
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
[SE][Mkdir][SRM_INVALID_PATH] srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b/hwbbar115_10TeV_GEN_MC_2.root: srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/arizzi/WH_HTobb_Pt100_M115_GEN_v2/WH_HTobb_Pt100_M115_GEN_v2/3804f52f25a016d6eb88c4371b906f7b : %GREEN%parent path or a component of the parent path does not exist
lcg_cp: No such file or directory%ENDCOLOR%
</pre>

Andrea Rizzi's user directory exists, but none of the subdirectories exists. It seems that =lcg-cp= does not automatically create all the subdirectories required for a request. The jobs seem to run fine at T2_IT_Pisa.
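To make explicit which directories the failed copy above implicitly needs, the following is a minimal sketch that lists the parent directory SURLs of a destination, outermost first. The function name =parent_surls= and the =existing_base= argument are hypothetical helpers introduced only for illustration; they are not part of lcg-util or CRAB.

```python
from typing import List


def parent_surls(surl: str, existing_base: str) -> List[str]:
    """List the directory SURLs between an existing base directory and the file.

    Hypothetical helper: 'existing_base' is the deepest directory path that is
    known to exist already (an assumption supplied by the caller).
    """
    prefix, sep, sfn = surl.partition("?SFN=")
    if not sep:
        raise ValueError("expected a SURL containing '?SFN='")
    base = existing_base.rstrip("/")
    if not sfn.startswith(base + "/"):
        raise ValueError("file path is not below the existing base directory")
    # Path components below the base, without the file name itself.
    components = sfn[len(base) + 1:].split("/")[:-1]
    result = []
    current = base
    for component in components:
        current = current + "/" + component
        result.append(prefix + "?SFN=" + current)
    return result


if __name__ == "__main__":
    dest = ("srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN="
            "/pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/derekdir2/lcg-cp-derek1")
    for directory in parent_surls(dest, "/pnfs/lcg.cscs.ch/cms/local_tests"):
        print(directory)
```

Each listed SURL could in principle be passed to =srmmkdir= before the copy. Whether pre-creating the directories avoids the failure was not tested on this page, so this is an assumption.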
---++ lcg-cp refuses to create more than one subdirectory layer at T2_CH_CSCS - this seems intentional!

lcg-cp (executed from CSCS-UI) with implicit creation of one subdirectory works, while implicit creation of two directories fails. This behavior seems to be intentional: dcache responds with a specific error message saying that it cannot create the nested directory because the parent directory does not exist.

I was able to confirm the path creation behavior in a few tests. Note that our site was running dcache-1.9.3-3 at the time of these tests.

   * %Y% First I confirm that the path =/pnfs/lcg.cscs.ch/cms/local_tests= exists <pre>
lcg-ls -b -D srmv2 --srm-timeout 2400 --verbose srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests
SE type: SRMv2
/pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2b
/pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20081207-1239-8889-gftp
...
</pre>
   * %ICON{"choice-no"}% Now I try to copy a file into two nested subdirectories below this directory, and this fails in the same Mkdir step as the CRAB jobs.<pre>
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1352-3942/srcfile srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/%GREEN%derekdir1/derekdir2/lcg-cp-derek1%ENDCOLOR%
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
[SE][Mkdir][SRM_FAILURE] srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/derekdir2/lcg-cp-derek1: srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/derekdir2 Failed to create, got error return code from pnfs: path /pnfs/fs/usr/cms/local_tests/derekdir1/derekdir2 not found ( .(id)(derekdir2) )
lcg_cp: Invalid argument
</pre>
   * %Y% Now I try the same copy, but with only one subdirectory in the request, and this succeeds<pre>
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1352-3942/srcfile srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/%GREEN%derekdir1/lcg-cp-derek1%ENDCOLOR%
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2136239017
Source URL: file:/tmp/dcachetest-20090917-1352-3942/srcfile
File size: 51200
Source URL for copy: file:/tmp/dcachetest-20090917-1352-3942/srcfile
Destination URL: gsiftp://se16.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/derekdir1/lcg-cp-derek1
# streams: 1
51200 bytes 49.72 KB/sec avg 49.72 KB/sec inst
Transfer took 2020 ms
</pre>

---++ lcg-cp correctly creates multiple subdirectory layers at T2_IT_Pisa

Here I can confirm that the creation of two layers of subdirectories works at T2_IT_Pisa. lcg-cp is again executed from CSCS-UI, so any differences observed must be attributed to the SE.
   * %Y% Creation of a test user directory for my username<pre>
srmmkdir srm://cmsdcache.pi.infn.it:8443/srm/managerv2?SFN=/pnfs/pi.infn.it/data/cms/store/user/dfeichti
</pre>
   * %Y% Transfer of a simple file<pre>
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1205-24206/srcfile srm://cmsdcache.pi.infn.it:8443/srm/managerv2?SFN=/pnfs/pi.infn.it/data/cms/store/user/dfeichti/lcg-cp-derek5
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2141283420
Source URL: file:/tmp/dcachetest-20090917-1205-24206/srcfile
File size: 51200
Source URL for copy: file:/tmp/dcachetest-20090917-1205-24206/srcfile
Destination URL: gsiftp://cmsdcache10.pi.infn.it:2811//pnfs/pi.infn.it/data/cms/store/user/dfeichti/lcg-cp-derek5
# streams: 1
51200 bytes 42.96 KB/sec avg 42.96 KB/sec inst
Transfer took 2060 ms
</pre>
   * %Y% Transfer of a file with creation of two directory layers <pre>
lcg-cp --verbose --vo=cms -b -D srmv2 -t 2400 --verbose file:///tmp/dcachetest-20090917-1205-24206/srcfile srm://cmsdcache.pi.infn.it:8443/srm/managerv2?SFN=/pnfs/pi.infn.it/data/cms/store/user/dfeichti/subdir1/subdir2/lcg-cp-derek5
Warning: -t,--timeout is deprecated! Use --timeout-* options instead
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: cms
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2141283364
Source URL: file:/tmp/dcachetest-20090917-1205-24206/srcfile
File size: 51200
Source URL for copy: file:/tmp/dcachetest-20090917-1205-24206/srcfile
Destination URL: gsiftp://cmsdcache7.pi.infn.it:2811//pnfs/pi.infn.it/data/cms/store/user/dfeichti/subdir1/subdir2/lcg-cp-derek5
# streams: 1
51200 bytes 44.34 KB/sec avg 44.34 KB/sec inst
Transfer took 2060 ms
</pre>

---++ srmcp succeeds in creating nested subdirectories at CSCS

In contrast to lcg-cp, srmcp has no problem implicitly creating the two subdirectories. Executing from CSCS UI:

<pre>
srmcp --debug -2 file:////tmp/dcachetest-20090917-1205-24206/srcfile srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local_tests/%GREEN%dfsub1/dfsub2/df1%ENDCOLOR%
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/opt/d-cache/srm
Storage Resource Manager (SRM) Client version 2.1.2
Copyright (c) 2002-2008 Fermi National Accelerator Laboratory
SRM Configuration:
default_port=8443
debug=true
...
...
execution of CopyJob, source = file:////tmp/dcachetest-20090917-1205-24206/srcfile destination = gsiftp://se25.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/dfsub1/dfsub2/df1 completed
SRMClientV2 : srmPutDone , contacting service httpg://storage01.lcg.cscs.ch:8443/srm/managerv2
srmPutDone status code=SRM_SUCCESS
copy_jobs is empty
stopping copier
</pre>

---++ Differences between T2_CH_CSCS and T2_IT_PISA

As noted above, the lcg-cp tests were all executed from CSCS-UI, so a difference in the lcg-cp version cannot be responsible for the different behavior. My guess is that the cause is either the dcache version or the dcache configuration.
||*T2_CH_CSCS*|*T2_IT_PISA*|
|Storage Manager| dcache-1.9.3 | dcache-1.8.0-15p5 |
|namespace| pnfs | pnfs |
|lcg-util version| 1.7.6-1 | 1.7.4-1 |
|GFAL-client| 1.11.8-1 | 1.11.6-2 |

---++ dcache configuration?

On the T2_CH_CSCS dcache, recursive directory creation is correctly enabled:

<pre %FILESTYLE%>
# ---- Enable automatic creation of directories.
#
# Allow automatic creation of directories via SRM
#
# allow=true, disallow=false
#
RecursiveDirectoryCreation=true
</pre>

A look at the srm.batch file, which sets the property defaults, confirms this:

<pre %FILESTYLE%>
set context -c RecursiveDirectoryCreation true
</pre>

---++ The behavior at CSCS is inconsistent for lcg-cp (but not for srmcp)

It turns out that 2-layer directory creation sometimes succeeds at CSCS. I therefore used a small script to run a larger number of tests against each of a few SEs. All tests ran from the CSCS UI.

%TABLE{caption="lcg-cp 2-layer implicit directory creation"}%
|*SE*| *dcache version* | *namespace* |*Failures/Total tries* |
|CSCS|1.9.3-3| pnfs | 9/20 |
|Estonia| 1.9.3-3 | pnfs | 0/20 |
|PSI| 1.9.2-4 | pnfs | 0/20 |
|Pisa| 1.8.0-p15 | pnfs | 0/20 |

*Estonia runs the exact same dcache version as we do, and they also still have pnfs. All tests I did on their site succeeded, so this points to some local problem at CSCS*. My suspicions are mostly targeted at the pnfs namespace... Still, the fact that lcg-cp and srmcp show such different behavior on our site is a bit unsettling.

1-layer directory creation always succeeds:

%TABLE{caption="lcg-cp 1-layer implicit directory creation"}%
|*SE*| *dcache version* | *Failures* |
|CSCS|1.9.3-3| 0/20 |

Running the tests with srmcp against CSCS always succeeds:

%TABLE{caption="srmcp 2-layer implicit directory creation"}%
|*SE*| *dcache version* | *Failures* |
|CSCS|1.9.3-3| 0/20 |

---++ 04. 02. 2010 Problem solved after updates to dcache 1.9.x

Running the test against the newer dcache versions at CSCS always shows successful runs.
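The repeated-trial counts in the tables above could be produced with a script along the following lines. This is only a sketch, not the script that was actually used: the names =run_copy= and =count_failures=, the =tag= argument and the generated directory names are hypothetical, and it assumes that =lcg-cp= returns a non-zero exit status when the copy fails. The =lcg-cp= options are the ones shown in the tests on this page, without the deprecated =-t= option.

```python
import subprocess
from typing import Callable, List


def run_copy(cmd: List[str]) -> bool:
    """Run one copy command; True means exit status 0 (assumed to mean success)."""
    done = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return done.returncode == 0


def count_failures(base_surl: str, src_url: str, tag: str, tries: int = 20,
                   runner: Callable[[List[str]], bool] = run_copy) -> int:
    """Try 'tries' copies, each into a fresh two-layer directory path.

    'tag' is a hypothetical label (without slashes) that keeps the directory
    names of one run distinct from those of earlier runs.
    """
    failures = 0
    for i in range(tries):
        # Neither directory exists yet, so the SE has to create both implicitly.
        dest = f"{base_surl}/{tag}-{i}-a/{tag}-{i}-b/testfile"
        cmd = ["lcg-cp", "--vo=cms", "-b", "-D", "srmv2", src_url, dest]
        if not runner(cmd):
            failures += 1
    return failures


if __name__ == "__main__":
    base = ("srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN="
            "/pnfs/lcg.cscs.ch/cms/local_tests")
    failed = count_failures(base, "file:///tmp/srcfile", tag="nesttest")
    print(f"{failed}/20 failures")
```

Passing the command runner as an argument keeps the counting logic testable without grid access; for the 1-layer and srmcp variants only the destination path or the command list would change.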
-- Main.DerekFeichtinger - 2009-09-17
Topic revision: r9 - 2010-02-04 - DerekFeichtinger