Tags:
create new tag
view all tags

IssueSrmWriteProblem

Symptoms

Summary: SRMv1 and v2 writes fail irregularly, often with "files exists" errors

Occurrences

At what times did this problem occur (used to estimate frequency):

We noticed the problem from 2008-09-04 on.

Observations

Basic symptoms

  • Writing with srmcp to the SE fails frequently, but not consistently.
  • Failing the CMS SAM tests for mc stageout from the CE three times in a row (around 18h) , raised my attention to this issue.
    • From 20h-24h the CMS SAM tests look fine. The normal SAM tests never saw a problem, but my manual tests still fail frequently around 24h. Here's the status for SAM tests for monte carlo stageout on 2008-09-05 09:00h (link to current display): cmsSAM-30624_65.png
    • Looking back at the monthly CMS SAM tests, one can see that there are intermittent failures of the CMS-ce-mc test, and these failures are clustered and occur frequently for 1-3 days, then they cease for a longer time.

srmcp -2 -debug file:////tmp/dcachetest-20080904-2021-8387/srcfile \
   srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2d

Failure 1: the SRM transaction is OK, but then the GridFTP transfer is interrupted abruptly. Show command output Hide

WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/opt/d-cache/srm
Storage Resource Manager (SRM) implementation version 2.0.6
Copyright (c) 2002-2008 Fermi National Accelerator Laboratory
Specification Version 2.0 by SRM Working Group (http://sdm.lbl.gov/srm-wg)
SRM Configuration:
        debug=true
        gsissl=true
        help=false
        pushmode=false
        userproxy=true
        buffer_size=131072
        tcp_buffer_size=0
        streams_num=10
        config_file=config.xml
        glue_mapfile=conf/SRMServerV1.map
        webservice_path=srm/managerv2
        webservice_protocol=https
        gsiftpclinet=globus-url-copy
        protocols_list=http,gsiftp
        save_config_file=null
        srmcphome=..
        urlcopy=sbin/urlcopy.sh
        x509_user_cert=/home/timur/k5-ca-proxy.pem
        x509_user_key=/home/timur/k5-ca-proxy.pem
        x509_user_proxy=/tmp/x509up_u20346
        x509_user_trusted_certificates=/etc/grid-security/certificates
        globus_tcp_port_range=null
        gss_expected_name=null
        storagetype=permanent
        retry_num=20
        retry_timeout=10000
        wsdl_url=null
        use_urlcopy_script=false
        connect_to_wsdl=false
        delegate=true
        full_delegation=true
        server_mode=passive
        srm_protocol_version=2
        request_lifetime=86400
        access latency=null
        overwrite mode=null
        priority=0
        from[0]=file:////tmp/dcachetest-20080904-2021-8387/srcfile
        to=srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2d

Thu Sep 04 20:57:27 CEST 2008: starting SRMPutClient
Thu Sep 04 20:57:27 CEST 2008: In SRMClient ExpectedName: host
Thu Sep 04 20:57:27 CEST 2008: SRMClient(https,srm/managerv2,true)
SRMClientV2 : user credentials are: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
SRMClientV2 : connecting to srm at httpg://storage01.lcg.cscs.ch:8443/srm/managerv2
SRMClientV2 : srmPrepareToPut , contacting service httpg://storage01.lcg.cscs.ch:8443/srm/managerv2
copy_jobs is empty
Thu Sep 04 20:57:30 CEST 2008:  srm returned requestToken = -2144591646
Thu Sep 04 20:57:30 CEST 2008: sleeping 1 seconds ...
SRMClientV2 : srmStatusOfPutRequest , contacting service httpg://storage01.lcg.cscs.ch:8443/srm/managerv2
Thu Sep 04 20:57:31 CEST 2008: no more pending transfers, breaking the loop
copy_jobs is not empty
copying CopyJob, source = file:////tmp/dcachetest-20080904-2021-8387/srcfile destination = gsiftp://se01.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2d
GridftpClient: memory buffer size is set to 131072
GridftpClient: connecting to se01.lcg.cscs.ch on port 2811
GridftpClient: gridFTPClient tcp buffer size is set to 1048576
GridftpClient: gridFTPWrite started, source file is java.io.RandomAccessFile@5ddb6e destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2d
GridftpClient: gridFTPWrite started, destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2d
GridftpClient: set local data channel authentication mode to None
GridftpClient: parallelism: 10
GridftpClient: adler32 for file java.io.RandomAccessFile@5ddb6e is 5f2c7205
GridftpClient: waiting for completion of transfer
GridftpClient: starting a transfer to /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm2d
GridftpClient: DiskDataSink.close() called
GridftpClient: gridFTPWrite() wrote 51200bytes
GridftpClient: closing client : org.dcache.srm.util.GridftpClient$FnalGridFTPClient@ba4211
copy failed with the error
java.io.IOException:

        at org.globus.ftp.extended.GridFTPInputStream.readMsg(GridFTPInputStream.java:130)
        at org.globus.gsi.gssapi.net.GssInputStream.hasData(GssInputStream.java:86)
        at org.globus.gsi.gssapi.net.GssInputStream.read(GssInputStream.java:60)
        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.BufferedReader.fill(BufferedReader.java:136)
        at java.io.BufferedReader.readLine(BufferedReader.java:299)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at org.globus.ftp.vanilla.Reply.<init>(Reply.java:71)
        at org.globus.ftp.vanilla.FTPControlChannel.read(FTPControlChannel.java:262)
        at org.globus.ftp.vanilla.FTPControlChannel.exchange(FTPControlChannel.java:305)
        at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:330)
        at org.globus.ftp.FTPClient.close(FTPClient.java:921)
        at org.dcache.srm.util.GridftpClient.close(GridftpClient.java:651)
        at gov.fnal.srm.util.Copier.javaGridFtpCopy(Copier.java:626)
        at gov.fnal.srm.util.Copier.copy(Copier.java:495)
        at gov.fnal.srm.util.Copier.run(Copier.java:321)
        at java.lang.Thread.run(Thread.java:595)
 try again
sleeping for 10000 before retrying
SRMClientV2 : srmAbortFiles , contacting service httpg://storage01.lcg.cscs.ch:8443/srm/managerv2
srmAbortFiles status code=SRM_SUCCESS
  • Server-side: the PNFS entry is created and given the correct metadata (for instance, the size is the size of the file to be transferred); however, it is not deleted when the transfer is interrupted (possibly because of the "permission denied" error; see below). This might be the cause for subsequent failures to write to the same destination (see "Failure 2" below)
  • dCache server logs with error: This is the full dump of the srm-storage01Domain.log lines
corresponding to a failed request (of the kind dubbed Failure 1 above):: Show SRM log Hide
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : Message  arrived:  from [>SRM-storage01@srm-storage01Domain]
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : reserveSpace(group=/dech, role=NULL, sz=4446, latency=ONLINE, policy=OUTPUT, lifetime=14399776, description=null
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : policy is OUTPUT, needHsmBackup is false
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : findLinkGroupIds(sizeInBytes=4446, voGroup=/dech voRole=NULL, AccessLatency=ONLINE, RetentionPolicy=OUTPUT)
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT lg.*,lg.freespaceinbytes-lg.reservedspaceinbytes as available 
 from srmlinkgroup lg, srmlinkgroupvos lgvo
 where lg.id=lgvo.linkGroupId  and  lg.lastUpdateTime >= ?  and lg.onlineallowed = 1  and lg.outputallowed = 1  and  ( lgvo.VOGroup = ? OR lgvo.VOGroup = '*' )  and  ( lgvo.VORole = ? OR lgvo.VORole = '*' )  and  lg.freespaceinbytes-lg.reservedspaceinbytes >= ?  order by available desc ?=1221207119579?=/dech?=NULL?=4446
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : reserveSpaceInLinkGroup(linkGroupId=1group=/dech, role=NULL, sz=4446, latency=ONLINE, policy=OUTPUT, lifetime=14399776, description=null
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) :  return nextLong=1947322
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: INSERT INTO srmspace (id,vogroup,vorole,retentionpolicy,accesslatency,linkgroupid,sizeinbytes,creationtime,lifetime,description,state,usedspaceinbytes,allocatedspaceinbytes) VALUES  (?,?,?,?,?,?,?,?,?,?,?,?,?)
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : Sending reply 
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : Message  arrived:  from [>SRM-storage01@srm-storage01Domain]
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : useSpace()
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) :  return nextLong=1947323
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmspace WHERE  id = ? AND sizeinbytes-allocatedspaceinbytes >= ? FOR UPDATE ,?=1947322,4446
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: INSERT INTO srmspacefile (id,vogroup,vorole,spacereservationid,sizeinbytes,creationtime,lifetime,pnfspath,pnfsid,state)  VALUES  (?,?,?,?,?,?,?,?,NULL,?)
09/12 10:12:23 Cell(SrmSpaceManager@srm-storage01Domain) : Sending reply 
09/12 10:12:24 Cell(SrmSpaceManager@srm-storage01Domain) : Message  arrived: PnfsId=0006000000000000000E6B68;StorageInfo=size=0;new=true;stored=false;sClass=dech:dech;cClass=-;hsm=osm;accessLatency=ONLINE;retentionPolicy=OUTPUT;StoreName=dech;path=/pnfs/lcg.cscs.ch/dech/rm-test-1;store=dech;group=dech;bfid=<Unknown>;;PoolName=<unknown> from [>GFTP-se06-Unknown-9268@gridftp-se06Domain:*@gridftp-se06Domain:*@dCacheDomain]
09/12 10:12:24 Cell(SrmSpaceManager@srm-storage01Domain) : selectPool(PnfsId=0006000000000000000E6B68;StorageInfo=size=0;new=true;stored=false;sClass=dech:dech;cClass=-;hsm=osm;accessLatency=ONLINE;retentionPolicy=OUTPUT;StoreName=dech;path=/pnfs/lcg.cscs.ch/dech/rm-test-1;store=dech;group=dech;bfid=<Unknown>;;PoolName=<unknown>)
09/12 10:12:24 Cell(SrmSpaceManager@srm-storage01Domain) : selectPool: getFile(/pnfs/lcg.cscs.ch/dech/rm-test-1)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : java.sql.SQLException: found two records with pnfsPath=/pnfs/lcg.cscs.ch/dech/rm-test-1
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.getFile(ManagerV2.java:2039)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.selectPool(ManagerV2.java:3313)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.processMessage(ManagerV2.java:2276)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.access$000(ManagerV2.java:101)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2$1.run(ManagerV2.java:2244)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.lang.Thread.run(Thread.java:619)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : protocol info is GridProtocolInfo
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :  voinfo=/dech:NULL
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : selectPool: file is not found, no prior reservations for this file, calling reserveAndUseSpace()
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : reserveSpace(group=/dech, role=NULL, sz=0, latency=ONLINE, policy=OUTPUT, lifetime=3600000, description=null
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : policy is OUTPUT, needHsmBackup is false
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : findLinkGroupIds(sizeInBytes=0, voGroup=/dech voRole=NULL, AccessLatency=ONLINE, RetentionPolicy=OUTPUT)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT lg.*,lg.freespaceinbytes-lg.reservedspaceinbytes as available 
 from srmlinkgroup lg, srmlinkgroupvos lgvo
 where lg.id=lgvo.linkGroupId  and  lg.lastUpdateTime >= ?  and lg.onlineallowed = 1  and lg.outputallowed = 1  and  ( lgvo.VOGroup = ? OR lgvo.VOGroup = '*' )  and  ( lgvo.VORole = ? OR lgvo.VORole = '*' )  and  lg.freespaceinbytes-lg.reservedspaceinbytes >= ?  order by available desc ?=1221207119579?=/dech?=NULL?=0
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : reserveSpaceInLinkGroup(linkGroupId=1group=/dech, role=NULL, sz=0, latency=ONLINE, policy=OUTPUT, lifetime=3600000, description=null
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :  return nextLong=1947324
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: INSERT INTO srmspace (id,vogroup,vorole,retentionpolicy,accesslatency,linkgroupid,sizeinbytes,creationtime,lifetime,description,state,usedspaceinbytes,allocatedspaceinbytes) VALUES  (?,?,?,?,?,?,?,?,?,?,?,?,?)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :  return nextLong=1947325
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmspace WHERE  id = ? AND sizeinbytes-allocatedspaceinbytes >= ? FOR UPDATE ,?=1947324,0
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: INSERT INTO srmspacefile (id,vogroup,vorole,spacereservationid,sizeinbytes,creationtime,lifetime,pnfspath,pnfsid,state)  VALUES  (?,?,?,?,?,?,?,?,?,?)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : Executing: SELECT * FROM srmspace where id=?,?=1947324
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : selectPool: found linkGroup = dech-linkGroup, forwarding message
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : reply is not required, finished processing
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : messageToForward,  arrived: type=diskCacheV111.vehicles.PoolMgrSelectWritePoolMsg value=PnfsId=0006000000000000000E6B68;StorageInfo=size=0;new=true;stored=false;sClass=dech:dech;cClass=-;hsm=osm;accessLatency=ONLINE;retentionPolicy=OUTPUT;StoreName=dech;path=/pnfs/lcg.cscs.ch/dech/rm-test-1;SpaceToken=1947324;store=dech;group=dech;bfid=<Unknown>;;PoolName=se10_ops from [>PoolManager@dCacheDomain:*@dCacheDomain] going to [*@srm-storage01Domain:>SrmSpaceManager@srm-storage01Domain:*@dCacheDomain:*@gridftp-se06Domain:GFTP-se06-Unknown-9268@gridftp-se06Domain]false
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : selectPool(PnfsId=0006000000000000000E6B68;StorageInfo=size=0;new=true;stored=false;sClass=dech:dech;cClass=-;hsm=osm;accessLatency=ONLINE;retentionPolicy=OUTPUT;StoreName=dech;path=/pnfs/lcg.cscs.ch/dech/rm-test-1;SpaceToken=1947324;store=dech;group=dech;bfid=<Unknown>;;PoolName=se10_ops)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : selectPool: getFile(/pnfs/lcg.cscs.ch/dech/rm-test-1)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : java.sql.SQLException: found two records with pnfsPath=/pnfs/lcg.cscs.ch/dech/rm-test-1
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.getFile(ManagerV2.java:2039)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.selectPool(ManagerV2.java:3313)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.processMessageToForward(ManagerV2.java:2351)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2$2.run(ManagerV2.java:2336)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.lang.Thread.run(Thread.java:619)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : protocol info is GridProtocolInfo
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :  voinfo=/dech:NULL
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : selectPool: file is not found, no prior reservations for this file, calling reserveAndUseSpace()
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : reserveSpace(group=/dech, role=NULL, sz=0, latency=ONLINE, policy=OUTPUT, lifetime=3600000, description=null
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : policy is OUTPUT, needHsmBackup is false
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : findLinkGroupIds(sizeInBytes=0, voGroup=/dech voRole=NULL, AccessLatency=ONLINE, RetentionPolicy=OUTPUT)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT lg.*,lg.freespaceinbytes-lg.reservedspaceinbytes as available 
 from srmlinkgroup lg, srmlinkgroupvos lgvo
 where lg.id=lgvo.linkGroupId  and  lg.lastUpdateTime >= ?  and lg.onlineallowed = 1  and lg.outputallowed = 1  and  ( lgvo.VOGroup = ? OR lgvo.VOGroup = '*' )  and  ( lgvo.VORole = ? OR lgvo.VORole = '*' )  and  lg.freespaceinbytes-lg.reservedspaceinbytes >= ?  order by available desc ?=1221207119579?=/dech?=NULL?=0
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : reserveSpaceInLinkGroup(linkGroupId=1group=/dech, role=NULL, sz=0, latency=ONLINE, policy=OUTPUT, lifetime=3600000, description=null
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :  return nextLong=1947326
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: INSERT INTO srmspace (id,vogroup,vorole,retentionpolicy,accesslatency,linkgroupid,sizeinbytes,creationtime,lifetime,description,state,usedspaceinbytes,allocatedspaceinbytes) VALUES  (?,?,?,?,?,?,?,?,?,?,?,?,?)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) :  return nextLong=1947327
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmspace WHERE  id = ? AND sizeinbytes-allocatedspaceinbytes >= ? FOR UPDATE ,?=1947326,0
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: INSERT INTO srmspacefile (id,vogroup,vorole,spacereservationid,sizeinbytes,creationtime,lifetime,pnfspath,pnfsid,state)  VALUES  (?,?,?,?,?,?,?,?,?,?)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : Executing: SELECT * FROM srmspace where id=?,?=1947326
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : messageToForward,  arrived: type=diskCacheV111.vehicles.PoolAcceptFileMessage value=Pool=se10_ops from [>GFTP-se06-Unknown-9268@gridftp-se06Domain:*@gridftp-se06Domain:*@dCacheDomain] going to [>SrmSpaceManager@local:se10_ops@local]false
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : transferToBeStarted(0006000000000000000E6B68)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : transferToBeStarted(): could not get space reservation related to this transfer 
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : messageToForward,  arrived: type=diskCacheV111.vehicles.PoolAcceptFileMessage value=Pool=se10_ops from [>se10_ops@se10Domain:*@se10Domain:*@dCacheDomain] going to [*@dCacheDomain:*@srm-storage01Domain:>SrmSpaceManager@srm-storage01Domain:*@dCacheDomain:*@gridftp-se06Domain:GFTP-se06-Unknown-9268@gridftp-se06Domain]false
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : transferStarted(0006000000000000000E6B68,true)
09/12 10:12:25 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmspacefile WHERE  pnfsid=?   FOR UPDATE ,?=0006000000000000000E6B68
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) : transferStarted failed with 
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) : java.sql.SQLException: Multiple records found, disallowed
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.IoPackage.selectForUpdate(IoPackage.java:37)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.DBManager.selectForUpdate(DBManager.java:86)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.selectFileForUpdate(ManagerV2.java:1276)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.transferStarted(ManagerV2.java:2837)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.processMessageToForward(ManagerV2.java:2358)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2$2.run(ManagerV2.java:2336)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.lang.Thread.run(Thread.java:619)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) : messageToForward,  arrived: type=diskCacheV111.vehicles.DoorTransferFinishedMessage value= from [>se10_ops@se10Domain:*@se10Domain:*@dCacheDomain] going to [*@dCacheDomain:*@srm-storage01Domain:>SrmSpaceManager@srm-storage01Domain:*@dCacheDomain:*@gridftp-se06Domain:GFTP-se06-Unknown-9268@gridftp-se06Domain]false
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) : transferFinished(0006000000000000000E6B68,true)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmspacefile WHERE  pnfsid=?   FOR UPDATE ,?=0006000000000000000E6B68
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) : transferStarted failed with 
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) : java.sql.SQLException: Multiple records found, disallowed
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.IoPackage.selectForUpdate(IoPackage.java:37)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.DBManager.selectForUpdate(DBManager.java:86)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.selectFileForUpdate(ManagerV2.java:1276)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.transferFinished(ManagerV2.java:2895)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.processMessageToForward(ManagerV2.java:2368)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2$2.run(ManagerV2.java:2336)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
09/12 10:12:26 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.lang.Thread.run(Thread.java:619)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : Message  arrived:  from [>SRM-storage01@srm-storage01Domain]
09/12 10:12:27 Cell(SRM-storage01@srm-storage01Domain) : PutRequestHandler error: copy request state changed to Failed
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : cancelUseSpace()
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : Message  arrived:  from [>SRM-storage01@srm-storage01Domain]
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : releaseSpace()
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmspacefile WHERE  pnfspath=? FOR UPDATE ,?=/pnfs/lcg.cscs.ch/dech/rm-test-1
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmspace WHERE  id = ? FOR UPDATE ,?=1947322
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: SELECT * FROM srmlinkgroup WHERE  id = ? FOR UPDATE,?=1
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : executing statement: UPDATE srmspace set vogroup=?,vorole=?,retentionpolicy=?,accesslatency=?,linkgroupid=?,sizeinbytes=?, creationtime=?,lifetime=?,description=?,state=? where id=?,?=1947322
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : Sending reply 
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : cancelUseSpace failed with 
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : java.sql.SQLException: Multiple records found, disallowed
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.IoPackage.selectForUpdate(IoPackage.java:37)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.DBManager.selectForUpdate(DBManager.java:86)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.selectFileForUpdate(ManagerV2.java:1267)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.cancelUseSpace(ManagerV2.java:3108)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.processMessage(ManagerV2.java:2273)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.access$000(ManagerV2.java:101)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2$1.run(ManagerV2.java:2244)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.lang.Thread.run(Thread.java:619)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : java.sql.SQLException: Multiple records found, disallowed
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.IoPackage.selectForUpdate(IoPackage.java:37)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.util.DBManager.selectForUpdate(DBManager.java:86)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.selectFileForUpdate(ManagerV2.java:1267)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.cancelUseSpace(ManagerV2.java:3108)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.processMessage(ManagerV2.java:2273)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2.access$000(ManagerV2.java:101)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at diskCacheV111.services.space.ManagerV2$1.run(ManagerV2.java:2244)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) :    at java.lang.Thread.run(Thread.java:619)
09/12 10:12:27 Cell(SrmSpaceManager@srm-storage01Domain) : Sending reply (-1)=java.sql.SQLException: Multiple records found, disallowed
09/12 10:12:27 Cell(SRM-storage01@srm-storage01Domain) :  SrmUnmarkSpaceAsBeingUsedCompanion : Unmarking Space as Being Used Failed message.getReturnCode () != 0
09/12 10:12:27 Cell(SRM-storage01@srm-storage01Domain) : PutFileRequest #: PutCancelUseOfSpaceCallbacks error: Unmarking Space as Being Used failed: SpaceManagerReserveSpaceMessage.getReturnCode () != 0 =>java.sql.SQLException: Multiple records found, disallowed

Failure 2 (from another command): Show command output Hide

TEST: SRMv1-write
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/opt/d-cache/srm
Storage Resource Manager (SRM) implementation version 2.0.6
Copyright (c) 2002-2008 Fermi National Accelerator Laboratory
Specification Version 2.0 by SRM Working Group (http://sdm.lbl.gov/srm-wg)
SRM Configuration:
        debug=true
        gsissl=true
        help=false
        pushmode=false
        userproxy=true
        buffer_size=131072
        tcp_buffer_size=0
        streams_num=10
        config_file=config.xml
        glue_mapfile=conf/SRMServerV1.map
        webservice_path=srm/managerv1
        webservice_protocol=https
        gsiftpclinet=globus-url-copy
        protocols_list=http,gsiftp
        save_config_file=null
        srmcphome=..
        urlcopy=sbin/urlcopy.sh
        x509_user_cert=/home/timur/k5-ca-proxy.pem
        x509_user_key=/home/timur/k5-ca-proxy.pem
        x509_user_proxy=/tmp/x509up_u20346
        x509_user_trusted_certificates=/etc/grid-security/certificates
        globus_tcp_port_range=null
        gss_expected_name=null
        storagetype=permanent
        retry_num=20
        retry_timeout=10000
        wsdl_url=null
        use_urlcopy_script=false
        connect_to_wsdl=false
        delegate=true
        full_delegation=true
        server_mode=passive
        srm_protocol_version=1
        request_lifetime=86400
        access latency=null
        overwrite mode=null
        priority=0
        from[0]=file:////tmp/dcachetest-20080904-2021-8387/srcfile
        to=srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1

Thu Sep 04 20:21:15 CEST 2008: starting SRMPutClient
Thu Sep 04 20:21:15 CEST 2008: In SRMClient ExpectedName: host
Thu Sep 04 20:21:15 CEST 2008: SRMClient(https,srm/managerv1,true)
SRMClientV1 : user credentials are: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
SRMClientV1 : SRMClientV1 calling org.globus.axis.util.Util.registerTransport()
SRMClientV1 : connecting to srm at httpg://storage01.lcg.cscs.ch:8443/srm/managerv1
Thu Sep 04 20:21:16 CEST 2008: connected to server, obtaining proxy
Thu Sep 04 20:21:16 CEST 2008: got proxy of type class org.dcache.srm.client.SRMClientV1
Thu Sep 04 20:21:16 CEST 2008: source file#0 : /tmp/dcachetest-20080904-2021-8387/srcfile
SRMClientV1 :   put, sources[0]="srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1"
SRMClientV1 :   put, dests[0]="srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1"
SRMClientV1 :   put, protocols[0]="gsiftp"
SRMClientV1 :   put, protocols[1]="dcap"
SRMClientV1 :   put, protocols[2]="http"
SRMClientV1 :  put, contacting service httpg://storage01.lcg.cscs.ch:8443/srm/managerv1
copy_jobs is empty
Thu Sep 04 20:21:18 CEST 2008:  srm returned requestId = -2144592417
Thu Sep 04 20:21:18 CEST 2008: sleeping 4 seconds ...
Thu Sep 04 20:21:22 CEST 2008: FileRequestStatus with SURL=srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1 is Ready
Thu Sep 04 20:21:22 CEST 2008:        received TURL=gsiftp://se06.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
copy_jobs is not empty
copying CopyJob, source = file:////tmp/dcachetest-20080904-2021-8387/srcfile destination = gsiftp://se06.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: memory buffer size is set to 131072
GridftpClient: connecting to se06.lcg.cscs.ch on port 2811
GridftpClient: gridFTPClient tcp buffer size is set to 1048576
GridftpClient: gridFTPWrite started, source file is java.io.RandomAccessFile@1d7fbfb destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: gridFTPWrite started, destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: set local data channel authentication mode to None
GridftpClient: parallelism: 10
GridftpClient: adler32 for file java.io.RandomAccessFile@1d7fbfb is 5f2c7205
GridftpClient: waiting for completion of transfer
GridftpClient: starting a transfer to /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: DiskDataSink.close() called
GridftpClient: gridFTPWrite() wrote 51200bytes
GridftpClient: closing client : org.dcache.srm.util.GridftpClient$FnalGridFTPClient@b33d0a
copy failed with the error
java.io.IOException:

        at org.globus.ftp.extended.GridFTPInputStream.readMsg(GridFTPInputStream.java:130)
        at org.globus.gsi.gssapi.net.GssInputStream.hasData(GssInputStream.java:86)
        at org.globus.gsi.gssapi.net.GssInputStream.read(GssInputStream.java:60)
        at sun.nio.cs.StreamDecoder$CharsetSD.readBytes(StreamDecoder.java:411)
        at sun.nio.cs.StreamDecoder$CharsetSD.implRead(StreamDecoder.java:453)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:183)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.BufferedReader.fill(BufferedReader.java:136)
        at java.io.BufferedReader.readLine(BufferedReader.java:299)
        at java.io.BufferedReader.readLine(BufferedReader.java:362)
        at org.globus.ftp.vanilla.Reply.<init>(Reply.java:71)
        at org.globus.ftp.vanilla.FTPControlChannel.read(FTPControlChannel.java:262)
        at org.globus.ftp.vanilla.FTPControlChannel.exchange(FTPControlChannel.java:305)
        at org.globus.ftp.vanilla.FTPControlChannel.execute(FTPControlChannel.java:330)
        at org.globus.ftp.FTPClient.close(FTPClient.java:921)
        at org.dcache.srm.util.GridftpClient.close(GridftpClient.java:651)
        at gov.fnal.srm.util.Copier.javaGridFtpCopy(Copier.java:626)
        at gov.fnal.srm.util.Copier.copy(Copier.java:495)
        at gov.fnal.srm.util.Copier.run(Copier.java:321)
        at java.lang.Thread.run(Thread.java:595)
 try again
sleeping for 10000 before retrying
copying CopyJob, source = file:////tmp/dcachetest-20080904-2021-8387/srcfile destination = gsiftp://se06.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: memory buffer size is set to 131072
GridftpClient: connecting to se06.lcg.cscs.ch on port 2811
GridftpClient: gridFTPClient tcp buffer size is set to 1048576
GridftpClient: gridFTPWrite started, source file is java.io.RandomAccessFile@1ce784b destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: gridFTPWrite started, destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: set local data channel authentication mode to None
GridftpClient: parallelism: 10
GridftpClient: adler32 for file java.io.RandomAccessFile@1ce784b is 5f2c7205
GridftpClient: waiting for completion of transfer
GridftpClient: starting a transfer to /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
GridftpClient:  transfer exception
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
GridftpClient: closing client : org.dcache.srm.util.GridftpClient$FnalGridFTPClient@3bc473
GridftpClient: closed client
copy failed with the error
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
 try again
sleeping for 20000 before retrying
copying CopyJob, source = file:////tmp/dcachetest-20080904-2021-8387/srcfile destination = gsiftp://se06.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: memory buffer size is set to 131072
GridftpClient: connecting to se06.lcg.cscs.ch on port 2811
GridftpClient: gridFTPClient tcp buffer size is set to 1048576
GridftpClient: gridFTPWrite started, source file is java.io.RandomAccessFile@384065 destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: gridFTPWrite started, destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: set local data channel authentication mode to None
GridftpClient: parallelism: 10
GridftpClient: adler32 for file java.io.RandomAccessFile@384065 is 5f2c7205
GridftpClient: waiting for completion of transfer
GridftpClient: starting a transfer to /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
GridftpClient:  transfer exception
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
GridftpClient: closing client : org.dcache.srm.util.GridftpClient$FnalGridFTPClient@513d61
GridftpClient: closed client
copy failed with the error
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
 try again
sleeping for 30000 before retrying
copying CopyJob, source = file:////tmp/dcachetest-20080904-2021-8387/srcfile destination = gsiftp://se06.lcg.cscs.ch:2811//pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: memory buffer size is set to 131072
GridftpClient: connecting to se06.lcg.cscs.ch on port 2811
GridftpClient: gridFTPClient tcp buffer size is set to 1048576
GridftpClient: gridFTPWrite started, source file is java.io.RandomAccessFile@1c5fde0 destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: gridFTPWrite started, destination path is /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
GridftpClient: set local data channel authentication mode to None
GridftpClient: parallelism: 10
GridftpClient: adler32 for file java.io.RandomAccessFile@1c5fde0 is 5f2c7205
GridftpClient: waiting for completion of transfer
GridftpClient: starting a transfer to /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
GridftpClient:  transfer exception
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
GridftpClient: closing client : org.dcache.srm.util.GridftpClient$FnalGridFTPClient@19e8329
GridftpClient: closed client
copy failed with the error
org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: /pnfs/lcg.cscs.ch/cms/local_tests/automatic_test-20080904-2021-8387-srm1: Cannot create file: File exists (error code 1)
        at org.globus.ftp.FTPClient.get127Reply(FTPClient.java:1617)
        at org.globus.ftp.FTPClient.put2(FTPClient.java:1857)
        at org.dcache.srm.util.GridftpClient$TransferThread.run(GridftpClient.java:956)
        at java.lang.Thread.run(Thread.java:595)
 try again
sleeping for 40000 before retrying
setting file request -2144592416 status to Done

  • The system is not under heavy load (actually there is almost no transfer)
  • The failing srmcp seem to always leave a pnfs entry behind
  • A restart of the core services on storage01 (admin) and storage02 (pnfs cell) did not help.
  • Is failure mode related to specific gridftp doors?
    • successful tests via se05.lcg.cscs.ch gridftp door (4 times), se01 (1), .
    • failing tests via se01 (1), se05 (2), se06 (4 times), se09 (1)
    • Based on the above, it seems not.
  • SRM log (/opt/d-cache/libexec/apache-tomcat-5.5.2/catalina.out)
    • file is filled with strange permission denied entries and very large Show log sample Hide
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:file does not exist, cannot delete
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      09/04 21:41:24 Cell(SRM-storage01@srm-storage01Domain) : RemoveFileFailed:permission denied
      
    • RM: These "permission denied" entries start on July 14 at 13:08. On this day we did a kernel upgrade and we restarted dcache for the first time since weeks.
Show occurrences count in logfiles Hide
Logfile Date last written Number of "permission denied" entries
srm-storage01Domain.log today 8250618
srm-storage01Domain.log.1.gz Aug 31 12951997
srm-storage01Domain.log.2.gz Aug 24 11031498
srm-storage01Domain.log.3.gz Aug 17 11886351
srm-storage01Domain.log.4.gz Aug 10 9500013
srm-storage01Domain.log.5.gz Aug 3 7509511
srm-storage01Domain.log.6.gz Jul 27 3595576
srm-storage01Domain.log.7.gz Jul 20 2670322
srm-storage01Domain.log.8.gz Jul 13 0
srm-storage01Domain.log.9.gz Jul 6 0
srm-storage01Domain.log.10.gz Jun 29 0
srm-storage01Domain.log.11.gz Jun 28 0

dCache server and client versions failure dependencies

  • the CSCS servers are still running dCache-1.8.0-12p6
  • CSCS UI, which exhibited the above problems, uses dcache-srmclient-1.8.0-15p5 (installed with gLite 3.1 update 30 on 2008-09-03)
  • Tests from the PSI UI with dcache-srmclient-1.8.0-4.noarch showed no problems (tried 10 times in a row the same command as on CSCS UI). This package was first released in gLite 3.1 update 22 on 2008-05-13.

  • A series of tests were run during the weekend (the script used for running the tests is atteched to this page), to assess the behavior of dCache clients of different versions; dCache clients version 1.8.0-4 and 1.8.0-15p5 were run on ui.lcg.cscs.ch with the results tabled below. Ostensibly, Failure 1 (see above) is due to an incompatibility between the newer dCache client (1.8.0-15p5) and the older dCache server (storage01 is currently running dCache 1.8.0-12p6). The tests run for 48 hours, from Friday Sept. 5 at 22:45 to Sunday, Sept. 7; the same file was continually copied (with SRMv2) to the server, with a 5-seconds pause after each attempt, and deleted (via GridFTP) after each successful upload.
dCache client version Number of failures Total number of attemps Failure rate
1.8.0-4 8 8936 0.08%
1.8.0-15p5 5190 9096 57.05%

Solution or Workaround

dcache support reaction

Peter Kunszt contacted the dcache support mailing list (ticket #3472) on Mon, Sep. 8th. Since we had not heard back from support, a more detailed description of the problem was filed by Riccardo Murri on Fri, Sep 12th under support ticket #3490.

Date: Fri, 12 Sep 2008 10:11:30 +0200 (CEST)
From: Gerd Behrmann via RT 
To: rmurri@REMOVETHISSTRINGcscs.ch
Reply-To: support@REMOVETHISSTRINGdcache.org
Subject: Re: [www.dcache.org #3490] failed SRM uploads from 1.8.0-15p5 client to 1.8.0-12p6 server

Hi,

I thought this was fixed in the latest clients.

The issue is that older versions of dCache server had a bug. This was 
masqueraded by the way the old clients closed the connection. However 
that client behavior resulted in problems with Castor and was therefore 
changed. The bug in the dCache server has also been fixed since.

I was under the impression we added a workaround to the client for the 
bug in the older dCache server, but maybe it was never released? Timur, 
can you please comment on this. All we need to do is to catch the 
IOException on close and ignore it.

Cheers,

gerd

Monitoring for this condition

Not applicable.

-- DerekFeichtinger - 04 Sep 2008

IssueForm
Affected Service dCache SRM writing
Symptom summary SRMv1 and v2 writes fail irregularly, often with "files exists" errors
Reason Understood yes
Solution Exists no
Obsolete yes
Topic attachments
I Attachment History Action Size Date Who Comment
PNGpng cmsSAM-30624_65.png r1 manage 44.0 K 2008-09-05 - 07:12 DerekFeichtinger CMS SAM for mc stageout
Unix shell scriptsh test-srmcp.sh r1 manage 2.0 K 2008-09-08 - 08:24 RiccardoMurri Script used to test different dCache SRM client versions
Edit | Attach | Watch | Print version | History: r18 < r17 < r16 < r15 < r14 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r18 - 2011-01-14 - PabloFernandez
 
This site is powered by the TWiki collaboration platform Powered by Perl This site is powered by the TWiki collaboration platformCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback