IssueDcacheSrmRead2

Symptoms

Summary: SRM server returns wrong TURL to client

Case 1

Observations

srmcp fails because the client receives a wrong TURL:

srmcp -debug srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1 file:////tmp/dcachetest-20080828-2242-8051/test-srmcp

TEST: SRMv1-read
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/opt/d-cache/srm
Storage Resource Manager (SRM) CP Client version 2.0
Copyright (c) 2002-2006 Fermi National Accelerator Laboratory

SRM Configuration:
        debug=true
        gsissl=true
        help=false
        pushmode=false
        userproxy=true
        buffer_size=131072
        tcp_buffer_size=0
        streams_num=10
        config_file=config.xml
        glue_mapfile=conf/SRMServerV1.map
        webservice_path=srm/managerv1
        webservice_protocol=https
        gsiftpclinet=globus-url-copy
        protocols_list=http,gsiftp
        save_config_file=null
        srmcphome=..
        urlcopy=sbin/urlcopy.sh
        x509_user_cert=/home/timur/k5-ca-proxy.pem
        x509_user_key=/home/timur/k5-ca-proxy.pem
        x509_user_proxy=/tmp/x509up_u3896
        x509_user_trusted_certificates=/etc/grid-security/certificates
        globus_tcp_port_range=null
        gss_expected_name=null
        storagetype=permanent
        retry_num=20
        retry_timeout=10000
        wsdl_url=null
        use_urlcopy_script=false
        connect_to_wsdl=false
        delegate=true
        full_delegation=true
        server_mode=passive
        srm_protocol_version=1
        request_lifetime=86400
        access latency=null
        overwrite mode=null
        priority=0
        from[0]=srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1
        to=file:////tmp/dcachetest-20080828-2242-8051/test-srmcp

Thu Aug 28 22:43:13 CEST 2008: starting SRMGetClient
Thu Aug 28 22:43:13 CEST 2008: In SRMClient ExpectedName: host
Thu Aug 28 22:43:13 CEST 2008: SRMClient(https,srm/managerv1,true)
SRMClientV1 : user credentials are: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
SRMClientV1 : SRMClientV1 calling org.globus.axis.util.Util.registerTransport()
SRMClientV1 : connecting to srm at httpg://t3se01.psi.ch:8443/srm/managerv1
Thu Aug 28 22:43:14 CEST 2008: connected to server, obtaining proxy
Thu Aug 28 22:43:14 CEST 2008: got proxy of type class org.dcache.srm.client.SRMClientV1
SRMClientV1 :   get: surls[0]="srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1"
SRMClientV1 :   get: protocols[0]="gsiftp"
SRMClientV1 :   get: protocols[1]="dcap"
SRMClientV1 :   get: protocols[2]="http"
copy_jobs is empty
Thu Aug 28 22:43:15 CEST 2008:  srm returned requestId = -2147470564
Thu Aug 28 22:43:15 CEST 2008: sleeping 4 seconds ...
Thu Aug 28 22:43:19 CEST 2008: FileRequestStatus with SURL=srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1 is Ready
Thu Aug 28 22:43:19 CEST 2008:        received TURL=gsiftp://0.0.0.0:2811//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1
Thu Aug 28 22:43:19 CEST 2008: fileIDs is empty, breaking the loop
copy_jobs is not empty
copying CopyJob, source = gsiftp://0.0.0.0:2811//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1 destination = file:////tmp/dcachetest-20080828-2242-8051/test-srmcp
GridftpClient: memory buffer size is set to 131072
GridftpClient: connecting to 0.0.0.0 on port 2811
copy failed with the error
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:520)
        at java.net.Socket.connect(Socket.java:470)
        at java.net.Socket.<init>(Socket.java:367)
        at java.net.Socket.<init>(Socket.java:267)
        at org.globus.net.SocketFactory.createSocket(SocketFactory.java:74)
        at org.globus.net.SocketFactory.createSocket(SocketFactory.java:53)
        at org.globus.ftp.vanilla.FTPControlChannel.open(FTPControlChannel.java:135)
        at org.globus.ftp.GridFTPClient.<init>(GridFTPClient.java:74)
        at org.dcache.srm.util.GridftpClient$FnalGridFTPClient.<init>(GridftpClient.java:1080)
        at org.dcache.srm.util.GridftpClient.<init>(GridftpClient.java:212)
        at gov.fnal.srm.util.Copier.javaGridFtpCopy(Copier.java:595)
        at gov.fnal.srm.util.Copier.copy(Copier.java:495)
        at gov.fnal.srm.util.Copier.run(Copier.java:321)
        at java.lang.Thread.run(Thread.java:595)
 try again
sleeping for 10000 before retrying

Reason

The OpenSolaris pool node still had its network configured automatically by the nwam service (svc:/network/physical:nwam). e1000g0 had been configured correctly as the public interface by DHCP, but the other interfaces had been assigned 0.0.0.0 addresses. The dCache pool apparently registered with the head node using one of these other addresses, so the SRM server returned a TURL for a nonexistent gridftp door. There should be a way to explicitly specify the interface with which a pool registers, or the software should make a more intelligent guess at the default (for example, use the interface that is used to contact the head node).

-bash-3.2# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=201004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4,CoS> mtu 1500 index 2
        inet 192.33.123.42 netmask ffffff00 broadcast 192.33.123.255
        ether 0:14:4f:a6:d1:f0
e1000g1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        ether 0:14:4f:a6:d1:f1
e1000g2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        ether 0:14:4f:a6:d1:f2
e1000g3: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        ether 0:14:4f:a6:d1:f3
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
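
The check done by eye on the listing above can be sketched in a few lines of Python. This is only an illustration, assuming the Solaris `ifconfig -a` layout shown here (an interface header line followed by an indented `inet` line); the function name `unconfigured_interfaces` is made up for this example and is not part of dCache or Solaris.

```python
import re


def unconfigured_interfaces(ifconfig_output):
    """Return names of interfaces whose IPv4 address is 0.0.0.0."""
    flagged = []
    current = None
    for line in ifconfig_output.splitlines():
        # Interface header, e.g. "e1000g1: flags=201000843<...> mtu 1500 index 3"
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            continue
        # Indented IPv4 address line; "inet6" lines do not match "inet ".
        inet = re.match(r"^\s+inet (\S+)", line)
        if inet and current is not None and inet.group(1) == "0.0.0.0":
            flagged.append(current)
    return flagged


# Two interfaces copied from the listing above.
sample = """\
e1000g0: flags=201004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4,CoS> mtu 1500 index 2
        inet 192.33.123.42 netmask ffffff00 broadcast 192.33.123.255
e1000g1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
"""
print(unconfigured_interfaces(sample))
```

Any interface reported by such a check is a candidate for the address that ended up in the TURL.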

Solution

Remove the unwanted interfaces or, as in our case, combine them into a single aggregated interface.

Case 2

Observations

The client receives a TURL that contains only a short local host name instead of a fully qualified host name. In the transcript below, the destination URL points to t3fs05 without a domain:

lcg-cp --connect-timeout 10 --sendreceive-timeout 120 --srm-timeout 180 -b --vo ops -D srmv2 -U srmv2 -v file:/home/samops/.same/SRMv2/testFile.txt 'srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/ops/testfile-cp-20091119-134821-b532e7e7c782.txt'
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: ops
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2146844058
Source URL: file:/home/samops/.same/SRMv2/testFile.txt
File size: 41472
Source URL for copy: file:/home/samops/.same/SRMv2/testFile.txt
Destination URL: gsiftp://t3fs05:2811//pnfs/psi.ch/ops/testfile-cp-20091119-134821-b532e7e7c782.txt
# streams: 1
file:/home/samops/.same/SRMv2/testFile.txt: globus_xio: Unable to connect to t3fs05:2811
globus_xio: globus_libc_getaddrinfo failed.
globus_common: Name or service not known
lcg_cp: Communication error on send
+ retcode=1
+ set +x
Thu Nov 19 12:48:28 UTC 2009 [1258634908]

Reason

The hostname on the Solaris file server was taken from the DHCP-configured /etc/hosts file, which contained only the short local name.
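
A quick way to spot such entries is to scan the hosts file for lines that list no dotted name at all. The following is a minimal sketch, not a complete hosts-file parser: the function name `short_name_only_entries` is invented for this example, the "contains a dot" test is only a heuristic for "fully qualified", and the address in the sample is a documentation placeholder, not the real address of the file server.

```python
import ipaddress


def short_name_only_entries(hosts_text):
    """Return (address, names) for non-loopback entries without a dotted name."""
    flagged = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        try:
            if ipaddress.ip_address(address).is_loopback:
                continue  # "localhost" entries are expected to be short
        except ValueError:
            continue  # first field is not an IP address, skip the line
        if names and not any("." in name for name in names):
            flagged.append((address, names))
    return flagged


# 192.0.2.10 is a placeholder address (assumption); t3fs05 is the short
# name seen in the transcript above.
sample = """\
127.0.0.1 localhost
192.0.2.10 t3fs05   # short name only
"""
print(short_name_only_entries(sample))
```

On the file server one would pass the content of /etc/hosts instead of the sample text.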

Solution

Either delete the line in /etc/hosts, put the FQDN there, or reconfigure DHCP to correctly deliver the full name.
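
Both cases can be caught on the client side by inspecting the host part of the TURL before a transfer is attempted. The sketch below is an assumption about how one might do this, not something srmcp or lcg-cp does. `turl_host_problem` is a hypothetical helper, and a host name without a dot is merely treated as "probably not fully qualified".

```python
import ipaddress
from urllib.parse import urlsplit


def turl_host_problem(turl):
    """Return a short description of a suspicious TURL host, or None."""
    host = urlsplit(turl).hostname
    if not host:
        return "no host in TURL"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # host is a name, not a literal IP address
    if address is not None:
        # Case 1: the pool registered with an unconfigured interface.
        return "unspecified address" if address.is_unspecified else None
    # Case 2: short local name that remote clients cannot resolve.
    if "." not in host:
        return "unqualified host name"
    return None


# The first two TURLs are taken from the transcripts on this page.
for turl in (
    "gsiftp://0.0.0.0:2811//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1",
    "gsiftp://t3fs05:2811//pnfs/psi.ch/ops/testfile-cp-20091119-134821-b532e7e7c782.txt",
):
    print(turl_host_problem(turl), turl)
```

A TURL for which the helper returns None can still be wrong, since the check only looks at the form of the host part and does not try to resolve or contact it.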

-- DerekFeichtinger - 29 Aug 2008

IssueForm
Affected Service: dCache SRM read
Symptom summary: SRM server returns wrong TURL to client
Reason Understood: yes
Solution Exists: yes
Obsolete: no
Topic revision: r3 - 2009-11-19 - DerekFeichtinger
 