<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
# * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
# * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
# * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

---+!! %TOPIC%

%TOC%

---++ Symptoms

Summary: %FORMFIELD{"Symptom summary"}%

---++ Case 1

---+++ Observations

<!--
#collect here the information which may help to better understand the state of the system or services, e.g.
#log excerpts, strace output, etc.
#this also may help to identify the problem if similar conditions arise again
-->

srmcp fails because the client receives a wrong TURL: the gridftp door address in the returned TURL is 0.0.0.0.

<verbatim>
srmcp -debug srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1 file:////tmp/dcachetest-20080828-2242-8051/test-srmcp
TEST: SRMv1-read
WARNING: SRM_PATH is defined, which might cause a wrong version of srm client to be executed
WARNING: SRM_PATH=/opt/d-cache/srm
Storage Resource Manager (SRM) CP Client version 2.0
Copyright (c) 2002-2006 Fermi National Accelerator Laboratory
SRM Configuration:
        debug=true
        gsissl=true
        help=false
        pushmode=false
        userproxy=true
        buffer_size=131072
        tcp_buffer_size=0
        streams_num=10
        config_file=config.xml
        glue_mapfile=conf/SRMServerV1.map
        webservice_path=srm/managerv1
        webservice_protocol=https
        gsiftpclinet=globus-url-copy
        protocols_list=http,gsiftp
        save_config_file=null
        srmcphome=..
        urlcopy=sbin/urlcopy.sh
        x509_user_cert=/home/timur/k5-ca-proxy.pem
        x509_user_key=/home/timur/k5-ca-proxy.pem
        x509_user_proxy=/tmp/x509up_u3896
        x509_user_trusted_certificates=/etc/grid-security/certificates
        globus_tcp_port_range=null
        gss_expected_name=null
        storagetype=permanent
        retry_num=20
        retry_timeout=10000
        wsdl_url=null
        use_urlcopy_script=false
        connect_to_wsdl=false
        delegate=true
        full_delegation=true
        server_mode=passive
        srm_protocol_version=1
        request_lifetime=86400
        access latency=null
        overwrite mode=null
        priority=0
        from[0]=srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1
        to=file:////tmp/dcachetest-20080828-2242-8051/test-srmcp

Thu Aug 28 22:43:13 CEST 2008: starting SRMGetClient
Thu Aug 28 22:43:13 CEST 2008: In SRMClient ExpectedName: host
Thu Aug 28 22:43:13 CEST 2008: SRMClient(https,srm/managerv1,true)
SRMClientV1 : user credentials are: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
SRMClientV1 : SRMClientV1 calling org.globus.axis.util.Util.registerTransport()
SRMClientV1 : connecting to srm at httpg://t3se01.psi.ch:8443/srm/managerv1
Thu Aug 28 22:43:14 CEST 2008: connected to server, obtaining proxy
Thu Aug 28 22:43:14 CEST 2008: got proxy of type class org.dcache.srm.client.SRMClientV1
SRMClientV1 : get: surls[0]="srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1"
SRMClientV1 : get: protocols[0]="gsiftp"
SRMClientV1 : get: protocols[1]="dcap"
SRMClientV1 : get: protocols[2]="http"
copy_jobs is empty
Thu Aug 28 22:43:15 CEST 2008: srm returned requestId = -2147470564
Thu Aug 28 22:43:15 CEST 2008: sleeping 4 seconds ...
Thu Aug 28 22:43:19 CEST 2008: FileRequestStatus with SURL=srm://t3se01.psi.ch:8443/srm/managerv1?SFN=//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1 is Ready
Thu Aug 28 22:43:19 CEST 2008: received TURL=gsiftp://0.0.0.0:2811//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1
Thu Aug 28 22:43:19 CEST 2008: fileIDs is empty, breaking the loop
copy_jobs is not empty
copying CopyJob, source = gsiftp://0.0.0.0:2811//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1 destination = file:////tmp/dcachetest-20080828-2242-8051/test-srmcp
GridftpClient: memory buffer size is set to 131072
GridftpClient: connecting to 0.0.0.0 on port 2811
copy failed with the error
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:520)
        at java.net.Socket.connect(Socket.java:470)
        at java.net.Socket.<init>(Socket.java:367)
        at java.net.Socket.<init>(Socket.java:267)
        at org.globus.net.SocketFactory.createSocket(SocketFactory.java:74)
        at org.globus.net.SocketFactory.createSocket(SocketFactory.java:53)
        at org.globus.ftp.vanilla.FTPControlChannel.open(FTPControlChannel.java:135)
        at org.globus.ftp.GridFTPClient.<init>(GridFTPClient.java:74)
        at org.dcache.srm.util.GridftpClient$FnalGridFTPClient.<init>(GridftpClient.java:1080)
        at org.dcache.srm.util.GridftpClient.<init>(GridftpClient.java:212)
        at gov.fnal.srm.util.Copier.javaGridFtpCopy(Copier.java:595)
        at gov.fnal.srm.util.Copier.copy(Copier.java:495)
        at gov.fnal.srm.util.Copier.run(Copier.java:321)
        at java.lang.Thread.run(Thread.java:595)
try again
sleeping for 10000 before retrying
</verbatim>

---+++ Reason

The OpenSolaris pool node still had its network configured automatically by the
*nwam* service (svc:/network/physical:nwam). e1000g0 had been configured correctly as the public interface by DHCP, but the other interfaces had been assigned 0.0.0.0 addresses. The dCache pool apparently registered with the head node using one of these other addresses, and the SRM server returned a TURL for a nonexistent gridftp door. There should be a way of explicitly specifying the interface with which a pool registers, or the pool should make a more intelligent guess as to the default (for example, use the interface through which it contacts the head node).

<verbatim>
-bash-3.2# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=201004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4,CoS> mtu 1500 index 2
        inet 192.33.123.42 netmask ffffff00 broadcast 192.33.123.255
        ether 0:14:4f:a6:d1:f0
e1000g1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 3
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        ether 0:14:4f:a6:d1:f1
e1000g2: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        ether 0:14:4f:a6:d1:f2
e1000g3: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        ether 0:14:4f:a6:d1:f3
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
</verbatim>

---+++ Solution

Remove the unwanted interfaces or, as in our case, combine them into a single aggregated interface.

---++ Case 2

---+++ Observation

The client receives a TURL which does not contain a fully qualified host name, but only a short local name.
The TURL will look like the highlighted destination URL below:

<pre>
lcg-cp --connect-timeout 10 --sendreceive-timeout 120 --srm-timeout 180 -b --vo ops -D srmv2 -U srmv2 -v file:/home/samops/.same/SRMv2/testFile.txt 'srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/ops/testfile-cp-20091119-134821-b532e7e7c782.txt'
Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: ops
Checksum type: None
Destination SE type: SRMv2
Destination SRM Request Token: -2146844058
Source URL: file:/home/samops/.same/SRMv2/testFile.txt
File size: 41472
Source URL for copy: file:/home/samops/.same/SRMv2/testFile.txt
%RED%Destination URL: gsiftp://t3fs05:2811//pnfs/psi.ch/ops/testfile-cp-20091119-134821-b532e7e7c782.txt%ENDCOLOR%
# streams: 1
file:/home/samops/.same/SRMv2/testFile.txt: globus_xio: Unable to connect to t3fs05:2811
globus_xio: globus_libc_getaddrinfo failed.
globus_common: Name or service not known
lcg_cp: Communication error on send
+ retcode=1
+ set +x
Thu Nov 19 12:48:28 UTC 2009 [1258634908]
</pre>

---+++ Reason

The hostname on the Solaris file server was taken from the DHCP-configured =/etc/hosts= file, which contained only the short local name.

---+++ Solution

Either delete the line in =/etc/hosts=, put the FQDN there, or reconfigure DHCP to deliver the full name correctly.

-- Main.DerekFeichtinger - 29 Aug 2008
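
Both cases on this page show up in the host part of the returned TURL, so a test script could flag them before the transfer is even attempted. The following is only a minimal sketch in Python, not part of srmcp, lcg-cp, or dCache: the function name =classify_turl_host= and its return labels are hypothetical, and "a name without a dot is a short local name" is a simplifying assumption.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_turl_host(turl: str) -> str:
    """Classify the host part of a TURL (hypothetical helper, labels are ours).

    Returns "unspecified-address" for 0.0.0.0 or :: (Case 1),
    "short-hostname" for a name without a domain part (Case 2),
    "no-host" if the TURL has no host at all, and "ok" otherwise.
    """
    host = urlsplit(turl).hostname
    if not host:
        return "no-host"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Assumption: a name without a dot is a short
        # local name that remote clients cannot resolve.
        return "ok" if "." in host else "short-hostname"
    # The unspecified address can never be a reachable gridftp door.
    return "unspecified-address" if addr.is_unspecified else "ok"


if __name__ == "__main__":
    # TURLs taken from the two logs on this page.
    for turl in (
        "gsiftp://0.0.0.0:2811//pnfs/psi.ch/cms/automatic_test-20080828-2242-8051-srm1",
        "gsiftp://t3fs05:2811//pnfs/psi.ch/ops/testfile-cp-20091119-134821-b532e7e7c782.txt",
    ):
        print(classify_turl_host(turl), turl)
```

A check like this only inspects the string. It does not prove that the door is reachable, so a name that passes may still fail to resolve or connect.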
| *IssueForm* ||
| Affected Service | dCache SRM read |
| Symptom summary | SRM server returns wrong TURL to client |
| Reason Understood | yes |
| Solution Exists | yes |
| Obsolete | no |
Topic revision: r3 - 2009-11-19 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.