dCache Problem descriptions

SRM problem

Description: An SRMv1 read from the SE fails, even though a globus-url-copy of the same file, both via the SE and directly from the pool node, works fine.

Other Symptoms: The best information can be obtained from the log of the pool cell of the affected pool. A failing SRM-mediated transfer looks like this:

srmcp -debug srm://t3se01.psi.ch:8443/srm/managerv1?SFN=/pnfs/psi.ch/cms/automatic_test-20080823-0015-2165-srm1 file:////tmp/derek-srm1-1 

10.43.10  Sending reply Pool=t3fs01_cms_1;NOTAGS;;PnfsId=000200000000000000001778;have=true;waiting=false
10.43.10  PoolIoFileMessage delivered to ioFile (method)
10.43.10  Dup Request : none 
10.43.10  JOB queued 000200000000000000001778
10.43.10  JOB prepare 000200000000000000001778
10.43.10  Entry for 000200000000000000001778 found and is 000200000000000000001778  51200 si={cms:cms}
10.43.10  JOB run 000200000000000000001778
10.43.10  Trying to open /data1/t3fs01_cms_1/data/000200000000000000001778
10.43.10  (GFtp_2_nio/000200000000000000001778) version=1, role=Sender, mode=E, host=127.0.0.1:34210, buffer=0, passive=false, parallelism=10
10.43.10  (GFtp_2_nio/000200000000000000001778) Connecting to /127.0.0.1:34210
10.43.10  (GFtp_2_nio/000200000000000000001778) java.net.ConnectException: Connection refused

A direct copy from the pool node, which works, looks like this:

globus-url-copy -dbg gsiftp://t3fs01.psi.ch/pnfs/psi.ch/cms/automatic_test-20080823-0015-2165-srm1 file:/tmp/mytest.tst

10.51.12  Sending reply Pool=t3fs01_cms_1;NOTAGS;;PnfsId=000200000000000000001778;have=true;waiting=false
10.51.12  PoolIoFileMessage delivered to ioFile (method)
10.51.12  Dup Request : none 
10.51.12  JOB queued 000200000000000000001778
10.51.12  JOB prepare 000200000000000000001778
10.51.12  Entry for 000200000000000000001778 found and is 000200000000000000001778  51200 si={cms:cms}
10.51.12  JOB run 000200000000000000001778
10.51.12  Trying to open /data1/t3fs01_cms_1/data/000200000000000000001778
10.51.12  (GFtp_2_nio/000200000000000000001778) version=1, role=Sender, mode=S, host=t3fs01:32916, buffer=0, passive=false, parallelism=5
10.51.12  (GFtp_2_nio/000200000000000000001778) Connecting to t3fs01/192.33.123.41:32916
10.51.12  (GFtp_2_nio/000200000000000000001778) Opened Socket[addr=t3fs01/192.33.123.41,port=32916,localport=32917]
10.51.12  (GFtp_2_nio/000200000000000000001778) Entering event loop
10.51.12  (GFtp_2_nio/000200000000000000001778) Closing Socket[addr=t3fs01/192.33.123.41,port=32916,localport=32917]
10.51.12  (GFtp_2_nio/000200000000000000001778) Multiplexer shutting down
10.51.12  (GFtp_2_nio/000200000000000000001778) Left event loop and closing channels
10.51.12  (GFtp_2_nio/000200000000000000001778) Transfer finished: 51200 bytes transferred in 0.09 seconds = 0.568 MB/s
10.51.12  000200000000000000001778;length=51200;timer=Null Time elapsed
10.51.13  IO thread finished : Thread[Scheduler-IO-1-worker,5,t3fs01_cms_1-threads]

In the SRM case, the pool somehow tries to connect to "/127.0.0.1:[port]" instead of "t3fs01/192.33.123.41:[port]". The hostname appears to be an empty string (Java prints an address as "hostname/IP", so a missing hostname shows up as a leading "/"), which leaves it unclear what address is used in the actual connect. Alternatively, the client's listening socket may be bound not to ALL addresses of the host (including localhost) but only to the named interface.

To investigate, we attach truss to the pool process on the pool node:

truss -v connect -f -p 2954 &> /tmp/truss

Comparing the output for both cases confirms that the SRM-mediated copy indeed attempts the connect to the localhost address, while the direct gridftp copy uses the named interface:

# SRM mediated copy
2954/82:        connect(25, 0xD10B0850, 16, SOV_DEFAULT)        Err#146 ECONNREFUSED
2954/82:                AF_INET  name = 127.0.0.1  port = 34235

# standalone gridftp copy
2954/82:        connect(24, 0xD10B0850, 16, SOV_DEFAULT)        = 0
2954/82:                AF_INET  name = 192.33.123.41  port = 32930

For reference, a comparable installation on the Tier-2 shows that a Solaris-1 node there also listens only on the named interface, judging from the netstat output:

se05# netstat

UDP: IPv4
   Local Address        Remote Address      State
-------------------- -------------------- ----------
se05.32772           mon.8671             Connected

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
se05.41053           se09.60559           49640      0 49640      0 TIME_WAIT
se05.24411           wn08.52485            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52486            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52487            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52488            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52489            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52490            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52491            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52492            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52493            5840      0 49232      0 TIME_WAIT
se05.24411           wn08.52494            5840      0 49232      0 TIME_WAIT
se05.ssh             ui.59621              9056      0 49232      0 ESTABLISHED
se05.65360           storage01.52323      131072      0 49640      0 ESTABLISHED
se05.65361           storage01.52323      131072      0 49640      0 ESTABLISHED
se05.65362           storage01.52323      130944      0 49640      0 ESTABLISHED
se05.65363           storage01.52323      131072      0 49640      0 ESTABLISHED

-- DerekFeichtinger - 28 Aug 2008
