Schema and discussion for network / dCache setup in Phase C
network/dCache services schema
This will be updated as the discussion progresses.
basic situation
Since computing nodes have more and more cores (our current system has 4 CPUs with 16 cores in total per node), the network and local disks are beginning to become the limiting factors. In blades you often only have space for 2 disks, and usually only one Ethernet connection is attached.
For this reason, we want to use the (now affordable) InfiniBand technology for the main internal data flows. Worker nodes will use a shared file system (Lustre) for scratch, and we also want to attach the storage element via TCP/IP over InfiniBand (IPoIB).
On our current system we see heavy WN-to-SE traffic via dcap, and also (mainly for ATLAS) heavy stage-in activity via dcap (dccp) and gsiftp. ATLAS has since said they would refrain from using dccp, because this created havoc (we usually allow a large number of dcap movers per pool).
Our current dCache system runs on 28 Sun X4500 ("Thumper") Solaris systems (24 TB each) as file servers, plus two Linux nodes for the dCache services.
Every Thumper also runs dcap and gsiftp doors. These have performed very well; we never saw load-related problems.
Now we will replace these servers with 28 of the successor model, the Sun X4540 ("Thor"), with 48 TB each.
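For scale, the nominal raw capacity of the file server pool before and after the replacement, using the per-server figures quoted above (usable space after redundancy overhead will be lower and is not given here):

```latex
\text{X4500: } 28 \times 24\,\mathrm{TB} = 672\,\mathrm{TB}
\qquad
\text{X4540: } 28 \times 48\,\mathrm{TB} = 1344\,\mathrm{TB} = 2 \times 672\,\mathrm{TB}
```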
All machines, file servers and worker nodes (WNs) alike, will also have Ethernet interfaces. We want the dcap traffic between the WNs (960 cores) and the storage to go via InfiniBand. But the system must also be reachable from the internet for remote SRM/gsiftp transfers. SRM and gsiftp must also be available from the worker nodes for stage-out/stage-in by jobs.
Preliminary discussion by mail
2009-07-21 Patrick Fuhrmann to DF
Hi Derek,
Making a long story short: having two interfaces, with clients coming in through both of them, is very troublesome and is not possible right now in all cases.
!!! So it's important to know which protocols you intend to use from inside and from outside, respectively.
The only protocol that has recently been enabled for the two-interface use case is the GridFTP protocol. The GridFTP mover in the pool will (in server passive mode) send the IP address of the correct interface to the client.
Dcap/xroot, however, will return the IP address of the primary interface only.
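The per-protocol behaviour described in the mail can be modelled in a few lines. This is a minimal sketch, not dCache code: the pool's interface table, the function name `advertised_address` and the example addresses (203.0.113.10 for Ethernet, 192.168.100.5 for IPoIB) are all hypothetical, and only the GridFTP server-passive-mode case is modelled.

```python
from ipaddress import ip_address

# Hypothetical dual-homed pool; all addresses are illustrative examples.
POOL_INTERFACES = {
    "ethernet": ip_address("203.0.113.10"),     # primary, externally reachable
    "infiniband": ip_address("192.168.100.5"),  # internal IPoIB network
}
PRIMARY_INTERFACE = "ethernet"


def advertised_address(protocol, client_interface):
    """Return the pool address a mover hands back to a client that reached
    the pool via `client_interface`, following the behaviour described in
    the mail (dCache, July 2009)."""
    if protocol == "gridftp":
        # GridFTP mover in server passive mode: replies with the address of
        # the interface the client actually used.
        return POOL_INTERFACES[client_interface]
    if protocol in ("dcap", "xroot"):
        # dcap/xroot: always the primary interface, whichever side the client is on.
        return POOL_INTERFACES[PRIMARY_INTERFACE]
    raise ValueError(f"unknown protocol: {protocol}")


for proto in ("gridftp", "dcap", "xroot"):
    print(proto, advertised_address(proto, "infiniband"))
```

In this model a worker node that reaches the pool over InfiniBand is sent back to the Ethernet address by dcap/xroot. That is exactly the limitation the mail expects to be improved for 1.9.5.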
On the translation of the SURL to the TURL, done by the SRM: the host name in the TURL is taken from the reverse lookup of the door's IP address. If there are multiple interfaces, dCache will prefer the external one over a '192....'-like address. So it is difficult to determine which interface the TURL will point to.
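A minimal sketch of the TURL host choice as described above. It rests on several assumptions: "a '192....'-like address" is taken to mean the private 192.168.0.0/16 range, and the helpers `pick_turl_address` and `build_turl`, the example addresses, host names and path are hypothetical. The default port 2811 is the standard GridFTP control port. The reverse lookup is passed in as a function so the sketch runs without DNS.

```python
from ipaddress import ip_address, ip_network

# Assumption: the "192...." addresses in the mail are the private 192.168.0.0/16 range.
INTERNAL_NET = ip_network("192.168.0.0/16")


def pick_turl_address(door_addresses):
    """Prefer an address outside INTERNAL_NET; fall back to the first one given."""
    addrs = [ip_address(a) for a in door_addresses]
    external = [a for a in addrs if a not in INTERNAL_NET]
    return (external or addrs)[0]


def build_turl(door_addresses, path, reverse_lookup, scheme="gsiftp", port=2811):
    """Build a TURL from the reverse lookup of the chosen door address.
    reverse_lookup maps an IP string to a host name; in a real setup this
    would be a DNS PTR lookup, here it is injected so the sketch runs offline."""
    host = reverse_lookup(str(pick_turl_address(door_addresses)))
    return f"{scheme}://{host}:{port}{path}"


# Illustrative names for a file server with an IPoIB and an Ethernet interface.
names = {"192.168.100.5": "se01-ib.example.org", "203.0.113.10": "se01.example.org"}
print(build_turl(["192.168.100.5", "203.0.113.10"], "/pnfs/example.org/data/f1", names.get))
# gsiftp://se01.example.org:2811/pnfs/example.org/data/f1
```

Whichever network the requesting client sits on, the external name wins. This is why a worker node using SRM would end up transferring over Ethernet.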
This is the current situation.
For xroot and dcap it seems (after talking to the experts) that we can get this improved before the golden release (1.9.5), which is due at the end of September. For the SRM part we need to investigate; an improvement there is very unlikely to be available in 1.9.5.
Hope this helps a little bit in deciding. If you like, we could have a short phone conference this week on the matter, so that you can describe your use case in more depth.
cheers
patrick
2009-07-21 DF to Patrick Fuhrmann
Based on your mail, I think we could have a solution with dcap doors on all file servers that listen only on their InfiniBand addresses, i.e. serve only the InfiniBand network.
We also seem to be able to keep the gsiftp doors on the file servers, since they will be able to serve the worker nodes via InfiniBand as well as the outside via Ethernet. This would be fantastic, and just what we need.
The SRM seems to be a bit problematic... So we may end up in a situation where a worker node client using SRM receives the Ethernet address of the file server. This would still work, but the transfer would then go over the Ethernet connection between the WN and the pool servers.
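The expected outcome of the proposed layout can be summarised as a lookup table. This is a sketch of the expectations stated in this reply, not measured behaviour. The names `EXPECTED_NETWORK` and `data_network` are hypothetical. The "wn" case assumes a worker node that can reach the file servers over both InfiniBand and Ethernet, and the "remote" case assumes an internet client that only sees Ethernet.

```python
# Expected data path per (client location, access method) in the proposed
# layout. This encodes the expectations from the mail, not measured behaviour.
EXPECTED_NETWORK = {
    # dcap doors on every file server, listening on InfiniBand addresses only
    ("wn", "dcap"): "infiniband",
    # gsiftp door contacted via its InfiniBand address; the passive-mode
    # mover answers on the same interface
    ("wn", "gsiftp"): "infiniband",
    # the SRM may return a TURL with the file server's Ethernet host name,
    # so the transfer still works but runs over Ethernet
    ("wn", "srm"): "ethernet",
    # remote grid clients only see the Ethernet side
    ("remote", "gsiftp"): "ethernet",
    ("remote", "srm"): "ethernet",
}


def data_network(client, method):
    """Return the network expected to carry the data, or raise if the
    combination is not offered (e.g. dcap from outside)."""
    try:
        return EXPECTED_NETWORK[(client, method)]
    except KeyError:
        raise ValueError(f"{method} access from {client!r} is not provided") from None


for (client, method), net in sorted(EXPECTED_NETWORK.items()):
    print(f"{client:6} {method:6} -> {net}")
```

Keeping the expectations in one table makes it easy to check each entry against the dCache release that is actually deployed. This matters most for the SRM row, where the mail expects no change in 1.9.5.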
--
DerekFeichtinger - 22 Jul 2009