Throughput Tests with FNAL
These throughput tests were performed with the help of Matt Crawford from FNAL.
I did the first few tests by hand on Dec 1st 2006. On Dec 6 I tried to redo a full run in an automated fashion with my new script,
but the iperf service at FNAL had stopped working. Since I have had no further contact since then, I am publishing these preliminary results anyway.
It would be worth repeating these tests once the download problem (see below) is solved.
Route, RTT and packet loss
traceroute to charley.fnal.gov (131.225.2.3), 30 hops max, 38 byte packets
1 148.187.33.2 (148.187.33.2) 0.429 ms 0.441 ms 0.375 ms
2 148.187.32.4 (148.187.32.4) 0.205 ms 0.228 ms 0.234 ms
3 swima2.cscs.ch (148.187.20.2) 0.491 ms 0.595 ms 0.497 ms
4 swiEL2-10GE-1-4.switch.ch (130.59.37.77) 3.453 ms 3.300 ms 3.202 ms
5 swiCE3-10GE-1-3.switch.ch (130.59.37.65) 4.094 ms 4.216 ms 3.851 ms
6 swiCE2-10GE-1-4.switch.ch (130.59.36.209) 4.092 ms 4.087 ms 4.110 ms
7 switch.rt1.gen.ch.geant2.net (62.40.124.21) 4.093 ms 4.153 ms 4.112 ms
8 so-7-2-0.rt1.fra.de.geant2.net (62.40.112.22) 12.218 ms 12.291 ms 12.231 ms
9 esnet-wash-gw.rt1.fra.de.geant2.net (62.40.125.78) 118.081 ms 117.939 ms 117.997 ms
10 aoacr1-oc192-dccr1.es.net (134.55.219.46) 112.567 ms 112.513 ms 112.570 ms
11 chicr1-oc192-aoacr1.es.net (134.55.209.57) 132.696 ms 132.527 ms 132.570 ms
12 chislsdn1-chicr1.es.net (134.55.207.34) 132.683 ms 132.645 ms 132.571 ms
13 198.49.208.230 (198.49.208.230) 133.721 ms 133.812 ms 133.727 ms
14 vlan360.r-s-hub-fcc.fnal.gov (131.225.15.78) 133.982 ms 133.796 ms 133.852 ms
15 s-s-fapl.fnal.gov (131.225.15.29) 133.718 ms 137.085 ms 133.839 ms
16 charley.fnal.gov (131.225.2.3) 133.842 ms 133.921 ms 133.583 ms
The round trip time from an extended ping run was 133 ms, with no packet loss.
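With an RTT this long, a TCP sender needs a large window to keep the path busy: the window must cover the bandwidth-delay product (rate × RTT). The sketch below computes it for the measured 133 ms. The link rates used are hypothetical examples only, because the actual bottleneck capacity of the CSCS-FNAL path is not stated here.

```python
def bandwidth_delay_product(rate_bit_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of the given rate busy."""
    return rate_bit_s * rtt_s / 8.0


RTT = 0.133  # measured round-trip time CSCS <-> FNAL, in seconds

# Hypothetical target rates; the real bottleneck capacity is not known.
for rate in (100e6, 1e9):
    bdp = bandwidth_delay_product(rate, RTT)
    print(f"{rate / 1e6:6.0f} Mbit/s needs a window of about {bdp / 2**20:.2f} MiB")
```

Even at 100 Mbit/s the required window is well above 1 MB, which is why small default windows perform poorly on this route.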
Influence of TCP window size and number of parallel streams
The upload graph (TO) looks OK: large TCP window sizes give large improvements, as expected given the long RTT of 133 ms.
The download graph reveals a major problem: the rates are
much too low. The best TCP window size in the series is 256KB, and for all window sizes the
throughput increases linearly with the number of streams. This points to a deeper problem (see below).
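A linear gain from adding streams is what one would expect if each stream is capped by a per-connection limit rather than by shared link capacity. A single stream can deliver at most one window per round trip, so N streams give at most N·W/RTT. The sketch below evaluates that ceiling for the 256KB window, assuming (as iperf does for its `-w` suffixes) that 1 K = 1024 bytes.

```python
def window_limited_rate(window_bytes: int, rtt_s: float, streams: int = 1) -> float:
    """Upper bound on TCP throughput in bit/s when the window, not the link, limits.

    Each stream can have at most one window of unacknowledged data per RTT,
    so N independent streams give at most N times that.
    """
    return streams * window_bytes * 8 / rtt_s


RTT = 0.133          # seconds
WINDOW = 256 * 1024  # best window size of the download series, assuming K = 1024

for n in (1, 2, 4, 8):
    rate = window_limited_rate(WINDOW, RTT, n)
    print(f"{n} stream(s): at most {rate / 1e6:.1f} Mbit/s")
```

If the measured download rates sit far below this ceiling while still scaling with the number of streams, the per-stream limit is something other than the configured window.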
TCP based Problem for Downloads from FNAL (UNRESOLVED!)
This paragraph is based on a packet capture of an iperf run, made with wireshark. The iperf command line used was:
./iperf -c charley.fnal.gov -p [hidden] -w 1024k -r -L [hidden] -m
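For an automated run over several window sizes and stream counts, the same kind of command line can be generated programmatically. The sketch below only builds iperf 2 command strings and does not run them. The ports used in the real test are hidden above, so the values in the example are placeholders, and the sweep values are illustrative rather than the ones in the graphs.

```python
import shlex
from itertools import product


def iperf_commands(host, port, windows, streams, listen_port):
    """Build iperf (version 2) client command lines for a window/stream sweep.

    -c  client mode, connect to host     -p  server port
    -w  TCP window size                  -P  number of parallel streams
    -r  run the reverse direction after the forward test
    -L  port for the reverse connection  -m  print the TCP MSS
    """
    cmds = []
    for w, p in product(windows, streams):
        args = ["./iperf", "-c", host, "-p", str(port), "-w", w,
                "-P", str(p), "-r", "-L", str(listen_port), "-m"]
        cmds.append(shlex.join(args))
    return cmds


# Placeholder ports; the real ones are not given in this page.
for cmd in iperf_commands("charley.fnal.gov", 5001, ["256k", "1024k"], [1, 4], 5002):
    print(cmd)
```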
The CSCS->FNAL test shows correct behavior: the initial window size of 17920 bytes
at the FNAL end rises to 1501696 bytes, and a large part of the window appears to be used.
For the FNAL->CSCS test the TCP window grows from 8736 to 788384 bytes, which does not
look too bad, but the window never seems to be really used.
The typical behavior is as follows:
I receive from FNAL 4 packets of 1460 bytes with the ACK flag set, at short intervals (~3 ms);
the last of them also has the PSH flag set. Then, after ~75 ms, the CSCS
side sends out an ACK.
The other side does not seem to send anything until this ACK has been received:
the next packet from FNAL arrives after ~133 ms (the normal round trip
time).
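The pattern above (a few segments close together, then a long idle gap) can be made visible by grouping the data packets of one direction into bursts. The sketch below assumes the packets have already been exported from the capture as `(timestamp, payload_length)` pairs sorted by time; the 20 ms gap threshold and the sample timestamps are invented for illustration and are not taken from the real capture.

```python
from dataclasses import dataclass


@dataclass
class Burst:
    start: float  # time of first packet in the burst, seconds
    end: float    # time of last packet in the burst, seconds
    packets: int
    payload: int  # total TCP payload bytes in the burst


def group_bursts(packets, max_gap=0.020):
    """Group time-sorted (timestamp, payload_len) pairs into bursts.

    A new burst starts whenever the gap to the previous packet exceeds max_gap.
    """
    bursts = []
    for t, length in packets:
        if bursts and t - bursts[-1].end <= max_gap:
            b = bursts[-1]
            b.end = t
            b.packets += 1
            b.payload += length
        else:
            bursts.append(Burst(t, t, 1, length))
    return bursts


# Synthetic timestamps: bursts of four 1460-byte segments within ~3 ms,
# separated by long idle gaps.
sample = [(0.000, 1460), (0.001, 1460), (0.002, 1460), (0.003, 1460),
          (0.210, 1460), (0.211, 1460), (0.212, 1460), (0.213, 1460)]
for b in group_bursts(sample):
    print(f"{b.start:.3f}s  {b.packets} pkts  {b.payload} bytes")
```

On a healthy transfer with a large window, bursts would grow toward the window size instead of staying at a constant few segments.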
So, seen from the CSCS side, the other side never really uses the available
window (only 4*1460 = 5840 bytes). Somehow, after sending a few packets, the other
side always waits for an ACK from CSCS before continuing.
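A quick calculation shows how much this behavior costs. The sketch assumes, optimistically, one burst of 4 segments per 133 ms round trip and compares it with what the largest window seen in the FNAL->CSCS capture would allow.

```python
RTT = 0.133         # measured round-trip time, seconds
MSS = 1460          # segment size seen in the capture, bytes
SEGMENTS = 4        # segments FNAL sends before it waits for an ACK
MAX_WINDOW = 788384 # largest TCP window seen in the FNAL->CSCS capture, bytes


def rate_bit_s(bytes_per_rtt: float, rtt_s: float = RTT) -> float:
    """Throughput in bit/s if bytes_per_rtt are delivered once per round trip."""
    return bytes_per_rtt * 8 / rtt_s


in_flight = SEGMENTS * MSS
print(f"bytes in flight:    {in_flight} of {MAX_WINDOW} "
      f"({100 * in_flight / MAX_WINDOW:.1f}% of the window)")
print(f"achieved at best:   {rate_bit_s(in_flight) / 1e3:.0f} kbit/s")
print(f"window would allow: {rate_bit_s(MAX_WINDOW) / 1e6:.1f} Mbit/s")
```

With these numbers a single stream is capped at roughly 350 kbit/s, about 0.7% of the roughly 47 Mbit/s the observed window would permit. Since the real cycle (3 ms burst, ~75 ms until the ACK, ~133 ms until the next packet) may be longer than one RTT, the actual rate could be lower still.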
My FNAL->CSCS capture also shows many more PSH,ACK packets than the CSCS->FNAL one.
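In wireshark the PSH,ACK segments can be selected with the display filter `tcp.flags.push == 1 && tcp.flags.ack == 1`. For comparing captures from both ends in a script, the sketch below counts such segments per source/destination pair using only the Python standard library. It assumes a classic libpcap file (not pcapng) with Ethernet framing and untagged IPv4; anything else is skipped, so it is a rough check rather than a full dissector.

```python
import struct
from collections import Counter

TCP_PSH, TCP_ACK = 0x08, 0x10


def count_psh_ack(path):
    """Count TCP segments with both PSH and ACK set, per (src, dst) IPv4 pair."""
    counts = Counter()
    with open(path, "rb") as f:
        header = f.read(24)  # pcap global header
        magic = header[:4]
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"     # little-endian file (usec or nsec timestamps)
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"     # big-endian file
        else:
            raise ValueError("not a classic pcap file")
        if struct.unpack(endian + "I", header[20:24])[0] != 1:
            raise ValueError("only Ethernet (linktype 1) is handled")
        while True:
            rec = f.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            _, _, incl_len, _ = struct.unpack(endian + "IIII", rec)
            frame = f.read(incl_len)
            if len(frame) < 34 or frame[12:14] != b"\x08\x00":
                continue      # not IPv4 over Ethernet
            ip = frame[14:]
            ihl = (ip[0] & 0x0F) * 4
            if ip[9] != 6 or len(ip) < ihl + 14:
                continue      # not TCP, or TCP header truncated
            flags = ip[ihl + 13]  # TCP flags byte
            if flags & TCP_PSH and flags & TCP_ACK:
                src = ".".join(map(str, ip[12:16]))
                dst = ".".join(map(str, ip[16:20]))
                counts[(src, dst)] += 1
    return counts
```

Running it on the CSCS and FNAL captures of the same test would show whether both ends see the same PSH,ACK pattern.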
We need captures taken and analysed at the FNAL end for comparison. It is not clear to me what causes
this behavior, but it could well be some network component between the two sites.
--
DerekFeichtinger - 08 Dec 2006