slow TCP connections due to very different speed of segments?

Question

Hello group/list,

I've checked the FAQ but I couldn't find any reference to this issue.

Our campus LAN is mostly Gigabit Ethernet fiber and 100 Mb/s UTP distribution, but we still have some distribution done to remote parts of the campus done over LRE (long range Ethernet), which is much like a "local DSL". It's supposed to give 10 Mb/s under the best conditions AFAIK.

People from these remote locations complain that traffic to servers on the core network is very slow. I've ruled out problems at client or server side. I've tested file transfers across the LRE segment and across the Gb Eth. segment. Their speeds were close to the expected max, so this (I guess) rules out problems on the segments themselves (esp. the LRE). I'm starting to wonder whether the big drop in speed between the two segments isn't the root cause (I mean, a 1 Gb/s and a 10 Mb/s segment). Would Ethernet gurus be so kind to comment? (I'm a poor system admin, acting as a network admin!)

The exact topology is as follows:

(Core LAN)
   |
[Cisco Catalyst 4006 L3 switch]
   | 1 Gb/s Eth fiber
   |
[SMC L2 switch]
   | 100 Mb/s UTP
   |
[Cisco LRE 29xx switch]
   | phone cable
   |
[LRE end equipment]
   | 100 Mb/s UTP
   |
(client PC)

Thanks in advance for any comment, tip etc.

NOTE: the From: e-mail address is a dead one. Please post.

Greets, _Alain_

anybody43 · Accepted Answer

There is no issue with the speed mismatch that you describe. TCP was designed to operate in that environment and, as is witnessed by the Internet and other WANs, actually does so.

Performance problems are always tricky. If you don't know quite a lot about how this stuff works, and can't use tools like packet sniffers and interpret the results, it could be quite difficult.

1. The absolute first thing is to check the counters on all of the equipment to see if there are errors accumulating. Fix them. Any on Ethernet will most likely be caused by duplex mismatch. The error rate is basically zero on this kit now, and less than 1 in a million is OK. TCP is very good at recovery and can hide much higher rates. On LANs most ports have zero errors *ever*.

2. Make sure that the performance is not actually as designed. This is much harder. For example, some users may complain that Windows file copies are very slow when they drag a directory across the link. It can take many network transactions to get even one file across...
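As a rough illustration of the counter check in point 1, the "less than 1 in a million" rule of thumb can be applied to the error and frame counters read off each port. This is only a sketch: the function names and the sample counter values are made up for the example, and how you read the counters depends on the equipment.

```python
def error_rate(error_count: int, frame_count: int) -> float:
    """Fraction of frames that were counted as errored on one port."""
    if frame_count <= 0:
        return 0.0  # no traffic seen, so nothing to judge
    return error_count / frame_count


def port_needs_attention(error_count: int, frame_count: int,
                         threshold: float = 1e-6) -> bool:
    """True when the error rate reaches the 1-in-a-million rule of thumb."""
    return error_rate(error_count, frame_count) >= threshold


# Hypothetical counters: (errors, frames) per port along the path.
ports = {
    "smc-uplink": (0, 48_000_000),
    "lre-port-7": (350, 12_000_000),
}
for name, (errors, frames) in ports.items():
    flag = "CHECK" if port_needs_attention(errors, frames) else "ok"
    print(f"{name}: {error_rate(errors, frames):.2e} {flag}")
```

What matters is whether the counters keep growing between two readings, so the check is best run on the difference between two snapshots rather than on lifetime totals.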

Robert Redelmeier · Answer

Gee ... people complain? :)

Both directions? When users were complaining? I typically run ttcp and ping.

AFAIK, ethernet itself isn't the issue, but that Crisco bridge may be. TCP/IP has a discovery mechanism that can work well, but hasn't always been well implemented (especially MS-Win9*). I also believe some of the newer p2p apps use UDP. You may need to sniff the link.

It gets into Quality-of-Service issues, but if someone is hogging that 10 Mb/s line, everyone else will suffer. Paradoxically worse (latency) if the Crisco or LRE has big buffers.

-- Robert

alainjean · Answer

Thanks to you Robert, FAB, Rick for all the input.

I'll do another pass of investigation based on your comments, but here are a few more words on this, though:

- duplex mismatch: I've checked this all along the path, no duplex mismatch (esp. the UTP link between the SMC and the Cisco LRE switch)

- this problem seems to happen all the time. When I did my tests, I used ftp transfers (from Unix boxes, so that I know at least that the ftp server is decent). Yes, the slowness was in both directions, although, as expected, download was always significantly faster than upload. Ping times were normal. I'll try ttcp too, thanks.

- I've concentrated on one complaint from one user who's making ftp transfers only... we don't allow P2P on the University campus anyway :-)

- the LRE links are not shared, so no one can be hogging them

My gut feeling is that somehow data gets pushed too fast over the fast link and that either the SMC or the Cisco LRE switch drops frames. I understand that normal TCP window mechanisms should take care of that, but I suspect that they don't, somehow, and I'm not sure how to check this (where can I find something about this in netstat output BTW?).

Any new hint is welcome. I'll follow up after a new round of checking counters and such.

Greets, _Alain_

Rick Jones · Answer

Of course, if the buffers are too small - say smaller than the typical TCP windows being used - a burst of traffic from the fast side may fill the buffers and overflow. Checks of the statistics might be in order - both netstat on the end systems and link-level on each side of that 10 Mbit/s pipe (or anywhere there is a speed change, I suppose).
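To put a number on the buffer concern, here is a back-of-the-envelope sketch. It assumes a simple fluid model (a whole window arrives back to back at the fast rate while the queue drains at the slow rate) and uses made-up window and buffer sizes; real switch buffering is more complicated.

```python
def peak_queue_bytes(burst_bytes: float, fast_bps: float, slow_bps: float) -> float:
    """Peak backlog when a burst arrives at fast_bps and drains at slow_bps.

    While the burst is arriving, a fraction slow/fast of it has already
    left, so the rest is what the buffer has to hold.
    """
    if fast_bps <= slow_bps:
        return 0.0  # no speed step-down, no backlog in this model
    return burst_bytes * (1 - slow_bps / fast_bps)


def burst_overflows(buffer_bytes: float, burst_bytes: float,
                    fast_bps: float, slow_bps: float) -> bool:
    """True if the burst would not fit in the buffer at the speed change."""
    return peak_queue_bytes(burst_bytes, fast_bps, slow_bps) > buffer_bytes


# Hypothetical numbers: 64 KiB TCP window, 100 Mb/s in, 10 Mb/s out.
window = 64 * 1024
print(peak_queue_bytes(window, 100e6, 10e6))
print(burst_overflows(32 * 1024, window, 100e6, 10e6))
```

In this model a full window sent at 100 Mb/s needs roughly 90% of its size in buffer space in front of the 10 Mb/s link, which is why the drop counters at the speed change are worth checking.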

rick jones

Robert Redelmeier · Answer

Also check cable termination quality. A split pair (homemade) on a cable will ruin performance on that collision domain.

Is LRE asymmetric? I wouldn't expect different performance, unless there are slow disks limiting.

If it's low ftp throughput, you might try adjusting that user's TCPRecvWindow.
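The reason the receive window matters: a single TCP connection cannot have more than one window of data in flight per round trip, so throughput is capped at about window / RTT. A small sketch, with the window size and round-trip time chosen purely as example values:

```python
def max_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on one connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Window needed to keep the link full (the bandwidth-delay product)."""
    return link_bps * rtt_seconds / 8


# Example values only: an 8760-byte window and a 20 ms round trip.
print(max_throughput_bps(8760, 0.020) / 1e6, "Mb/s ceiling")
print(window_needed_bytes(10e6, 0.020), "bytes to fill 10 Mb/s")
```

With a measured ping time plugged in for the RTT, this gives a quick idea of whether the window, rather than the link, is the limit.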

-- Robert

Rick Jones · Answer

On the sending side, look for lines relating to retransmissions. Compare with total data segments sent. Also compare retransmissions to data segments retransmitted. On the recv side, look for out-of-order segments. While it may not match your platform entirely, the following:

ftp://ftp.cup.hp.com/dist/networking/briefs/annotated_netstat.txt

may be of some help. You may also want to compile:

ftp://ftp.cup.hp.com/dist/networking/tools/beforeafter.tar.gz

on your system so you can "subtract" one netstat from the other (check it carefully though, as that code was written and tested only on HP-UX netstat and lanadmin - it is simple, but perhaps not simple enough)
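The "subtract one netstat from the other" idea can be sketched as follows. This is not the beforeafter tool, just an illustration of the arithmetic; the counter names are hypothetical, and pulling the numbers out of your platform's netstat output is left out because the format differs between systems.

```python
def subtract_counters(before: dict, after: dict) -> dict:
    """Per-counter difference between two snapshots taken around a test."""
    return {name: after[name] - before.get(name, 0) for name in after}


def retransmit_fraction(delta: dict,
                        sent_key: str = "data_segments_sent",
                        retrans_key: str = "data_segments_retransmitted") -> float:
    """Share of data segments sent during the test that were retransmissions."""
    sent = delta.get(sent_key, 0)
    if sent <= 0:
        return 0.0
    return delta.get(retrans_key, 0) / sent


# Hypothetical snapshots taken just before and just after one ftp transfer.
before = {"data_segments_sent": 1_000, "data_segments_retransmitted": 10}
after = {"data_segments_sent": 6_000, "data_segments_retransmitted": 260}

delta = subtract_counters(before, after)
print(delta)
print(f"{retransmit_fraction(delta):.1%} retransmitted")
```

Taking the snapshots immediately around a single transfer keeps unrelated traffic on the end system from muddying the numbers.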

TCP has mechanisms to attempt to adapt to congestion. How well it works depends on the nature of the congestion and the flavor of TCP being used - seems one sure way to get one's degree these days is describe yet another tweak to congestion control :)

Someone else asked about asymmetry - that could indeed be an issue, particularly if the receiver is an "ack-every-other" and the asymmetry is great - basically, if the ratio is worse than 2*MSS/60, where MSS is the MSS for TCP over the asymmetric link and 60 is a wag for the size of an ACK segment, then the ACKs may be saturating the slow return link and limiting the flow of TCP segments the other way.
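The 2*MSS/60 rule of thumb is easy to evaluate. In the sketch below the MSS of 1460 bytes and the link speeds are assumed example values, and the 60-byte ACK size is the rough guess from the text.

```python
def asymmetry_limit(mss_bytes: int = 1460, ack_bytes: int = 60) -> float:
    """Largest forward/return speed ratio before ACKs fill the return link,
    for a receiver that sends one ACK per two full-size segments."""
    return 2 * mss_bytes / ack_bytes


def acks_limit_throughput(forward_bps: float, return_bps: float,
                          mss_bytes: int = 1460, ack_bytes: int = 60) -> bool:
    """True if the link asymmetry is worse than the 2*MSS/ack_bytes ratio."""
    return forward_bps / return_bps > asymmetry_limit(mss_bytes, ack_bytes)


print(round(asymmetry_limit(), 1))          # ratio for MSS 1460, 60-byte ACKs
print(acks_limit_throughput(10e6, 1e6))     # 10:1 asymmetry
print(acks_limit_throughput(10e6, 100e3))   # 100:1 asymmetry
```

With these assumptions the ratio comes out near 49:1, so the ACK stream would only become the bottleneck if the return direction were far slower than the forward one.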

rick jones
