Server (100Mb NIC) --> router (100Mb) --> T1 ATM (IMA) PVC --> 128kb client circuit --> client PC (100Mb).
Certain times during the day, the server needs to send a lot of data to the client PC. Since the server NIC is 100Mb and the traffic traverses a T1, followed by a small 128kb link at the client site, whose job is it to make sure the traffic adjusts for the speed differences? My captures at the server side show an enormous number of TCP retransmits during the traffic bursts. I know frame relay has a mechanism to throttle back, but I'm not sure how ATM PVCs handle this.
It will try to buffer (FIFO) at first, but that will not really help, so when the buffer is filled you will get tail drops, which will cause your TCP session to slow down, do its slow-start thing, ramp back up to a certain speed, get drops again, slow down, etc.
So actually it's doing what it should be doing. If you want to do it without the drops and retransmits, you probably need something that will play with TCP window sizes (like Packeteers and the like do). Or make your server limit itself.
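That drop/slow-down/ramp-up cycle can be sketched as a toy AIMD (additive-increase, multiplicative-decrease) loop. The MSS, RTT, and round count below are illustrative assumptions, not measurements from this network:

```python
# Toy sketch of TCP's congestion sawtooth over a 128 kbps bottleneck.
# MSS and RTT are assumed values for illustration only.
MSS = 1460              # bytes per segment (typical Ethernet MSS)
BOTTLENECK_BPS = 128_000
RTT = 0.2               # seconds, assumed round-trip time

# Segments the 128 kbps link can carry per RTT before the queue fills:
capacity = BOTTLENECK_BPS * RTT / (8 * MSS)   # roughly 2.2 segments

cwnd = 1.0
history = []
for _ in range(40):
    history.append(cwnd)
    if cwnd > capacity:            # queue overflows -> tail drop
        cwnd = max(1.0, cwnd / 2)  # multiplicative decrease on loss
    else:
        cwnd += 1.0                # additive increase per RTT

print(history[:8])   # window oscillates instead of settling
```

The window never converges; it saws up and down around the tiny per-RTT capacity of the slow link, which is exactly the retransmit pattern showing up in the captures.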
ATM is quite close. Is it a single session or multiple ones? What you need is per-VC queuing and correct ATM shaping parameters for this VC. For a single session, not much else is needed besides making sure there is enough buffer to handle 128 kbps worth of traffic multiplied by the round-trip time. For multiple sessions, CBWFQ would be the bee's knees.
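That "bandwidth times round-trip time" figure is the bandwidth-delay product, and it's a one-liner to compute. The RTT here is an assumed value; measure your own with ping:

```python
# Buffer needed ~= bandwidth-delay product of the slow link.
link_bps = 128_000
rtt_s = 0.25                      # assumed 250 ms RTT over the IMA/ATM path
bdp_bytes = link_bps * rtt_s / 8  # bits -> bytes

print(f"per-VC buffer: about {bdp_bytes:.0f} bytes "
      f"(~{bdp_bytes / 1500:.1f} full-size packets)")
```

With those assumptions the answer is about 4 KB, i.e. only a couple of full-size packets per VC; anything much bigger than the BDP just adds queuing delay rather than throughput.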
This is true to some extent; however, modern TCPs do a lot better. There is "fast retransmit" and "selective ACK". Make sure selective ACK is enabled on both ends if possible. Current Windows TCP stacks have it on by default. The easiest thing, if you have captures, is to look at the two SYNs.
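In the SYNs you're looking for the SACK-permitted TCP option, which is option kind 4 (RFC 2018). As a sketch of what that check amounts to, here is a minimal decoder for the TCP options bytes of a SYN; the example byte string is made up (MSS, window scale, NOP padding, SACK-permitted), not taken from your captures:

```python
# Decode a TCP options field and check for SACK-permitted (kind 4).
def tcp_options(raw: bytes):
    opts, i = [], 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:           # end of option list
            break
        if kind == 1:           # NOP padding, no length byte
            i += 1
            continue
        length = raw[i + 1]     # all other options carry a length byte
        opts.append((kind, raw[i + 2:i + length]))
        i += length
    return opts

# Made-up SYN options: MSS 1460, NOP, window scale 2, NOP, NOP, SACK-permitted
syn_opts = bytes([2, 4, 5, 180,
                  1,
                  3, 3, 2,
                  1, 1,
                  4, 2])
kinds = [k for k, _ in tcp_options(syn_opts)]
print("SACK permitted" if 4 in kinds else "no SACK")
```

In Wireshark the same information shows up decoded for you under the TCP options of each SYN; both SYNs must carry the option for SACK to be used on the connection.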
You don't say what "an enormous number" might be.
1% would not seem too bad to me.
As already mentioned TCP was designed to deal with this type of network and, especially with all the modern bells and whistles, does pretty well.
Huge buffers are not the answer either. Andrey's advice sounds OK. The two TCPs communicate congestion information to each other; if you whack in a huge buffer, the main effect is to delay that communication.
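The delay cost of an oversized buffer is easy to put numbers on: every queued byte sits in front of the packets that carry the congestion signal. The buffer sizes below are arbitrary examples:

```python
# Extra queuing delay a full buffer adds in front of the 128 kbps link.
link_bps = 128_000
for buf_bytes in (4_000, 16_000, 64_000):
    delay_s = buf_bytes * 8 / link_bps
    print(f"{buf_bytes:>6} byte buffer -> up to {delay_s:.2f} s extra queuing delay")
```

A 64 KB buffer in front of a 128 kbps link can hide congestion from the sender for about four seconds, which is why fatter buffers make the feedback loop worse, not better.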
The easiest place to see what is going on at a high level is to check the interface counters for output drops.
Worth noting that UDP does not have such a congestion control mechanism and that if you had a high proportion of UDP traffic then you might see some issues.
I know frame relay has a mechanism to throttle back, but not
Not sure - but it may depend on what type of PVC setup you have.
The only thing that understands the bandwidth change directly is the router - but unless you configure the rate specific to that PVC, it will just shape to line rate (i.e. T1 at your end).
Are you shaping the traffic to the ATM 128 K PVC or are you just letting it blast away at the ATM line rate?
If you are lucky, the ATM switches downstream are clever enough to discard the entire set of cells associated with a single AAL5-format packet on congestion or PVC traffic overload - but this isn't always supported (or working properly).
Try altering the outbound queue at the router ATM interface - a smaller queue may help, but the key is usually WFQ or WRED to provide feedback to the TCP congestion control.
If you have lots of framing errors at the ATM receiving end, the ATM network is dropping some cells within packets - the fix for that is probably to shape the per-PVC traffic to what you are allowed to send in the contract.
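Shaping to the contract is essentially a token bucket: a packet goes out only when enough tokens have accumulated at the contracted rate. Real routers do this per PVC in hardware; the rate and burst values below are assumptions for the sketch:

```python
# Minimal token-bucket shaper sketch (tokens counted in bytes).
class TokenBucket:
    def __init__(self, rate_bps: int, burst_bytes: int):
        self.rate = rate_bps / 8          # byte tokens added per second
        self.burst = burst_bytes          # bucket depth
        self.tokens = float(burst_bytes)
        self.clock = 0.0

    def conforms(self, now: float, pkt_bytes: int) -> bool:
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.clock) * self.rate)
        self.clock = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True
        return False                      # shape (queue it) or police (drop it)

bucket = TokenBucket(rate_bps=128_000, burst_bytes=3_000)
# Two back-to-back 1500-byte packets fit the burst; a third must wait.
sent = [bucket.conforms(t, 1500) for t in (0.0, 0.001, 0.002)]
print(sent)   # [True, True, False]
```

The difference between shaping and policing is what you do on a non-conforming packet: a shaper holds it until tokens accumulate, a policer drops it - and mid-packet cell drops inside the ATM cloud are the worst outcome, since the whole AAL5 frame is wasted.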
When I look at the capture I/O graph (using Wireshark/Ethereal), I see a burst of around 60 seconds where the T1 is completely maxed out. (We have about 40 clients connected at 128Kb.) After the one-minute peak, I see a definite back-off from the server and the line goes to about 20% utilization, but with about 3% retransmitted packets. Is this normal?
This sounds like each PVC is only limited to the line rate.
After the one minute peak,
The issue isn't so much the error rate as the "goodput" - i.e. how much of the available 128k is actually moving useful info. Try moving a big file and measure the time, then work backwards to the application bit rate - if you get to 80% of the 128 kbps then you are doing reasonably well.
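Working backwards from a timed transfer is just arithmetic; the file size and transfer time below are placeholder values to show the calculation:

```python
# Goodput check: time a big file transfer, then work back to the
# application bit rate. Plug in your own measured numbers.
file_bytes = 1_000_000            # assumed file size
transfer_s = 75.0                 # assumed measured wall-clock time

goodput_bps = file_bytes * 8 / transfer_s
pct = goodput_bps / 128_000 * 100
print(f"goodput: {goodput_bps / 1000:.1f} kbps ({pct:.0f}% of the 128 kbps PVC)")
```

With those placeholder numbers a 1 MB file in 75 seconds works out to roughly 107 kbps, about 83% of the PVC - which by the 80% rule of thumb above would count as doing reasonably well.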
What is probably happening is that the two devices at the ends of the connection "see" the errors and reduce the data rate to roughly match the available throughput.
However - 60 seconds is a long time to notice what is going on and adapt, and the transfer isn't doing slow start, so it is probably a block scheme of some sort.
Is the connection using a protocol that just blasts traffic in big bursts, such as Windows SMB (file sharing)?
If so, there is a bit of tinkering you can do on the machines to tweak the transfer, but it isn't straightforward and it may have side effects on other Windows transfers. If you want to try, you need selective ACK (at both ends). You can reduce the max transfer size, but this will affect local transfers - though since it is negotiated at connection setup, you only need to change one device.
Try an FTP transfer if you can, to see if a plain TCP-based protocol behaves better and to give you a performance baseline. Or use a test tool such as TTCP.
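If you don't have TTCP handy, the same kind of baseline can be sketched with plain sockets: one end receives, the other blasts a fixed amount of data and reports the achieved rate. Port, chunk size, and total bytes below are arbitrary choices for the sketch, not TTCP's defaults:

```python
# TTCP-style throughput probe sketch: run "server" on the receiver,
# "client <host>" on the sender.
import socket
import sys
import time

PORT, CHUNK, TOTAL = 5001, 8192, 1_000_000   # arbitrary test values

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            received = 0
            while (data := conn.recv(CHUNK)):
                received += len(data)
    print(f"received {received} bytes")

def client(host):
    start = time.monotonic()
    with socket.create_connection((host, PORT)) as s:
        sent = 0
        while sent < TOTAL:
            s.sendall(b"x" * CHUNK)
            sent += CHUNK
    elapsed = time.monotonic() - start
    print(f"{sent * 8 / elapsed / 1000:.1f} kbps")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "server":
        server()
    elif len(sys.argv) > 1:
        client(sys.argv[1])
```

Run it across the 128k circuit during a quiet period and again during the bursts; comparing the two rates tells you how much the congestion is actually costing in goodput.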