TCP Window Size

Hi

I would like to share my experience. We have a database application with records of 30 million people.

PROBLEM : Slow application access from time to time

Server : database application

WAN : 12Mbps of clear pipe end to end WAN links

Client : a combined thin server and Terminal Server serving 200 thin/terminal server clients

Util% : WAN links are max 60% loaded

RTT : 20-120 msec with average RTT of 58msec

Server TCP Window Size : 24kByte

Client TCP Window Size : 65kByte

I strongly suspect that this is due to the SENDER and RECEIVER capacity mismatch.

Kindly advise on this situation

What TCP window size should be used ?

Should it be changed on both ends ?

Can FAST TCP be applied in this scenario?

Waiting for your valuable answer

Thanks

Cheema

Just from my experience, I have a hard time blaming TCP windowing. How many concurrent users? Is this real time? How do the queries look? Are they efficient? What kind of bandwidth per user or per transaction, and how many users/transactions at any given time?

12Mbps is not that fast, but you need to provide context on whether or not 12Mbps is enough. It could be anything from the server being busy with backups or some kind of schedule, to WAN pipe utilization going over 80%, which would start to impact latency, to the service provider, to anything. Have you used MRTG or Netflow to gauge bandwidth utilization at these times? How about latency? Do you have a baseline of these usages and performance during 'good performance' times? Do you have QoS? Could someone be running an FTP and killing your pipe?

I won't say that packet/frame sizes are NOT the issue, but I just hate to look at fundamental networking architecture when there are WAY too many other variables that are more likely. Not to mention, window sizes fluctuate, and if this is a small telnet or shell-based application, they will most likely never get to full size.

Trendkill

Dear Friends

Thanks a lot for your enlightening response. I would like to elaborate further on my query; maybe you can help me more.

Age of Problem : 1.5 years

Working on Problem : the problem comes and goes, and many teams are involved here

Server type : IIOP Database application

Client type : some desktop PCs connect directly, but most of them connect via thin servers which are also acting as TERMINAL Servers and also act as clients to the IIOP Database application server

Client means : A terminal server which is also a Thin Server to 100s of thin clients

I would like to clarify here that we are talking about the BULK TCP transfer between the THIN SERVERS/TS (also acting as clients to the IIOP Database application) and the IIOP application server.

The reason for posting here is that the NETWORK is always blamed first for a slow application, and we use all-Cisco networking devices. Multiple parallel WAN links are load balanced using IP LOAD-SHARING PER-PACKET with IP CEF.

EXPERIENCE : I have a hard time blaming TCP windowing !!!

Can you shed some more light on it?

Q : How many concurrent users? A : There are 6 THIN/TS servers, and the responsible team has split the users so that at most 30 are logged in to each at a time. So 30x6 = 180 users.

Q : Is this real time? A : Yes, it is real time transaction based use

Q : How do the queries look? A : A number is entered and a query is run against it. At the front end a Java-based interface opens, and compressed Java JAR classes are downloaded from the IIOP server to the TS/Thin servers.

Q : Are they efficient? A : How can I find that out? I believe that a certain transaction covering 30 days takes about 12MB of data transfer, which includes screen/graphics updates along with the real data, while a transaction for a day or two should take 1 MB or less. I believe this data is very WAN-UNFRIENDLY, but the question is how to make it efficient.

Q :What is slow and what is fast? A : if query output is displayed in 2-4 seconds, it is fast and ok but if it takes 15-30 seconds then it is mild and if it takes a minute or more, it is slow.

Q : What kind of bandwidth per user or per transaction, and how many users/transactions at any given time? A : The six WAN links stay at 50% loading, but at times link use goes up to 75%. Per-user transaction and bandwidth needs vary, as some queries yield less output while others yield more. Total BW transferred between the TS and IIOP server in one business day is 20GB. On average, two hundred users are on it at a time. If we assume 200 transactions per user, then 200x200 = 40,000 transactions, and if each transaction on average takes 0.5-1 MB, that makes 40,000x0.5 = 20,000 MB.
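The arithmetic in this answer can be checked, and turned into an average line rate, with a short Python sketch. The 8-hour business day is an assumption not stated in the thread; MB is taken as 10^6 bytes, and the WAN rate is the nominal 2.048 Mbit/s of the six E1 links described further down, with no overhead deducted.

```python
# Figures quoted in the answer above.
USERS = 200
TXNS_PER_USER = 200
MB_PER_TXN = 0.5          # lower end of the 0.5-1 MB estimate
WAN_MBPS = 6 * 2.048      # six E1 links, nominal line rate, no overhead deducted
BUSINESS_HOURS = 8        # ASSUMPTION: traffic spread evenly over an 8-hour day


def daily_volume_mb(users: int, txns_per_user: int, mb_per_txn: float) -> float:
    """Total megabytes transferred per day under the stated assumptions."""
    return users * txns_per_user * mb_per_txn


def average_rate_mbps(volume_mb: float, hours: float) -> float:
    """Average rate in Mbit/s needed to move volume_mb (decimal MB) evenly over `hours`."""
    return volume_mb * 8 / (hours * 3600)


volume = daily_volume_mb(USERS, TXNS_PER_USER, MB_PER_TXN)
rate = average_rate_mbps(volume, BUSINESS_HOURS)
print(f"{volume:.0f} MB/day")                                # 20000 MB/day
print(f"{rate:.2f} Mbit/s average over {BUSINESS_HOURS} h")   # 5.56 Mbit/s average over 8 h
print(f"{rate / WAN_MBPS:.0%} of the 6 x E1 line rate")        # 45% of the 6 x E1 line rate
```

Under these assumptions the daily volume alone would average a little under half of the WAN, which is broadly consistent with the reported 50% loading; any peak above the average would push the links toward saturation.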

Q : How much total data is transferred in a week between TS Servers and the IIOP Application Server ? A : 1 TERA BYTE

Q : 12Mbps is not that fast, but you need to provide context of whether or not 12Mbps is enough. Could be anything from server being busy with backups or some kind of schedule, to WAN pipe utilization going over 80% which would start to impact latency, to service provider, to anything ?

A : There are 6 WAN links, each of 2Mbps (6xE1). Each link varies from 50% to 70% loading, and at times it goes to 80%. We have moved from 2xE1 to 6xE1, and the application is so bandwidth hungry that even this BW does not seem enough. The IIOP server is not in our domain, so we cannot check it. Yes, the WAN pipe touches 80%, but we cannot provide more bandwidth than that and need to find another way. TTL is also fine, but I need to check the latency during 80% loading.

Q : Have you used MRTG or Netflow to gauge bandwidth utilization at these times? How about latency? A : Yes, we use MRTG and Netflow and I have detailed traffic stats. Bandwidth sometimes goes up, and usage on a 2Mbps link varies from 1.5 to 1.75Mbps.

Q : Do you have a baseline of these usages and performance during 'good performance' times? A : It has been up and down; sometimes a complaint comes, and usually "no news is good news".

Q : Do you have QoS? A : No

Q : Could someone be running an FTP and killing your pipe? A : With Netflow in place I can always catch the culprit, but that is not the case here.

Q : My bet is that the problem is the Terminal Server. 200 clients on a single Terminal Server is a lot even for non-database type applications. Are you also monitoring the performance of the TS? You need to monitor memory usage, CPU, disk I/O and network I/O, active clients, etc. Even if this is a big multi-CPU TS, you probably have some type of I/O bottleneck on the server.

A : Yes, some time ago that was the case, but later the users were split onto different TS and more resources were added. Can you refer me to a SERVER SIZING URL where I can find the performance parameters you mentioned? How can I find out whether there is an I/O bottleneck, and how does it get removed automatically? The TS/Thin Server team schedules a regular weekly reboot of these machines.

I agree with you that there are too many variables

TRAFFIC CAPTURE RESULTS ========================

I have captured the traffic and analyzed it. From three TS to IIOP application, in 1 min 9 seconds, only about 2 MB of request was sent and against that 65MB of data was pulled.

Capture Duration : 1 min 9 seconds

Client to Server Data : 2MB

Server to Client Data : 65 MB

Data Type : TCP

Frames captured : 75000

Application : HTML, IIOP

MSS advertised by both : 1460byte

TOO MANY TCP RETRANSMISSION, DUPLICATE ACKS, FAST RETRANSMISSIONS etc

TCP window advertised by client : 65k

TCP window advertised by server : 24k

In all 75000 frames, I saw the same TCP WINDOW SIZE. Should it change? Is there anything wrong? Who controls the window size? I believe that the SEND TCP window size of the SENDER (IIOP Application) and the RECEIVE TCP window size at the TS Server which is fetching data from the IIOP Server should be the same.

What do you think from the above data: is it the same or different? And if not, what should the optimal TCP WINDOW SIZE be?
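A useful bound for this question: one TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput cannot exceed window divided by RTT. The Python sketch below applies that bound to the capture figures. Note that the window a host advertises limits what its peer may send to it, so the server-to-client bulk transfer is governed by the client's advertised 65k window (together with the server's congestion window and send buffer), while the server's 24k advertisement limits client-to-server traffic; the two directions are independent and need not match. The capture covered three terminal servers, so its average spans several connections; kB is taken as 1024 bytes for the windows and 10^6 bytes for the captured volume.

```python
def window_limited_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one TCP connection: at most one window in flight per RTT."""
    return window_bytes * 8 / rtt_s / 1e6


# Window sizes and RTT range reported in this thread.
WINDOWS = {"24 kB (server)": 24 * 1024, "65 kB (client)": 65 * 1024}
RTTS_MS = (20, 58, 120)

for label, window in WINDOWS.items():
    for rtt_ms in RTTS_MS:
        ceiling = window_limited_mbps(window, rtt_ms / 1000)
        print(f"{label:>15} window, {rtt_ms:3d} ms RTT: {ceiling:5.2f} Mbit/s max")

# Average rate seen in the capture: 65 MB server-to-client in 69 s.
print(f"capture average: {65e6 * 8 / 69 / 1e6:.2f} Mbit/s")  # capture average: 7.54 Mbit/s
```

At the 58 ms average RTT, a 65k window allows roughly 9 Mbit/s per connection, a large share of the 12 Mbit/s WAN, which fits the view in the replies that the window size is unlikely to be the main constraint.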

As a work-around, I am suggesting that a CLUSTER TERMINAL SERVER be placed on the IIOP Application LAN so that huge amounts of data transfer only between two machines on the SAME LAN and only screen refreshes cross the WAN. What do you say?

waiting for your valuable response...

Best Regards Cheema

Cheema

Here are the details on TCP windowing... better than me trying to make a 1-paragraph summary:

As for your issue, it sounds like you may just have a simple issue of amount of clients and bandwidth. By 'efficient', I mean are the clients asking for all the data at once from the DB rather than a cell by cell query. If one client makes a request (or a few), and the server responds back with full packets (usually 1514 or whatever), until the query has been fulfilled, it is network 'efficient'. If the client is making hundreds of queries for each additional piece of information, the application needs to be looked at. Especially over a WAN with limited bandwidth and latency, this could kill you.

Assuming that is not your issue (nothing to prove it, just saying), you may just have some overloaded times. When the bandwidth is at 50% per pipe, is the performance good? When the performance is reported as bad, does the bandwidth show any clear differences, such as 80% utilization during these times? If so, it sounds like pure volume is your bottleneck. If not, and it happens at 50% and 80% alike, what else is going on in the network or on these boxes? TCP windowing is negotiated, and while a smaller window will not allow as much data, I still doubt this is your issue. Non-optimized windowing would also affect ALL your traffic, not just traffic at certain times. Either it is negotiating properly or not, and it would not make sense that some transactions take 1-2 seconds while others take over a minute. This is not a windowing problem. You need to focus on volume/usage of the network and these boxes during good and bad times, and see what correlations you can draw.
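One way to draw those correlations is to bucket response-time measurements by the link utilization at the moment they were taken. This is a minimal Python sketch; the function name is hypothetical, the 50%/80% band edges come from the reply, and it assumes each measured response time (from a complaint or a timed test query) can be paired with the MRTG utilization sample for the same interval.

```python
from statistics import median


def median_response_by_band(samples, low=50, high=80):
    """Group (link utilization %, response time in s) samples into three bands
    and return the median response time of each non-empty band."""
    bands = {f"<{low}%": [], f"{low}-{high}%": [], f">={high}%": []}
    for util, response in samples:
        if util < low:
            bands[f"<{low}%"].append(response)
        elif util < high:
            bands[f"{low}-{high}%"].append(response)
        else:
            bands[f">={high}%"].append(response)
    return {band: median(times) for band, times in bands.items() if times}
```

If the median climbs sharply only in the top band, volume is the likely bottleneck; if slow responses show up as often at 50% as at 80%, something other than the WAN load deserves attention.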

Trendkill


=============================================

Hi

Thanks for your valuable advice, which seems to be blended with many years of related experience. I have skimmed through the details at the URL you provided and found it very useful. Yes, it seems like a simple issue of number of clients and amount of bandwidth. Today I saw that all 6 WAN links were completely choked at a particular time.

The server responds with full data; I saw many 1514-byte packets in the decode.

I have been asking the IIOP application team for many months to look at the application, but they just shake their heads as if deaf and dumb and always blame the network.

Exactly: the application is killing the WAN even with a 24kB TCP SEND window size. I wonder what might happen to the network if the window size were increased.

Please tell me whether my calculation below is right:

TCP Window size = BW x RTT = 10Mbps x 70msec = 700kbit, about 87.5kB, i.e. roughly 100kB

Note : 10Mbps for 5xE1 links

So the SEND TCP window at the IIOP Server could be about 100kB instead of only 24kB.
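The calculation above is the bandwidth-delay product (BDP): bandwidth times round-trip time, divided by 8 to get bytes. The Python sketch below repeats it for the thread's figures; the 6 x E1 rate is the nominal 12.288 Mbit/s. Windows larger than 65,535 bytes are only possible when both ends negotiate the TCP window-scale option (RFC 7323).

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_s / 8


MAX_UNSCALED_WINDOW = 65_535  # largest window without the TCP window-scale option

cases = [
    ("10 Mbit/s, 70 ms (post above)", 10e6, 0.070),
    ("6 x E1, 58 ms average RTT", 6 * 2.048e6, 0.058),
    ("6 x E1, 120 ms worst RTT", 6 * 2.048e6, 0.120),
]
for label, bw, rtt in cases:
    need = bdp_bytes(bw, rtt)
    note = "needs window scaling" if need > MAX_UNSCALED_WINDOW else "fits in 64 KiB"
    print(f"{label}: {need:,.0f} bytes ({note})")
```

A BDP-sized window only helps when the path has spare capacity; with the links already near 80%, a larger window would mostly let each connection claim more of a shared, nearly full pipe.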

However, the reason my attention turned to TCP window sizing was that even when the WAN links were not choked, there were still complaints of slow access.

"you may just have some overloaded times"

quite true

When BW is at 50%, yep, there are mostly no complaints, though not always.

When links are above 80%, there is sometimes a delayed complaint, and it becomes difficult to find out exactly when the problem occurred and what its nature was. When the complaint arrives some hours later or the next day, I look into the WAN and it looks fine, and then a whole series of questions and lots of guesswork is needed to estimate the event and its impact.

"It sounds like pure volume is your bottleneck"

I agree, but the issue here is that we cannot allocate any more bandwidth, so that is a constraint.

I am now 100% focused on volume and thinking of alternative ways to reduce the application traffic carried on the network. What do you think of this idea (we are using it with great success for some other applications too)?

"a CLUSTER TERMINAL SERVER be placed at the IIOP Application LAN so that huge amounts of data transfer only between two machines on the SAME LAN and only screen refreshes transfer over WAN"

Are you a Cisco guy? Can I ask you some questions later?

Many thanks Cheema

Cheema


I am a 'cisco guy', yes. I support thousands of applications for a major financial institution. Windowing will not help you. The application will still negotiate, and if you are at max bandwidth, windowing will not allow you to send any more or less traffic. Additionally, your WAN has limited packet sizes based on frame relay or ATM or whatever you are using, and therefore I am 99% sure that TCP windowing changes will not do anything to help your situation. If your issues result from 80-90% WAN utilization, and everything else looks OK, then your only options are to move your database, increase your WAN, or decrease your load at any given time. I mean, if this is a pure database query application, which seems efficient based on your 1500-byte packet comments, then your response issues are a simple combination of query size over limited bandwidth that is already constrained. Are these T1s bonded together so you have one 12 meg pipe? While separate pipes may help separate users or traffic, one 12 meg pipe should in theory allow better communications, as the traffic can return at a higher rate. While it won't do much when you have your max users like you do now, it may end up helping a bit. If you already have these bonded, it's a function of what I already said above... you may not have a 'problem' per se here, but more of a simple constraint or 'ceiling'.
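The bonding argument comes down to serialization time: a response confined to one link is clocked out at that link's rate, while a bundle can carry it at the combined rate. A Python sketch, assuming an otherwise idle path and ignoring TCP slow start and RTT; note that the links in this thread are E1s (2.048 Mbit/s) rather than T1s.

```python
E1_BPS = 2.048e6  # nominal E1 line rate; framing and protocol overhead ignored


def serialization_seconds(size_bytes: float, rate_bps: float) -> float:
    """Time to clock size_bytes onto a link of rate_bps with no competing traffic."""
    return size_bytes * 8 / rate_bps


response = 1_000_000  # a 1 MB query result (decimal MB)
print(f"one E1:        {serialization_seconds(response, E1_BPS):.2f} s")      # 3.91 s
print(f"6 x E1 bundle: {serialization_seconds(response, 6 * E1_BPS):.2f} s")  # 0.65 s
```

With per-packet load sharing, as described elsewhere in the thread, a single session's packets are already spread across the links, which approximates the bundled case. Once every link is busy, the advantage shrinks, since each response still competes with everyone else's traffic.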

Maybe you could replicate this database to something local, and allow your users to hit that? Bottom line is that you will need to get your business/application resources to help think through options...but it doesn't sound like something is necessarily 'broken'.

Trendkill

Can you elaborate on "thin/terminal server" ??????

Are we talking serial VT200 terminal (or terminal emulator) ? Do clients communicate with the server via Telnet ?

If so, this changes everything, because data sent by the terminal to the server goes character by character, with echoing done by the server. This means lots of small packets (e.g. lots of overhead for each character typed in).

In terms of the server sending a screenful of information at once, it also depends on how this is done at the OS/TCP stack level. If the application does one I/O per line to be displayed, and the OS sends each I/O to the TCP/IP destination without grouping multiple I/Os, you would also end up with many smaller packets being sent instead of a single 1460-byte payload to redraw the whole screen.

This is one of the reasons that DEC's LAT protocol would group keystrokes from multiple terminals into a single ethernet packet sent to the host to greatly reduce the load on the network.
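To put numbers on the per-character overhead: over TCP/IPv4 on Ethernet, one keystroke carries 1 byte of data in a frame of at least 64 bytes, and Telnet echo adds another small packet (plus ACKs) in the reverse direction. The Python sketch compares that with batching characters into one segment, which is the idea behind LAT's multi-terminal grouping and, for a single TCP connection, behind Nagle's algorithm. TCP/IP headers are used for every case purely for comparison; LAT ran directly over Ethernet with its own, different framing.

```python
ETH_OVERHEAD = 18     # Ethernet header (14) + FCS (4); preamble and inter-frame gap ignored
IP_TCP_HEADERS = 40   # IPv4 (20) + TCP (20), no options
MIN_ETH_PAYLOAD = 46  # short frames are padded to this many payload bytes


def wire_bytes(payload: int) -> int:
    """Bytes on the wire for one TCP segment carrying `payload` application bytes."""
    return ETH_OVERHEAD + max(IP_TCP_HEADERS + payload, MIN_ETH_PAYLOAD)


def efficiency(payload: int) -> float:
    """Fraction of the frame that is application data."""
    return payload / wire_bytes(payload)


cases = [("1 keystroke", 1), ("30 keystrokes batched", 30), ("full 1460-byte segment", 1460)]
for label, payload in cases:
    print(f"{label:>24}: {wire_bytes(payload):5d} bytes on wire, {efficiency(payload):.1%} payload")
```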

JF Mezei


==============================================================

Hi

I got a chance to read your valuable comments after the weekend.

Good to know that you are a Cisco guy. I will ask some more Cisco-related questions :-)

Can you give me some hints on how a networking guy can support applications on the network? Is it enough to have a solid understanding of TCP and related mechanisms, or is more required, e.g. understanding how Java works or how FTP and/or HTTP work? I am Sniffer certified and worked for 3 years in Dubai Internet City. I enjoy supporting these applications and like to capture, decode, interpret and analyze the traffic. I also like all those fights between network guys and application guys. I really enjoy it when I find them miserable because they have to really look into the applications.

On the other hand, network issues are no less.

"Windowing will not help you. The application will still negotiate, and if you are max bandwidth, windowing will not allow you to send any more or less traffic". I fully agree.

One question : do you think it is fine if the source SEND TCP window size is NOT equal to the destination RECEIVE TCP window size, or should they be the same?

Yes, we have temporary choking no matter how much WAN capacity is provided, and every time I go in and look at NETFLOW, I find IIOP application traffic all over.

As you mentioned, the OPTIONS here are:

1) move your database
2) increase your WAN
3) decrease your load at any given time

Since another team is responsible for moving the DB, and they are THE DEAF, THE DUMB, THE BLIND, my year of motivational argument has not borne fruit.

The WAN is already running at MAX.

For the 3rd option, I am trying to off-load traffic based on NETFLOW analysis.

Qs : Are these t1s bonded together so you have 1 12 meg pipe? While separate pipes may help separate users or traffic, 1 12 meg pipe should in theory allow better communications as the traffic can return at a higher rate. While it won't do too much when you have your max users like you do now, it may end up helping a bit

Ans : There are two WAN hops, first hop has 7 E1 links and second hop has 6 E1 lines with an Ethernet segment in between. There is no MLPPP or any other kind of bundling. We are using OSPF based load balancing with CEF and ip load-sharing per-packet command on. All other applications are working ok

You are right, the CEILING effect is there.

"Maybe you could replicate this database to something local, and allow your users to hit that? Bottom line is that you will need to get your business/application resources to help think through options...but it doesn't sound like something is necessarily 'broken'. "

Very true

And I also agree there isn't something broken here.

Thanks and Best Regards Cheema

==================================================

Cheema

=========================================================

Hi

Q : Can you elaborate on "thin/terminal server" ?????? A : A different team takes care of it, but I believe the same thin server that serves the thin clients is acting as a Windows Terminal Server, to which clients connect using TCP port 3389.

All thin clients have Windows XP embedded into the BIOS, and they get only screen updates.

Thanks and Regards Cheema

Cheema

A good network resource needs to know how the network runs, how it should run, and have a very good understanding of the network's customers: what applications there are, what their demands and requirements are, and how they play or don't play with each other. What I mean by that is that in an enterprise network, you have hundreds or thousands of applications. Some are WAN, some are LAN, some are real-time versus not. Some require large bandwidth and will soak up a pipe; others are more trickle-feed but may be more open to response issues because they are chatty.

In your case, I don't mean to tell you to ignore TCP, but in my experience at 2 global Fortune 100 companies, I have rarely had to go down that path. Just because the windows don't match doesn't mean that something is wrong. Provided they negotiate, they will negotiate to the smaller window, and in most thin client applications the traffic is so small it isn't reaching peak size anyway. Additionally, you don't really care about windowing as much as you care about the efficiency of the application with regard to packets. Since you are seeing a query come in and 1514-byte packets return, this should be efficient, as it appears to be asking for bulks of data at once. Too many times I have seen applications that request 'ok, give me the next cell in the table', and latency/bandwidth will kill you when your applications are not making efficient DB queries.

In your case, your WAN is definitely utilized, but is not full. 50% is busy; 70-80% is the beginning of issues. Just as a rough exercise, each WAN pipe is 50% utilized, or 90 of 180k. Yes, I know a T1 is 192, but that's generally not realistic given overhead. That means your application has 90k to work within per session. Another issue you may look into is OSPF load balancing. Generally it is done per session or per packet, and the default, I believe, is per session. If all your users are hitting a single terminal server, I'd be interested to see whether it is actually load balancing the individual terminal server queries across your multiple pipes, or whether it is considered a separate session. It should be the latter, but it's definitely something to watch. Anyway, back to the point... you have 90k to share. Yes, you may have multiple pipes, but a single session can only route over one. Which means if you have 30 users, in theory 5 are on each of your 6 T1s, sharing 90k, which results in 15k per session. However, bandwidth doesn't necessarily divide; a single session will probably take everything it can get, and other sessions will suffer. Even if it is sharing properly, 15k is not a lot if these queries return a meg or more of data. A 1 meg file at 15k will take over a minute.
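The per-session estimate can be reproduced in a few lines of Python, reading k as kilobytes per second and using the reply's own figures. Dividing 90 kB/s among 5 sessions gives 18 kB/s; the 15k figure corresponds to dividing by 6. Either way, a 1 MB response takes about a minute. The even split is an idealization, as the reply notes, and since an E1 carries 2.048 Mbit/s (about 256 kB/s), the T1-based 180k figure is conservative for these links.

```python
# Figures used in the reply above (k = kilobytes per second).
USERS = 30
LINKS = 6
FREE_PER_LINK = 90.0  # kB/s: half of the ~180 kB/s assumed usable per link


def even_share(free_kb_s: float, sessions: int) -> float:
    """Per-session rate in kB/s if the free capacity were split perfectly evenly."""
    return free_kb_s / sessions


def seconds_to_move(size_kb: float, rate_kb_s: float) -> float:
    """Transfer time for size_kb at a steady rate_kb_s."""
    return size_kb / rate_kb_s


sessions_per_link = USERS // LINKS                        # 5
share = even_share(FREE_PER_LINK, sessions_per_link)      # 18.0 kB/s
print(f"{sessions_per_link} sessions per link -> {share:.0f} kB/s each")
for rate in (share, 15.0):
    print(f"1 MB (1000 kB) at {rate:.0f} kB/s: about {seconds_to_move(1000, rate):.0f} s")
```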

You may want to consider bonding some connections and implementing some QoS to protect important applications. Generally, any media traffic like VoIP/video needs to go into a top-tier bucket, with real-time applications close behind (if not in the same bucket), web applications etc. in a third bucket, and FTP/DB replication in default.
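The bucket scheme just described can be modeled as a small classification table. This Python sketch only shows the ordering idea under a strict-priority assumption; the class names are hypothetical, and a real Cisco deployment would express this with class-maps and a policy-map (for example low-latency queuing with bandwidth guarantees, so that lower tiers are not starved).

```python
# Hypothetical traffic-class names; on a router these would be class-maps.
TIERS = {
    "voip": 1,
    "video": 1,
    "realtime_app": 2,   # "close behind (if not in the same bucket)"
    "web_app": 3,
    "ftp": 4,
    "db_replication": 4,
}
DEFAULT_TIER = 4  # anything unclassified lands in the default bucket


def tier(traffic_class: str) -> int:
    """Tier for a traffic class; 1 is served first."""
    return TIERS.get(traffic_class, DEFAULT_TIER)


def service_order(queued: list[str]) -> list[str]:
    """Strict-priority order: lower tier first, arrival order kept within a tier."""
    return sorted(queued, key=tier)


print(service_order(["ftp", "web_app", "voip", "iiop_bulk", "realtime_app"]))
# ['voip', 'realtime_app', 'web_app', 'ftp', 'iiop_bulk']
```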

You can definitely look into the tcp stuff, but I just don't see it being your issue based on my experience.

Trendkill

==================

==============================================================

Hi

Thanks again for a very insightful reply.

"A good network resource needs to know how the network runs, how it should run, and have a very good understanding of the network's customers. What applications, what are their demands and requirements, and how they play or don't play with each other. What I mean by that is in an enterprise network, you have hundred or thousands of applications. Some are WAN, some are LAN, some are real- time versus not. Some require large bandwidth and will soak up a pipe, others are more trickle-feed but may be more open to response issues because they are chatty."

You are absolutely right and this comes with a peculiar blend of study, experience and psychology

"Just because the windows don't match, doesn't mean that something is wrong. Provided they negotiate, they will negotiate to the smaller window, and in most thin client applications, the traffic is so small it isn't reaching peak size anyway"

I got the answer now: it is fine if the source has a SEND window size not equal to the DESTINATION RECEIVE window size.

"you don't really care about windowing as much as you case about efficiency of the application in regards to packets. Since you are seeing a query come in, and 1514 packets return, this should be efficient as it appears to be asking for bulks of data at once. Too many times have I seen applications that request 'ok give me the next cell in the table', and latency/bandwidth will kill you when your applications are not making efficient DB queries"

I agree with you, sir, that application efficiency is the issue here and it is soaking up most of the bandwidth. On 12Mbps of WAN media, the monthly data PULL is 1 TERABYTE from one site alone. Thanks to NETFLOW visibility, we were able to thwart the FATAL BLOW that could have come upon the networking guys from the application guys; it is the other way around now.

"In your case, your WAN is definitely utilized, but is not full. 50% is busy, 70-80% are the beginnings of issues. I mean just as a rough exercise, each WAN pipe is 50% utilized or 90 of 180k. Yes I know a t1 is 192, but its generally not realistic given overhead. That means that your application has 90k to work within per session"

I have recently observed closely that twice a day, when the shift changes, the CEILING effect is seen and all 6 WAN links hit the roof because of the IIOP application. A single-number DB query transfers about 1MB for 4-5 days of records, but 12MB for 30 days of records, which takes 59 seconds to 5 minutes to display.

"Another issue that you may look into is OSPF load balancing. Generally it is done per session or per packet, and the default I believe is per session. If all your users are hitting a single terminal server, I'd be interested to see if it is actually load balancing the individual terminal server queries across your multiple pipes, or if it is considered a separate session. It should be the latter, but definitely something to watch."

WFQ + CEF + per-packet load sharing :-( how does it all work? I tried per-destination, but it is not suited at all: a few of the 6 WAN links remain choked while others remain empty. The three terminal servers, along with their clients, are on the same LAN, and these TERMINAL SERVERS have the Java compressed-class-based interface installed for the IIOP application server, which sits at the other end of the 6 pipes. We are pushing the application owners to provide a TS on the IIOP application server LAN.

"Anyway, and back to the point...you have 90k to share. Yes you may have multiple pipes, but a single session can only route over one. Which means if you have 30 users, in theory, 5 are on each of your 6 t1s, sharing 90k, which results in 15k per session. However bandwidth doesn't necessarily divide, a single session will probably take everything it can get, and other sessions will suffer. Even if it is sharing properly, 15k is not a lot if these queries return a meg or more of data. A 1 meg file at 15k will take over a minute."

I think that with per-packet load balancing in OSPF+CEF, the packets of all sessions are sent out round robin across the links. Does a TS session take 2k per second per user?
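A toy Python model makes the difference between the two modes concrete. With three terminal servers fetching from one IIOP server there are only three source/destination pairs, so per-destination hashing can use at most three of the six links, which would fit the observation that some links stayed choked while others stayed empty. Per-packet round robin spreads the load evenly, but if the links have different delays, segments of one TCP connection can arrive out of order; the receiver then sends duplicate ACKs, and three duplicates trigger a fast retransmit, which may account for some of the retransmissions and duplicate ACKs seen in the capture. The host names and the use of CRC-32 as the hash are assumptions; CEF's actual hash differs.

```python
import zlib
from collections import Counter
from itertools import cycle

LINKS = 6


def per_destination(packets):
    """Toy model: all packets of one source/destination pair use the same link.
    zlib.crc32 stands in for the router's real (different) hash."""
    return Counter(zlib.crc32(f"{src}->{dst}".encode()) % LINKS for src, dst in packets)


def per_packet(packets):
    """Toy model: packets are sent round robin across all links."""
    links = cycle(range(LINKS))
    return Counter(next(links) for _ in packets)


# Hypothetical host names: one IIOP server sending bulk data to three terminal servers.
flows = [("iiop", "ts1"), ("iiop", "ts2"), ("iiop", "ts3")]
packets = [flow for flow in flows for _ in range(1000)]

print("per-destination:", sorted(per_destination(packets).items()))
print("per-packet:     ", sorted(per_packet(packets).items()))
```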

"You may want to consider looking at bonding some connections and inputing some QoS to protect important applications. Generally, any media stuff like VoIP/Video need to go into a top tier bucket w/ real time applications close behind (if not in the same bucket), web applications etc in a third bucket, and ftp/db replication in default."

I agree with you. Right now the BIG FIGHT is on between teams and I will let you know the outcome :-)

"You can definitely look into the tcp stuff, but I just don't see it being your issue based on my experience. "

Right boss

Thanks and Best Regards Cheema

Cheema

I am not an expert in Windows terminal server stuff. But if this is Microsoft's proprietary X-windows equivalent (display the GUI output on a remote device), then the network load is for far more than just the raw data, it is also for all the gui stuff. And every mouse movement also generates a packet back to the "terminal server", as do keystrokes.

Also, every picture/bitmap will need to be transferred at least once between the server and the thin machines (depending on how smart Windows's remote access is).

In such a scenario, it isn't the size of the database record that counts, it is the amount of raw screen redrawing that counts.

If you want a good idea of what goes on, get one of those terminals to connect via a 300-baud dialup PPP connection. You will then see which functions are done locally on the terminal and which ones are being drawn the hard way via data transfers.

JF Mezei

===================================

Hi

Thanks for your valuable reply. Basically the issue is the BULK data transfer between the TERMINAL Servers and the IIOP Application.

thin/TS clients ==> LAN ==>Thin Server/TS =====> 6 wan links ==> IIOP application

Each of the three TS has an IIOP application client installed.

The issue lies in the traffic between the three TS and the IIOP application.

Regards Cheema

Cheema

=============================

Hi

I am very glad to inform you that the issue is resolved.

Thanks for all your help

Bye Cheema

Cheema

So what was the resolution ????

John Agosta

====================================

Hi

The application team has applied a display filter on the client-side computers. The WAN links are no longer choked and stay below the 70% utilization mark at peak times, especially during shift changes. There is a 35-40% reduction in WAN traffic during the congested periods. The main cause of congestion was the HTTP traffic associated with the database being accessed. I did a complete NETFLOW analysis, presented it to the customer services and application teams, and proved the point that the application, and the way it is being accessed, should be looked into.

Application now works in more WAN friendly way

Regards Cheema

Cheema

Cabling-Design.com Forums website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.