CSS 11050 random RSTs

- Mike Gauthier
  

Thu, Sep 6, 2007 12:15 AM

Hi all.

I've a pair of CSS 11050s balancing traffic for numerous TCP protocols (SMPP, some HTTP, etc.). Some of these protocols are "sticky" while others are not. There is a lot of traffic moving through this thing, so it's a bit difficult to get a handle on exactly what's happening. So, I've narrowed my work down to a single client that is experiencing this problem.

My network looks like this (logically).

                    Internet
                       |
                +------------+  +--------+  +-----+
                | Border Rtr |--| Switch |--| PIX |------
                +------------+  +--------+  +-----+     | .1
                      172.16.63.0/24                    |
    |---------------------------------------------------|
          | .3            | .154          | .155
    +-----------+     +-------+       +-------+
    | CSS 11050 |     | srv x |       | srv y |
    +-----------+     +-------+       +-------+

The 172.16.63.x network is configured as such.

The CSS 11050 (172.16.63.3) has its default route set to the PIX (172.16.63.1). The two servers being balanced have their default routes set to the CSS 11050 (172.16.63.3). I know this is an "interesting" setup, but I didn't design it. srv x and srv y are Linux boxes (that shouldn't matter, though).

The connection I'm troubleshooting involves a static one-to-one NAT on the PIX to a VIP on the CSS 11050. Let us say 10.1.1.50 is the public address that NATs to 172.16.63.191 (the VIP on the CSS 11050). Client A establishes a connection to 10.1.1.50 tcp port 4321. The protocol on the backend has no sense of state, so we have sticky balancing configured on the load balancer. So, this particular connection goes from client A to the load balancer VIP and then to srv x.
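For readers following along, the path described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the client address `192.0.2.10` is made up, and keying stickiness on the client IP with round-robin first assignment is a guess, since the post does not say how the CSS sticky rule is keyed.

```python
# Toy model of the path: client -> PIX static NAT -> CSS VIP -> backend.
# Addresses for the NAT, VIP, and servers come from the post; everything
# else (client IP, sticky key, round-robin choice) is an assumption.
STATIC_NAT = {"10.1.1.50": "172.16.63.191"}   # PIX: public address -> VIP
SERVERS = ["172.16.63.154", "172.16.63.155"]  # srv x, srv y

sticky_table = {}  # client IP -> backend chosen on first contact


def pix_translate(dst_ip):
    """Apply the one-to-one static NAT; other addresses pass unchanged."""
    return STATIC_NAT.get(dst_ip, dst_ip)


def css_pick_server(client_ip):
    """Return the backend for a client, reusing any earlier choice."""
    if client_ip not in sticky_table:
        # First contact: simple round-robin over the backends (assumed).
        sticky_table[client_ip] = SERVERS[len(sticky_table) % len(SERVERS)]
    return sticky_table[client_ip]


def route(client_ip, dst_ip, dst_port):
    """Return (VIP, backend, port) for a new connection from client_ip."""
    vip = pix_translate(dst_ip)
    backend = css_pick_server(client_ip)
    return vip, backend, dst_port


# Client A (hypothetical address) connects to the public address on port 4321.
print(route("192.0.2.10", "10.1.1.50", 4321))
```

Calling `route` again for the same client returns the same backend, which is the property the stateless backend protocol relies on.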

All seems to run fine until 30 to 45 minutes into the established session (the backend protocol itself has keepalives built in). Then, out of the blue, the load balancer, from the VIP in question, generates a RST that tears down the connection. This wouldn't normally be an issue, but some clients only connect when they have messages to send to us. Any messages we have for them remain in queue until they re-establish the connection (silly, I know; why doesn't the client just re-establish immediately?).

For the life of me, I cannot figure out why the CSS 11050 generates these RSTs. I've captured traffic from the backend srv x, the CSS 11050, and the PIX. The captures verify that the load balancer is indeed the one generating the RSTs (I don't see srv x generate a RST).
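The reasoning behind that conclusion can be written down as a small check. This is a minimal sketch, not the tooling actually used: the `Packet` record, the sample timestamps, and the client address are all hypothetical, and it assumes the server-side capture shows the real client IP (no source NAT on the CSS).

```python
from collections import namedtuple

# One captured TCP segment: timestamp, source IP, destination IP, flag letters.
Packet = namedtuple("Packet", "time src dst flags")


def first_rst(capture):
    """Return the earliest packet with the RST flag set, or None."""
    rsts = [p for p in capture if "R" in p.flags]
    return min(rsts, key=lambda p: p.time) if rsts else None


def rst_origin(server_side, client_side, server_ip, vip):
    """Guess which device originated the first RST of a torn-down session."""
    rst = first_rst(server_side)
    if rst is not None and rst.src == server_ip:
        return "server"          # the backend itself reset the connection
    rst = first_rst(client_side)
    if rst is not None and rst.src == vip:
        return "load balancer"   # RST came from the VIP, not from the backend
    return "unknown"


# Hypothetical sample data shaped like the situation in the post.
CLIENT, SRV_X, VIP = "192.0.2.10", "172.16.63.154", "172.16.63.191"
server_side = [
    Packet(1799.0, SRV_X, CLIENT, "PA"),
    Packet(1800.3, CLIENT, SRV_X, "R"),   # client's RST reaching srv x
]
client_side = [
    Packet(1799.1, VIP, CLIENT, "PA"),
    Packet(1800.0, VIP, CLIENT, "R"),     # RST sourced from the VIP
    Packet(1800.2, CLIENT, VIP, "R"),     # client's answering RST
]
print(rst_origin(server_side, client_side, SRV_X, VIP))
```

Each capture is examined on its own, so the clocks of the capture points do not need to be synchronized for this check.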

This random RST of course causes client A to send its own RST. The client's RST makes it all the way to the backend server (srv x in this case), which starts a full end-to-end teardown of the connection.

Any help would be greatly appreciated especially considering this thing is no longer supported by Cisco. Some hardware details follow.

Thanks much.

MikeG

### sho ver

    Version:              ap0500069 (5.00 Build 69)
    Flash (Locked):       4.00 Build 3
    Flash (Operational):  5.03 Build 15
    Type:                 PRIMARY
    Licensed Cmd Set(s):  Standard Feature Set

### show chassis

    Configuration for CSS 11050:

    Name:              CSS 11050
    SW Version:        5.00 Build 69
    HW Major Version:  02
    HW Minor Version:  0
    Serial Number:     21330039163
    Base Mac Address:  00-10-58-01-ef-eb

    Module Number  Module Name  Status
    1              FEM          primary
    5              SCFM-PLUS    primary

    Port Number  Port Name  Status
    1            e1         online
    2            e2         online
    3            e3         online
    4            e4         online
    5            e5         online
    6            e6         online
    7            e7         online
    8            e8         online

- Thrill5
  

Thu, Sep 6, 2007 3:23 AM

I remember a similar problem from a long, long time ago, and can't remember the exact fix. I think the problem is that the CSS is running out of ports and you need to adjust the timeout value for the NAT translation table to a lower value.
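As a rough way to sanity-check the port-exhaustion idea, the number of table entries held at steady state is about the rate of new connections times how long each entry lingers. The sketch below is back-of-the-envelope arithmetic only: the rate, timeout, and pool size are made-up numbers, not CSS 11050 figures, and it assumes short-lived connections whose entries stay until the timeout expires.

```python
def steady_state_entries(new_conns_per_sec, entry_timeout_sec):
    """Approximate entries held at once: arrival rate times entry lifetime."""
    return new_conns_per_sec * entry_timeout_sec


def max_timeout_for_pool(new_conns_per_sec, pool_size):
    """Longest entry lifetime (seconds) that keeps the table within pool_size."""
    return pool_size / new_conns_per_sec


# Hypothetical numbers, for illustration only.
rate = 50          # new connections per second
timeout = 3600     # seconds an idle entry is kept
pool = 60000       # usable ports / table entries

print(steady_state_entries(rate, timeout))   # entries needed at this timeout
print(max_timeout_for_pool(rate, pool))      # timeout that would fit the pool
```

If the first number is well above the pool size, lowering the timeout toward the second number is the kind of adjustment being suggested.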

Scott

- Mike Gauthier
  

Thu, Sep 6, 2007 5:07 AM

Interesting, Thrill5. This actually makes a bit of sense. I'll have to look into it. Thanks much.

MikeG