flow/packet loss through L3 C3560, pings OK

I have a Catalyst 3560G doing L3 routing. I tried to use it as the default gateway for a web cluster that was pushing about 120 Mbps of traffic, roughly 5 kpps each in and out. However, users noticed slow page loads, broken inline images, etc.

I was able to ping all the servers from outside the 3560G with zero packet loss in tens of thousands of 1500-byte pings. I moved the web cluster to a C6509 (same interface config) and the issue disappeared.

Web client experience was noticeably impacted, so if it were simple packet loss, I think I would have seen it with ping. It seemed as though the issue was related either to the type of traffic (plain HTTP) or to the flow pattern (lots of flows).

The 3560 has a pretty vanilla config; the web cluster traffic was being routed between a "no switchport" interface and a Vlan interface. I did notice that the "no switchport" interface had "ip route-cache same-interface" configured, and I'm not sure why. Also, the 3560 is carrying about 7k external routes, but I monitor it to make sure it doesn't hit the limit. I didn't see any clues in syslog.
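For context, the relevant part of the config looked roughly like this (interface numbers and addresses below are placeholders, not the real ones):

    interface GigabitEthernet0/1
     description uplink (routed port)
     no switchport
     ip address 192.0.2.1 255.255.255.252
     ip route-cache same-interface
    !
    interface Vlan100
     description web cluster subnet
     ip address 198.51.100.1 255.255.255.0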

Phil

Reply to
Phil Begriffenfeldt

We had some issues with the 10/100 versions around buffer tuning, where traffic bursts were overwhelming the buffers, especially when you turn QoS on, since that effectively cuts the buffer pool for any one QoS class by 75%.

If you have several GigE-connected servers contending for a congested or rate-limited port, this could be an issue.

There are some commands to look at the buffers - something like "show platform port-asic statistics ..."; you want the drop stats for any overloaded outbound ports.
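If memory serves, on the 3560/3750 it is something along these lines (exact keywords vary between IOS versions, so treat this as a sketch):

    ! per-port drop counters on the port ASIC
    show platform port-asic stats drop gigabitethernet 0/1
    ! the aggregate output-drop counter is also worth watching
    show interfaces gigabitethernet 0/1 | include output drops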

you need "sdm prefer routing" in the config to handle lots of IP routes - if not they overflow the hardware forwarding table and get dealt with in software.

Reply to
stephen

Thanks for reminding me. I did set that last May (it's logged), and then power-cycled the switch, but I do not appear to have verified "show sdm" after the power cycle. Now I see that the switch is using default/desktop, which could be the source of my trouble. Weird.
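For the record, this is the check I should have done after the power cycle (command name from memory; the output wording varies by version):

    ! shows the SDM template currently in use; if a different template
    ! has been configured but not yet activated, it also lists the one
    ! that will take effect on the next reload
    show sdm prefer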

Reply to
Phil Begriffenfeldt

Yes - the hardware forwarding tables fill up at around 1 to 2k routes.

Everything that arrives after the tables fill goes to software forwarding, so whether it is an irritation or a disaster depends on the order in which the routes arrive.

Not a fun thing to troubleshoot, but it does log an "out of space" message - shame Cisco couldn't make it more obvious what the error is about.
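You can also keep an eye on how full the hardware tables are getting - something like this, if I remember the command right:

    ! shows how much of each TCAM region is used vs. its maximum
    show platform tcam utilization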

Reply to
stephen
