Dropped packets via ISR

Setup: Cisco 2811 ISR on a 4 Mbit/s metro Ethernet link, ~200 users.

Problem: Web pages load either (a) perfectly, (b) half-mangled, with some images missing and tables broken, or (c) not at all.

There is no pattern as to when or why a page might not come up. Most of this happens at lunchtime, while people sit at their desks and browse: more usage means more of this problem. I tried raising the bandwidth to 10 Mbit/s and still had the same problem.

Example: this is what a web page might look like: http://129.21.125.13/~adam/gg/screenshot_corresponding_w_capture.JPG
This is what the corresponding capture looks like: http://129.21.125.13/~adam/gg/msn_ss_part1.JPG and http://129.21.125.13/~adam/gg/msn_ss_part2.JPG
I realize that retransmissions are normal. This is what normal loss and retransmission should look like (taken from my home and office connections): http://129.21.125.13/~adam/gg/good_packet_loss.JPG
Notice the 'Continuation' packets in the good packet loss image. I don't get those on the problem network.

I've traced this down to the ISR and the provider uplink. The provider's NOC said that everything on the line looked good, and we don't see as many dropped packets on the ISR as one might expect. Any ideas? I've been working on this for weeks.

Thanks, Adam

amattina

What do you get from more basic tools such as ping and traceroute?

The packet loss rate shown in the traces, roughly 1 packet in 10 or 20 (5 to 10%), is not normal.

What I would do:

1 - Check the interface statistics (sh int) on all network kit in the path that you can reach. Post the output here if you wish.

2 - Run traceroute with a lot of repetitions over a long period to verify each hop in the path to the internet. PingPlotter makes this a breeze.

If you use PingPlotter, set the repetition rate to 1 sec and leave it running through a failure.

Post results.

3 - Checking that you are not having an MTU issue would also be an idea, I guess.

If that does not lead anywhere, then we can have a look at the Ethereal dumps, 'cos that takes more brainpower.

Bod43

Bod, thanks.

1 & 2. We have ICMP blocked through the ISR, but ping and tracert from the ISR itself have been fine; Cisco support confirmed this. I might be on-site tomorrow, so I may remove that 'feature' from the ruleset and test some things...

3. Another person also mentioned MTU size. I tried changing the value but got this error:

% Interface FastEthernet0/1 does not support user settable mtu.

I have a pretty good feeling it might be the MTU size, since this client is on metro Ethernet: the VLAN tag might be pushing frames over the preconfigured 1500-byte MTU and thus causing the resets from the web servers. Any comments or ideas?
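For reference, a rough sketch of the frame-size arithmetic behind the VLAN-tag theory, assuming plain Ethernet II framing and IPv4/TCP headers without options (the function names are made up for illustration). An 802.1Q tag adds 4 bytes to the Ethernet frame, so a full 1500-byte IP packet travels in a 1522-byte frame instead of 1518; the 1500-byte IP MTU and the resulting 1460-byte TCP MSS do not change. So a tag would only cause trouble if some device carrying the tagged frame refused 1522-byte frames, not because of the router's IP MTU setting.

```python
# Sketch: effect of an 802.1Q VLAN tag on frame size versus IP MTU.
# Assumes Ethernet II framing and minimum IPv4/TCP headers (no options).

ETH_HEADER = 14   # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS = 4       # frame check sequence
DOT1Q_TAG = 4     # 802.1Q tag inserted after the source MAC
IPV4_HEADER = 20  # minimum IPv4 header
TCP_HEADER = 20   # minimum TCP header


def frame_size(ip_mtu: int, tagged: bool) -> int:
    """On-wire frame size (excluding preamble) carrying a full-size IP packet."""
    return ip_mtu + ETH_HEADER + ETH_FCS + (DOT1Q_TAG if tagged else 0)


def tcp_mss(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of size ip_mtu."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    mtu = 1500
    print("untagged frame:", frame_size(mtu, tagged=False))  # 1518
    print("tagged frame:", frame_size(mtu, tagged=True))     # 1522
    print("TCP MSS:", tcp_mss(mtu))                          # 1460
```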

amattina

Without delving into the Ethereal dumps, it doesn't really look like MTU to me. Could be, I guess.

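If ICMP is blocked, then Path MTU Discovery (PMTUD) will not work, because it relies on ICMP "fragmentation needed" messages getting back to the sender.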
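A toy model of why blocking ICMP breaks PMTUD, assuming every packet is sent with the Don't Fragment bit set and that the sender's only feedback is ICMP type 3 code 4 ("fragmentation needed and DF set"). The `Hop` class, `send_with_pmtud`, and the example MTUs are hypothetical; this shows the control flow only, not how any real TCP stack implements it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    name: str
    mtu: int  # largest IP packet this link forwards without fragmenting


def send_with_pmtud(size: int, path: List[Hop], icmp_allowed: bool,
                    max_tries: int = 5) -> Optional[int]:
    """Return the packet size that finally got through, or None if it never did."""
    for _ in range(max_tries):
        bottleneck = next((hop for hop in path if hop.mtu < size), None)
        if bottleneck is None:
            return size  # every link can carry this size
        # DF is set, so the hop drops the packet rather than fragmenting it.
        if icmp_allowed:
            # ICMP type 3 code 4 tells the sender the smaller next-hop MTU.
            size = bottleneck.mtu
        # Otherwise the ICMP message is filtered, the sender learns nothing,
        # and it retransmits the same size: a PMTUD "black hole".
    return None


if __name__ == "__main__":
    path = [Hop("office LAN", 1500), Hop("provider link", 1492)]  # hypothetical MTUs
    print(send_with_pmtud(1500, path, icmp_allowed=True))   # 1492
    print(send_with_pmtud(1500, path, icmp_allowed=False))  # None
```

In this model only full-size packets are lost; small ones still get through, which is why MTU problems tend to stall bulk transfers rather than everything.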

Blindly turning off ICMP echo seems to me to be a /very bad idea/. I ALWAYS leave it on, at least for selected hosts, so that the troubleshooting tools built into the internet can be used.

ping

ping

Do they turn it off? Cisco do appear to rate limit it.

You probably have a duplex mismatch. As discussed, check sh int.

Look for input errors and output errors. Unexplained errors are almost certainly duplex mismatches.
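A sketch of how one might scan `sh int` output for those counters. The sample text is invented for illustration (it is not from this thread), and `parse_counters` and `duplex_hints` are hypothetical helpers. The rule of thumb encoded here is the commonly cited one: the full-duplex end of a mismatched link tends to log CRC errors and runts, while the half-duplex end tends to log late collisions. Treat the hints as pointers for further checking, not a diagnosis.

```python
import re
from typing import Dict, List

# Invented sample of the error-counter lines from "show interfaces".
SAMPLE = """\
     Received 1200 broadcasts, 0 runts, 0 giants, 0 throttles
     37 input errors, 35 CRC, 2 frame, 0 overrun, 0 ignored
     0 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
"""

COUNTERS = ("input errors", "CRC", "frame", "runts", "output errors",
            "collisions", "late collision")


def parse_counters(text: str) -> Dict[str, int]:
    """Return the first value found for each counter name (0 if absent)."""
    counts = {}
    for name in COUNTERS:
        match = re.search(r"(\d+) " + re.escape(name) + r"\b", text)
        counts[name] = int(match.group(1)) if match else 0
    return counts


def duplex_hints(counts: Dict[str, int]) -> List[str]:
    """Map counters to the usual duplex-mismatch symptoms."""
    hints = []
    if counts["late collision"] > 0:
        hints.append("late collisions: this end may be half duplex facing a full-duplex peer")
    if counts["CRC"] > 0 or counts["runts"] > 0:
        hints.append("CRC errors/runts: this end may be full duplex facing a half-duplex peer")
    return hints


if __name__ == "__main__":
    for hint in duplex_hints(parse_counters(SAMPLE)):
        print(hint)
```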

Bod43

Are you using CBAC (ip inspect commands)?

Post your show version output and config.

Merv

Bod, FYI: I inherited this infrastructure, the problem, and the configurations.

1. BUT! I just allowed ICMP through the 2811 and pinged an external host, no problem...

Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=33ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=33ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55

2. I also used PingPlotter...

Target Name:
IP: 66.102.7.147 Date/Time: 8/24/2006 9:22:54 AM

1 1 ms 1 ms [10.1.1.4]
2 3 ms 2 ms host.static.twtelecom.net [XX.XX.XX.XXX]
3 3 ms 2 ms dist-01-ge-3-0-0-510.roch.twtelecom.net [66.192.240.144]
4 15 ms 15 ms core-01-so-5-1-0-0.chcg.twtelecom.net [66.192.244.62]
5 15 ms 16 ms peer-02-so-0-0-0-0.chcg.twtelecom.net [66.192.244.20]
6 15 ms 60 ms [66.192.252.90]
7 15 ms 15 ms [216.239.46.5]
8 84 ms 70 ms [66.249.95.215]
9 70 ms 70 ms [72.14.233.129]
10 -32764 ms 72 ms [216.239.49.54]
11 -32764 ms 71 ms [216.239.49.66]
12 69 ms 69 ms [66.102.7.147]

I had no problems running PingPlotter against various hosts...

3. I tried changing from full duplex to half duplex to auto, and that did not resolve anything. Running at half duplex gave me more 'continuation' packets and fewer reset packets, but we still had the problem. If anything, half duplex slowed things down (as you would expect). Here is an email I just got from our provider...

"1) Interface MTU at 1500 is fine. There is no VLAN tagging occurring between your interface and mine, so MTU issues here are moot.

2) 10MB, Full-duplex

As a general FYI, 95% of our reported throughput/latency issues are fixed when configs related to #2 are corrected. Let us know if we can be of further assistance."

4. There are not too many errors on the interfaces...
amattina
1. You are using 12.4(5), which is a deferred image (read: junked), so you should move off this image.

2. Is there any particular reason for using IOS 12.4? If not, I would downgrade to the latest IOS 12.3 to see if your problem persists.

3. Why is CEF disabled (no ip cef)?

4. Disable console logging (no logging console).

Merv

Merv, thanks. Cisco also just recommended that I enable ip cef. I have done that, plus a few other things per their recommendations...

memory-size iomem 25

Made some rule modifications...

amattina

Made some headway. I'm posting this for the benefit of anyone searching in the future, and in case anyone has further insight. After more digging through the router, it turns out the 'ip inspect' configuration seems to be at fault: the maximum number of half-open connections is reached, and then traffic is dropped. See below:

--
ISR-ROC-001#show ip inspect statistics
Packet inspection statistics [process switch:fast switch]
tcp packets: [12898:471751]
udp packets: [66884:247203]
ftp packets: [149:0]
Interfaces configured for inspection 1
Session creations since subsystem startup or last reset 18515
Current session counts (estab/half-open/terminating) [212:2:0]
Maxever session counts (estab/half-open/terminating) [331:40:17]
Last session created 00:00:00
Last statistic reset never
Last session creation rate 436
Last half-open session total 2
Half-open session count or session creation rate exceeded

--

The key here is obviously the last line. Cisco is recommending that I raise the max-incomplete value from the default (40) to 150. Any ideas or insight on this?
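A small sketch that pulls the session counters out of the `show ip inspect statistics` output quoted above and compares the peak half-open count with a max-incomplete limit. The parsing assumes the exact line wording shown in this post (other IOS releases may word it differently), the limit of 40 is simply the default Adam quotes, and the function names are hypothetical.

```python
import re
from typing import Dict

# Output quoted in the post above, one statistic per line.
STATS = """\
Session creations since subsystem startup or last reset 18515
Current session counts (estab/half-open/terminating) [212:2:0]
Maxever session counts (estab/half-open/terminating) [331:40:17]
Last session creation rate 436
Last half-open session total 2
Half-open session count or session creation rate exceeded
"""

COUNT_RE = r"{label} session counts \(estab/half-open/terminating\) \[(\d+):(\d+):(\d+)\]"


def parse_session_counts(text: str) -> Dict[str, Dict[str, int]]:
    """Extract the current and max-ever (estab, half-open, terminating) triples."""
    counts = {}
    for label in ("Current", "Maxever"):
        match = re.search(COUNT_RE.format(label=label), text)
        if match:
            estab, half_open, terminating = map(int, match.groups())
            counts[label.lower()] = {
                "established": estab,
                "half_open": half_open,
                "terminating": terminating,
            }
    return counts


def threshold_exceeded(text: str) -> bool:
    """True if IOS printed the half-open/creation-rate warning line."""
    return "session creation rate exceeded" in text


if __name__ == "__main__":
    counts = parse_session_counts(STATS)
    limit = 40  # max-incomplete value quoted in the thread; adjust for your config
    peak = counts["maxever"]["half_open"]
    print("current half-open:", counts["current"]["half_open"])
    print("peak half-open:", peak, "limit:", limit, "reached:", peak >= limit)
    print("warning present:", threshold_exceeded(STATS))
```

Here the peak half-open count (40) sits exactly at the quoted limit, which fits the warning on the last line.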

Thanks, Adam

amattina

Yeah, I've seen such problems, and I raised the max-incomplete values from the defaults to:

one-minute (sampling period) thresholds are [10000:27000] connections
max-incomplete sessions thresholds are [10000:27000]

But before doing this, you should check how many active NAT translations you have while experiencing problems with web sites. I had a lot of active translations (about 3000), because I don't have much outbound traffic (P2P, etc.) banned, and maybe some worms were operating in the network and trying to reach the Net, which raises the number of active NAT translations.
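A sketch of the check Igor suggests: count active translations per inside host to spot a machine (possibly a worm-infected one) holding an unusual number of connections. The sample lines follow the usual `show ip nat translations` column order (Pro, Inside global, Inside local, Outside local, Outside global), but the addresses are invented and `top_talkers` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple

# Invented sample in the usual "show ip nat translations" layout.
SAMPLE = """\
Pro Inside global         Inside local          Outside local         Outside global
tcp 203.0.113.10:1025     10.1.1.20:1025        66.102.7.147:80       66.102.7.147:80
tcp 203.0.113.10:1026     10.1.1.20:1026        66.102.7.147:80       66.102.7.147:80
udp 203.0.113.10:1027     10.1.1.35:1027        192.0.2.53:53         192.0.2.53:53
"""


def top_talkers(text: str, n: int = 10) -> List[Tuple[str, int]]:
    """Count translations per inside-local address, busiest first."""
    per_host: Counter = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a protocol translation line.
        if len(fields) < 5 or fields[0] not in ("tcp", "udp", "icmp"):
            continue
        inside_local = fields[2].rsplit(":", 1)[0]  # drop the port number
        per_host[inside_local] += 1
    return per_host.most_common(n)


if __name__ == "__main__":
    for host, count in top_talkers(SAMPLE):
        print(host, count)
```

With ~200 users, a single inside host holding hundreds of translations would stand out immediately in this list.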

B.R. Igor

Igor Mamuzic

Fixed the problem. We were getting the "Half-open session count or session creation rate exceeded" message in the ip inspect statistics. I raised the one-minute high and low values above 500 and 400, respectively. Now we are all set.
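For anyone tuning these values: Cisco's CBAC documentation describes each threshold pair roughly as a high/low watermark. Once the monitored value (the half-open count, or new sessions per minute for the one-minute pair) rises above the high mark, the router starts deleting half-open sessions, and it keeps doing so until the value falls below the low mark. The `Watermark` class below is a simplified, hypothetical model of that hysteresis, not IOS code.

```python
class Watermark:
    """Hypothetical high/low hysteresis model of a CBAC threshold pair."""

    def __init__(self, high: int, low: int) -> None:
        if low > high:
            raise ValueError("low threshold must not exceed high threshold")
        self.high = high
        self.low = low
        self.dropping = False  # True while half-open sessions are being deleted

    def update(self, value: int) -> bool:
        """Feed one sample (e.g. new sessions in the last minute); return the state."""
        if not self.dropping and value > self.high:
            self.dropping = True
        elif self.dropping and value < self.low:
            self.dropping = False
        return self.dropping


if __name__ == "__main__":
    guard = Watermark(high=500, low=400)
    samples = [300, 436, 510, 450, 390]
    print([guard.update(v) for v in samples])  # [False, False, True, True, False]
```

In this model, the 450 sample stays in the "dropping" state because the guard only resets below the low mark, which is one reason to raise the high and low values together.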

Thanks, Adam

amattina

Cabling-Design.com Forums website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.