Dropped packets via ISR

- amattina

Wed, Aug 23, 2006 1:46 PM

SETUP: Cisco 2811 ISR on 4 Mbit metro Ethernet, ~200 users.

PROBLEM: Web pages come up either a) perfectly, b) half mangled, with some images and screwed-up tables, or c) not at all.

There is no pattern as to when or why a page might not come up. Most of this happens at lunchtime, while people sit at their desks and browse. More usage = more of this problem. I tried cranking the bandwidth up to 10 Mbit and still had the same problem.

Example:

This is what a webpage might look like: http://129.21.125.13/~adam/gg/screenshot_corresponding_w_capture.JPG

This is what the capture looks like: http://129.21.125.13/~adam/gg/msn_ss_part1.JPG http://129.21.125.13/~adam/gg/msn_ss_part2.JPG

I realize that retransmissions are normal, and this is what a normal loss/retransmit should look like (taken from my home and office connection): http://129.21.125.13/~adam/gg/good_packet_loss.JPG

Notice the 'Continuation packets' in the good packet loss image. I don't get those on the problem network.

I've troubleshot this down to the ISR and the provider uplink. The provider's NOC said that everything on the line looked good, and we don't see a ton of dropped packets on the ISR, which one might expect. Any ideas? I've been working on this for weeks.

Thanks, Adam

- Bod43

Wed, Aug 23, 2006 5:21 PM

What do you get from more basic tools such as ping and traceroute?

The packet loss rate shown in the traces, about 1 packet in 10 or 20, is not normal.

What I would do is:

1 - Check the interface stats (sh int) on all network kit in the path that you can reach. Post the output here if you wish.

2 - Use traceroute with a lot of repetitions for a long time to verify each hop in the path to the internet. Pingplotter makes this a breeze. If you use Pingplotter, set the repetition rate to 1 sec and leave it running over a failure. Post results.

3 - Check that you are not having an MTU issue; that would be an idea, I guess.

If that does not lead anywhere then we can have a look at the Ethereal dumps, 'cos that takes more brainpower.

- amattina

Thu, Aug 24, 2006 2:16 AM

Bod, Thanks.

1&2. We have ICMP disabled through the ISR but ping and tracert from the actual ISR router have been good. Confirmed this with Cisco support. I might be on-site tomorrow so I might take out that 'feature' in the ruleset and test some things...

I had another person mention MTU size. I tried changing the value but got this error:

% Interface FastEthernet0/1 does not support user settable mtu.

I have a pretty good feeling it might be the MTU size, as this client is on metro Ethernet: the VLAN info might be pushing frames over the pre-configured 1500 MTU and thus causing the resets from the web servers. Any comments/ideas?
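For reference, the arithmetic behind the MTU theory can be sketched as follows. This is a minimal Python sketch, assuming standard Ethernet II framing and IPv4/TCP headers without options; the function names are made up for illustration. An 802.1Q tag adds 4 bytes to the Ethernet frame rather than to the IP packet, so tagging would only matter if some device in the path cannot pass 1522-byte frames.

```python
# Header sizes in bytes (assumptions: Ethernet II, IPv4 and TCP without options).
ETH_HEADER = 14   # destination MAC + source MAC + EtherType
ETH_FCS = 4       # frame check sequence
DOT1Q_TAG = 4     # 802.1Q VLAN tag, inserted into the Ethernet header
IPV4_HEADER = 20
TCP_HEADER = 20


def tcp_mss(ip_mtu):
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


def frame_size(ip_mtu, vlan_tagged=False):
    """Size on the wire of a full-MTU Ethernet frame, with or without a VLAN tag."""
    tag = DOT1Q_TAG if vlan_tagged else 0
    return ETH_HEADER + tag + ip_mtu + ETH_FCS


if __name__ == "__main__":
    print("MSS at MTU 1500:", tcp_mss(1500))
    print("Untagged frame:", frame_size(1500))
    print("Tagged frame:", frame_size(1500, vlan_tagged=True))
```

If the path really were dropping full-size frames, small pages and pings would work while large transfers stall, which is a pattern worth checking against the captures.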


- Bod43

Thu, Aug 24, 2006 10:53 AM

It doesn't really look like MTU to me, without delving into the Ethereal dumps. Could be, I guess.

If ICMP is off then PMTUD will not work.

Blindly turning off ICMP echo seems to me to be a /very bad idea/. I ALWAYS leave it on, at least for selected hosts, so that the troubleshooting tools that are part of the internet can be used.

ping [link]

ping [link]

Do they turn it off? Cisco do appear to rate limit it.

You probably have a duplex mismatch. As discussed, check sh int.

Look for input errors and output errors. Unexplained errors are almost certainly duplex mismatches.

- Merv

Thu, Aug 24, 2006 12:34 PM

Are you using CBAC (inspect commands)?

Post show version and config.

- amattina

Thu, Aug 24, 2006 1:48 PM

Bod, FYI - I am inheriting this infrastructure/problem and configurations

1. BUT! I did allow ICMP through the 2811 just now and did a ping to an external host, no problem...

Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=33ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=33ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55

2. I also used ping plotter...

Target Name: [link]
IP: 66.102.7.147
Date/Time: 8/24/2006 9:22:54 AM

 1       1 ms    1 ms  [10.1.1.4]
 2       3 ms    2 ms  host.static.twtelecom.net [XX.XX.XX.XXX]
 3       3 ms    2 ms  dist-01-ge-3-0-0-510.roch.twtelecom.net [66.192.240.144]
 4      15 ms   15 ms  core-01-so-5-1-0-0.chcg.twtelecom.net [66.192.244.62]
 5      15 ms   16 ms  peer-02-so-0-0-0-0.chcg.twtelecom.net [66.192.244.20]
 6      15 ms   60 ms  [66.192.252.90]
 7      15 ms   15 ms  [216.239.46.5]
 8      84 ms   70 ms  [66.249.95.215]
 9      70 ms   70 ms  [72.14.233.129]
10  -32764 ms   72 ms  [216.239.49.54]
11  -32764 ms   71 ms  [216.239.49.66]
12      69 ms   69 ms  [66.102.7.147]

Had no problems using ping plotter against various hosts...

3. Tried changing from full duplex to half duplex to auto, and that did not resolve anything. Running at half duplex gave me more 'continuation' packets and fewer reset packets, but we still had the problem. If anything, half duplex slowed things down (as you would expect). Here is an email I just got from our provider...

"1) Interface MTU at 1500 is fine. There is no VLAN tagging occurring between your interface and mine, so MTU issues here are moot.

2) 10MB, Full-duplex

As a general FYI, 95% of our reported throughput/latency issues are fixed when configs related to #2 are corrected. Let us know if we can be of further assistance."

4. There are not too many errors on the interfaces.

- Merv

Thu, Aug 24, 2006 6:44 PM

You are using 12.4(5), which is a deferred image (read: junked), so you should move off this image.

Is there any particular reason for using IOS 12.4? If not, I would downgrade to the latest IOS 12.3 to see if your problem persists.

Why is CEF disabled (no ip cef)?

Disable console logging (no logging console).

- amattina

Fri, Aug 25, 2006 12:28 PM

Merv, Thanks. Cisco also just recommended that I enable ip cef. I have done that and done a few other things per their recs...

memory-size iomem 25

Made some rule modifications...

- amattina

Tue, Aug 29, 2006 3:10 PM

Made some headway... just posting this for the benefit of any future people searching, and in case anyone has any further insight. After more digging through the router, it turns out that the 'ip inspect' command seems to be at fault. What is happening is that the max number of half-open connections is reached and then traffic is dropped. See below:

--
ISR-ROC-001#show ip inspect statistics
Packet inspection statistics [process switch:fast switch]
  tcp packets: [12898:471751]
  udp packets: [66884:247203]
  ftp packets: [149:0]
Interfaces configured for inspection 1
Session creations since subsystem startup or last reset 18515
Current session counts (estab/half-open/terminating) [212:2:0]
Maxever session counts (estab/half-open/terminating) [331:40:17]
Last session created 00:00:00
Last statistic reset never
Last session creation rate 436
Last half-open session total 2
Half-open session count or session creation rate exceeded
--

The key here is the last line, obviously. Cisco is recommending that I up the max-incomplete value from the default (40) to 150. Any ideas or insight on this?
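For anyone who wants to watch these counters over time rather than eyeball them, here is a minimal Python sketch that pulls the session counts and the warning line out of the statistics text above. The function name and the dictionary keys are my own invention, and the patterns are only matched against the output shown in this thread, so other IOS versions may format things differently.

```python
import re

# Excerpt of the 'show ip inspect statistics' output posted above.
SAMPLE = """\
Current session counts (estab/half-open/terminating) [212:2:0]
Maxever session counts (estab/half-open/terminating) [331:40:17]
Last session creation rate 436
Last half-open session total 2
Half-open session count or session creation rate exceeded
"""

COUNTS = r" session counts \(estab/half-open/terminating\) \[(\d+):(\d+):(\d+)\]"


def parse_inspect_stats(text):
    """Extract session counters and the threshold warning from the stats text."""
    stats = {}
    for key, label in (("current", "Current"), ("maxever", "Maxever")):
        match = re.search(label + COUNTS, text)
        if match:
            estab, half_open, terminating = map(int, match.groups())
            stats[key] = {
                "estab": estab,
                "half_open": half_open,
                "terminating": terminating,
            }
    match = re.search(r"Last session creation rate (\d+)", text)
    if match:
        stats["creation_rate"] = int(match.group(1))
    # The warning line only shows up once a threshold has been crossed.
    stats["threshold_exceeded"] = "session creation rate exceeded" in text
    return stats


if __name__ == "__main__":
    print(parse_inspect_stats(SAMPLE))
```

Feeding it captured output at intervals would show whether the half-open count or the creation rate is the one brushing against its limit at lunchtime.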

Thanks, Adam

- Igor Mamuzic

Wed, Aug 30, 2006 8:49 AM

Yeah, I saw such problems and dealt with them by raising the max-incomplete values from the defaults to:

one-minute (sampling period) thresholds are [10000:27000] connections
max-incomplete sessions thresholds are [10000:27000]

But before doing this, you should check how many active NAT translations you have while you are experiencing the problems with web sites. I had a lot of active translations (about 3000), because I don't have many outbound things (p2p, etc.) banned, and maybe some worms are operating in the network and trying to access the Net, which raises the number of active NAT translations.

B.R. Igor

- amattina

Wed, Aug 30, 2006 5:07 PM

Fixed the problem. We were getting the "Half-open session count or session creation rate exceeded" message in the ip inspect stats. Raised the one-minute max and min values above 500/400 respectively. Now we are all set.
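As I understand the one-minute thresholds, they work as a high/low pair: once the rate of new sessions rises above the high value, the router starts deleting half-open sessions, and it keeps doing so until the rate falls below the low value. A simplified Python model of that behaviour is sketched below. It is an illustration only, not Cisco's implementation; the function name is made up, and the exact comparisons (strictly above / strictly below) are assumptions.

```python
def deleting_half_open(rates, high=500, low=400):
    """Return, per one-minute sample, whether half-open sessions would be deleted.

    rates: session creation rates, one per sampling period.
    high:  rate above which deletion starts (assumed strict comparison).
    low:   rate below which deletion stops (assumed strict comparison).
    """
    deleting = False
    states = []
    for rate in rates:
        if not deleting and rate > high:
            deleting = True      # crossed the high-water mark
        elif deleting and rate < low:
            deleting = False     # dropped back under the low-water mark
        states.append(deleting)
    return states


if __name__ == "__main__":
    lunchtime = [300, 436, 520, 450, 410, 390, 450]
    print(deleting_half_open(lunchtime))
    # With higher thresholds the same traffic never triggers deletion.
    print(deleting_half_open(lunchtime, high=1000, low=800))
```

The point of the gap between the two values is that a rate hovering between them (such as the 436 seen in the statistics) keeps the router in whichever state it was already in, which would explain why pages kept breaking for a while after each burst.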

Thanks, Adam