SETUP: Cisco 2811 ISR on 4 Mbit metro Ethernet. ~200 users.
PROBLEM: Web pages are coming up either a) perfectly, b) half mangled, with only some images and screwed-up tables, or c) not at all.
There is no pattern as to when or why a page might not come up. Most of this happens at lunchtime while people sit at their desks and browse. More usage = more of this problem. Tried cranking up the bandwidth to 10 Mbit and still had the same problem.
Example: This is what a webpage might look like:
http://129.21.125.13/~adam/gg/screenshot_corresponding_w_capture.JPG
This is what the capture looks like:
http://129.21.125.13/~adam/gg/msn_ss_part1.JPG
http://129.21.125.13/~adam/gg/msn_ss_part2.JPG
I realize that retransmissions are normal, and this is what a normal loss/retransmit should look like (taken from my home and office connection):
http://129.21.125.13/~adam/gg/good_packet_loss.JPG
Notice the 'Continuation packets' in the good packet loss image. I don't get those on the problem network.
I've troubleshot this down to the ISR and provider uplink. The provider's NOC said that everything on the line looked good, and we don't have a ton of dropped packets on the ISR, which one might expect. Any ideas? I've been working on this for weeks.
1&2. We have ICMP disabled through the ISR but ping and tracert from the actual ISR router have been good. Confirmed this with Cisco support. I might be on-site tomorrow so I might take out that 'feature' in the ruleset and test some things...
I had another person mention MTU size. I tried changing the value but got this error:
% Interface FastEthernet0/1 does not support user settable mtu.
I have a pretty good feeling it might be the MTU size as this client is on a metro ethernet...the vlan info might be going over the pre-configured 1500 mtu size and thus creating the resets from the web servers. Any comments/ideas?
Doesn't really look like MTU to me, without delving into the ethereal dumps. Could be I guess.
If ICMP is off then PMTUD will not work.
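To illustrate the point, here is a minimal Python sketch of why filtered ICMP breaks PMTUD. It is a toy model, not router code: the function name `discover_path_mtu`, the `icmp_allowed` flag, and the hop MTUs are all assumptions made up for illustration. The sender transmits with DF set; a hop that cannot forward the packet drops it and would normally return ICMP "fragmentation needed" (type 3, code 4) carrying its MTU.

```python
def discover_path_mtu(hop_mtus, start_mtu=1500, icmp_allowed=True, max_tries=10):
    """Toy model of Path MTU Discovery with the DF bit set.

    Returns the discovered path MTU, or None if large packets are
    black-holed because the ICMP feedback never reaches the sender.
    """
    size = start_mtu
    for _ in range(max_tries):
        # Find the first hop that cannot forward a packet of this size.
        blocking = next((mtu for mtu in hop_mtus if mtu < size), None)
        if blocking is None:
            return size      # packet fits every hop: delivered
        if not icmp_allowed:
            return None      # the drop is silent; the sender just retransmits
        size = blocking      # ICMP type 3 code 4 reports the smaller MTU
    return None


# Hypothetical path: the 1496-byte hop stands in for a link that loses 4 bytes to a tag.
path = [1500, 1496, 1500]
print(discover_path_mtu(path, icmp_allowed=True))
print(discover_path_mtu(path, icmp_allowed=False))
```

In this model a small packet, such as a 32-byte ping, fits every hop, so a clean ping on its own would not rule out an MTU problem.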
Blindly turning off ICMP echo seems to me to be a /very bad idea/. I ALWAYS leave it on at least for selected hosts so that the troubleshooting tools that are part of the internet can be used.
ping
formatting link
ping
formatting link
Do they turn it off? Cisco do appear to rate limit it.
You probably have a duplex mismatch. As discussed, check sh int.
Look for input errors and output errors. Unexplained errors are almost certainly duplex mismatches.
Bod, FYI - I am inheriting this infrastructure/problem and configurations
1. BUT! I did allow through ICMP just now through the 2811 and did a ping to an external host, no problem...
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=33ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=33ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
Reply from 129.21.125.13: bytes=32 time=32ms TTL=55
2. I also used ping plotter...
Target Name:
formatting link
IP: 66.102.7.147
Date/Time: 8/24/2006 9:22:54 AM
 1       1 ms    1 ms  [10.1.1.4]
 2       3 ms    2 ms  host.static.twtelecom.net [XX.XX.XX.XXX]
 3       3 ms    2 ms  dist-01-ge-3-0-0-510.roch.twtelecom.net [66.192.240.144]
 4      15 ms   15 ms  core-01-so-5-1-0-0.chcg.twtelecom.net [66.192.244.62]
 5      15 ms   16 ms  peer-02-so-0-0-0-0.chcg.twtelecom.net [66.192.244.20]
 6      15 ms   60 ms  [66.192.252.90]
 7      15 ms   15 ms  [216.239.46.5]
 8      84 ms   70 ms  [66.249.95.215]
 9      70 ms   70 ms  [72.14.233.129]
10  -32764 ms   72 ms  [216.239.49.54]
11  -32764 ms   71 ms  [216.239.49.66]
12      69 ms   69 ms  [66.102.7.147]
Had no problems using ping plotter against various hosts...
3. Tried changing from full duplex to half duplex to auto, and that did not resolve anything. Running at half duplex gave me more 'continuation' packets and fewer reset packets, but we still had the problem. If anything, half duplex slowed things down (as you would expect). Here is an email I just got from our provider...
"1) Interface MTU at 1500 is fine. There is no VLAN tagging occurring between your interface and mine, so MTU issues here are moot.
2) 10MB, Full-duplex
As a general FYI, 95% of our reported throughput/latency issues are fixed when configs related to #2 are corrected. Let us know if we can be of further assistance."
4. There are not too many errors on the interfaces (...
> > > provider from their NOC said that everything on the line looked good
Made some headway...just posting this for the benefit of any future people searching and if anyone has any further insight. It turns out after more search and discover through the router that the 'ip inspect' command seems to be at fault. What is happening is the max number of half open connections is reached and then traffic is dropped. See below:
--
ISR-ROC-001#show ip inspect statistics
Packet inspection statistics [process switch:fast switch]
  tcp packets: [12898:471751]
  udp packets: [66884:247203]
  ftp packets: [149:0]
Interfaces configured for inspection 1
Session creations since subsystem startup or last reset 18515
Current session counts (estab/half-open/terminating) [212:2:0]
Maxever session counts (estab/half-open/terminating) [331:40:17]
Last session created 00:00:00
Last statistic reset never
Last session creation rate 436
Last half-open session total 2
Half-open session count or session creation rate exceeded
-- The key here is the last line obviously. Cisco is recommending that I up the max-incomplete value from the default(40) to 150. Any ideas or insight on this?
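For anyone who wants to watch for this condition without reading the output by eye, here is a rough Python sketch that pulls the session counters out of `show ip inspect statistics` text and flags the warning line. The function name `parse_inspect_stats` is made up for illustration, and the regular expressions assume the output format shown above; other IOS versions may format it differently.

```python
import re

# A few lines copied from the router output quoted in this thread.
SAMPLE = """\
Current session counts (estab/half-open/terminating) [212:2:0]
Maxever session counts (estab/half-open/terminating) [331:40:17]
Last session creation rate 436
Half-open session count or session creation rate exceeded
"""


def parse_inspect_stats(text):
    """Extract session counters from 'show ip inspect statistics' output."""
    stats = {}
    for label in ("Current", "Maxever"):
        m = re.search(
            label
            + r" session counts \(estab/half-open/terminating\) \[(\d+):(\d+):(\d+)\]",
            text,
        )
        if m:
            estab, half_open, terminating = map(int, m.groups())
            stats[label.lower()] = {
                "estab": estab,
                "half_open": half_open,
                "terminating": terminating,
            }
    m = re.search(r"Last session creation rate (\d+)", text)
    if m:
        stats["creation_rate"] = int(m.group(1))
    # The router prints this line once a threshold has been crossed.
    stats["threshold_exceeded"] = (
        "Half-open session count or session creation rate exceeded" in text
    )
    return stats


stats = parse_inspect_stats(SAMPLE)
print(stats["maxever"]["half_open"], stats["creation_rate"], stats["threshold_exceeded"])
```

The text would have to be collected from the router some other way (for example, pasted from a terminal session); the sketch only covers the parsing step.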
Yeah, I saw such problems and dealt with them by raising the max-incomplete values from the defaults to:
one-minute (sampling period) thresholds are [10000:27000] connections
max-incomplete sessions thresholds are [10000:27000]
But before this, you should check how many active NAT translations you have while you are experiencing problems with web sites. I had a lot of active translations (about 3000), because I don't have many outbound things (p2p, etc.) banned, and maybe some worms are operating in the network and trying to access the Net, which raises the number of active NAT translations.
Fixed the problem. We were getting that: "Half-open session count or session creation rate exceeded" error message on ip inspect stats. Raised the one-minute max and min values above 500/400 respectively. Now we are all set.
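As a rough illustration of why raising the high/low pair helps, here is a small Python sketch of threshold hysteresis. It assumes the inspect thresholds work as a high/low pair: the router starts deleting half-open sessions once the measured value rises above the high mark and stops once it falls below the low mark. The function name `aggressive_mode_trace` and the sample rates are hypothetical, not measured data; 500/400 are the values mentioned in the post above.

```python
def aggressive_mode_trace(samples, high=500, low=400):
    """Return, per sample, whether half-open sessions would be getting deleted.

    Deletion starts when a sample rises above `high` and continues
    until a sample falls below `low` (simple hysteresis).
    """
    deleting = False
    trace = []
    for value in samples:
        if not deleting and value > high:
            deleting = True
        elif deleting and value < low:
            deleting = False
        trace.append(deleting)
    return trace


# Hypothetical per-minute session creation rates, not measured data.
rates = [300, 436, 520, 450, 390, 410]
print(aggressive_mode_trace(rates))
print(aggressive_mode_trace(rates, high=600, low=500))
```

With the 500/400 pair, the model stays in deleting mode for two samples after the spike to 520; with a higher pair the same traffic never triggers it.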