I have a Cisco 6509 with a Sup720 running IOS (tm) s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(18)SXE4. Over the last 5 days I've noticed a strange problem: the CPU load has risen from 5-10% to 50-90% at interrupt level. The traffic hasn't changed much. If I run clear arp-cache, the load immediately drops back to 5-10% for a short time (1 hour at most). Once the ARP table grows beyond 8000 entries, the load climbs to 90%. What could be causing this? As far as I know, the Cisco 6500 can handle more than 64k ARP entries.
c6500#sh ip arp sum
9136 IP ARP entries, with 532 of them incomplete
If I change the ARP timeout to 60 seconds, for example, everything works fine.
All port cards are equipped with DFC modules. The Sup720 is equipped with a PFC3B card.
Hopefully one of you has a hint or a solution for me. Thank you in advance. If you need additional information, please let me know.
Your 6500 could be the victim of an ARP DOS attack.
Because ARP requests are broadcast traffic, they are very easy to capture with a sniffer (Ethereal) to see whether a high volume of ARPs is being sourced from the same MAC address.
What is particularly difficult to track down are DoS programs that change the source MAC address.
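A minimal sketch of that per-source tally, assuming the capture has been exported to plain text with one ARP request per line in the form `<source MAC> <target IP>`. Both that export format and the function name `top_arp_senders` are illustrative assumptions, not something the sniffer produces by default.

```python
from collections import Counter


def top_arp_senders(lines, limit=5):
    """Count ARP requests per source MAC and return the busiest senders.

    Each input line is assumed to look like "<source MAC> <target IP>"
    (a hypothetical export format, not native sniffer output).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[0].lower()] += 1
    return counts.most_common(limit)


# Made-up sample data for illustration only.
sample = [
    "00:11:22:33:44:55 10.0.0.7",
    "00:11:22:33:44:55 10.0.0.8",
    "00:AA:BB:CC:DD:EE 10.0.0.7",
    "00:11:22:33:44:55 10.0.0.9",
]
print(top_arp_senders(sample))
```

A sender that rotates its source MAC would be spread across many keys here, so a flat distribution does not by itself rule out an attack.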
Thanks for your reply. Unfortunately this doesn't seem to be the reason. Most of the ARP requests (around 90%) are sourced by the 6500's VLAN interface. The remaining requests are sourced by some servers in the datacenter, but there are very few of them.
At first sight this appears to be unrelated to ARP. The only thing that the device does at Interrupt Level (as I understand it) is "Fast Switch" packets.
I think that a 6500 with a Sup720 normally does CEF switching (routing, that is) in hardware. However, some packets cannot be hardware switched, and in that case they are passed to the processor to be routed.
I would proceed as for CPU load troubleshooting on a traditional router.
sh int switching
sh int stats
Do you have 8000 nodes in your network?
As Merv says consider a DOS attack.
I am not saying that the issue has nothing to do with the 8000 ARP entries, just that I doubt that the traffic causing the high CPU is ARP traffic.
Could the large number of ARP entries be related to proxy ARP?
Post more detail. ARP tables, what are your address ranges?
I wonder if you can find some way to examine the software routing process's routing-cache?
e.g. Can you turn off software CEF but leave hardware CEF on and then do sh ip route-cache
This would be an out of hours job 'cos it could all melt.
The 6500 will ARP for packets it is trying to deliver onto a VLAN segment.
An IP scanner running through your subnets would generate that type of traffic and result in a large number of incompletes if your IP subnets are sparsely populated.
The other thing you might consider is enabling NetFlow accounting on each of the VLAN interfaces, so that you can see the source of the packets being sent to non-existent destination IP addresses in each of the VLAN IP subnets.
c6500#show processes cpu | ex 0.00%
CPU utilization for five seconds: 54%/50%; one minute: 57%; five minutes: 58%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   8     3848836   3499370       1099  0.71%  1.17%  1.02%   0 ARP Input
 118     4563888  11286276        404  1.43%  1.87%  2.01%   0 IP Input
 159      826800     40730      20299  0.31%  0.42%  0.42%   0 Adj Manager
 164      813396     81268      10008  0.79%  0.36%  0.32%   0 IPC LC Message H
 167      339552    353729        959  0.31%  0.26%  0.24%   0 CEF process
 245      183132    364740        502  0.15%  0.04%  0.01%   0 RPC c6k_rp_envir
 262     1078256   7232968        149  0.71%  0.33%  0.34%   0 Port manager per
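In the header line of this output, the first five-second figure is the total CPU utilization and the figure after the slash is the part spent at interrupt level. The sketch below just does that subtraction for the values posted; the function name `split_cpu` is made up for illustration.

```python
import re


def split_cpu(header):
    """Split the five-second figure into total, interrupt and process level."""
    match = re.search(r"five seconds: (\d+)%/(\d+)%", header)
    if match is None:
        raise ValueError("no five-second CPU figure found")
    total, interrupt = int(match.group(1)), int(match.group(2))
    # Whatever is not interrupt-level time is shared by all processes.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


header = "CPU utilization for five seconds: 54%/50%; one minute: 57%; five minutes: 58%"
print(split_cpu(header))
```

With 54%/50%, only about 4% is left for all processes together, which is consistent with ARP Input and IP Input each showing only around 1-2%.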
We have about 2000 servers with around 8000 assigned IP addresses.
c6500#sh ip arp sum
7991 IP ARP entries, with 775 of them incomplete
If I clear the ARP cache, the CPU load falls to 5-10% for a short time. No real idea why. I've scanned the network for attacks but was unable to find anything suspicious. Proxy ARP is disabled on all interfaces.
It may be that the 6500 continues to try to resolve incomplete entries. If that is the case then that would explain the lower CPU usage after you clear the ARP cache.
If this is the case, then a large number of incompletes would be quite bad...
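To put a number on "large", here is a small sketch that works out the share of incomplete entries from the two `sh ip arp sum` lines posted in this thread. The function name `incomplete_share` is illustrative only.

```python
import re


def incomplete_share(summary):
    """Return (entries, incomplete, percent incomplete) from one summary line."""
    match = re.search(r"(\d+) IP ARP entries, with (\d+) of them incomplete", summary)
    if match is None:
        raise ValueError("unrecognised summary line")
    entries, incomplete = int(match.group(1)), int(match.group(2))
    percent = 100.0 * incomplete / entries if entries else 0.0
    return entries, incomplete, round(percent, 1)


for line in (
    "9136 IP ARP entries, with 532 of them incomplete",
    "7991 IP ARP entries, with 775 of them incomplete",
):
    print(incomplete_share(line))
```

For the two samples this gives roughly 6% and 10% incomplete, so the incompletes are a minority of the table but still several hundred addresses.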
May 14 22:50:53: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
May 14 22:51:09: %MLS_STAT-SP-4-IP_CSUM_ERR: IP checksum errors
May 14 22:51:53: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
May 14 22:52:54: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
May 14 22:53:56: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
A Google search doesn't turn up any useful information on these error messages. Maybe the large number of incompletes is the problem, but then it would be very hard to solve (other than by using static ARP entries?).
The unreachables option is already set:
no ip redirects
no ip unreachables
no ip proxy-arp
By the look of the L2 and L3 counter values, most of your traffic is being handled by the CPU. It would be useful if you cleared the counters first and then reposted the results.
Do the ARP incompletes occur across all of the VLANs or just a few?
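One way to answer that from a saved copy of the `show ip arp` output is to group the incomplete entries by subnet. This is only a sketch: it assumes the usual column order (Protocol, Address, Age, Hardware Addr, Type, Interface) with the word Incomplete in the hardware address column, and it assumes /24 subnets, which may not match the real VLAN addressing. The function name `incompletes_by_subnet` and the sample lines are made up.

```python
import ipaddress
from collections import Counter


def incompletes_by_subnet(arp_lines, prefix_len=24):
    """Count incomplete ARP entries per subnet of the given prefix length."""
    counts = Counter()
    for line in arp_lines:
        fields = line.split()
        # Assumed column order: Protocol, Address, Age, Hardware Addr, Type
        if len(fields) < 4 or fields[3] != "Incomplete":
            continue
        network = ipaddress.ip_network(f"{fields[1]}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts.most_common()


# Made-up sample lines for illustration only.
sample = [
    "Internet  10.1.1.7    0   Incomplete      ARPA",
    "Internet  10.1.1.9    0   Incomplete      ARPA",
    "Internet  10.1.2.20   5   0011.2233.4455  ARPA   Vlan2",
    "Internet  10.1.3.40   0   Incomplete      ARPA",
]
print(incompletes_by_subnet(sample))
```

If a handful of subnets account for most of the incompletes, those are the VLANs to look at first.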
To find the source IP addresses of the packets that are causing the incomplete ARP entries, you could define an extended access list that permits traffic to some of the incomplete destination addresses in VLAN1 and then permits any any. You can then use the ACL match counters to tell whether a particular VLAN is the source of the traffic. This ACL would be applied on all or some of the VLAN interfaces other than VLAN1.
Something like:
ip access-list extended ARP_SOURCE
 ! x.x.x.x is one of the addresses with an incomplete ARP entry
 permit ip any host x.x.x.x
 permit ip any any
 exit
! check the match counters
show access-list ARP_SOURCE
I am not up to date with the sup720 architecture. Would you mind explaining the above please?
Does the SUP720 use MLS?
I had assumed that it was like the 4500 SUP 4/5 which I think of as "Hardware CEF". There is no MLS, no "first packet" just wire rate forwarding on loads of ports:-) at layer 3 (or 2 clearly).
Also:- Back to the OP's issue.
I am having trouble with the idea that the root cause of this is ARP related. Sure, it may be a symptom of some inappropriate network or end-station behaviour, but as I see it the 50% interrupt-level CPU must be a consequence of switching actual traffic and not anything directly related to ARP packets being processed by the box.
I agree that it is a good idea to track down the sources of the incomplete ARP entries.
Total Packets Bridged : 19461984156
Total Packets FIB Switched : 147452107429
Total Packets ACL Routed : 0
Total Packets Netflow Switched : 0
Total Mcast Packets Switched/Routed : 26251871
Total ip packets with TOS changed : 8256444465
Total ip packets with COS changed : 978
Total non ip packets COS changed : 0
Total packets dropped by ACL : 2049762
Total packets dropped by Policing : 12213434
Errors
MAC/IP length inconsistencies : 0
Short IP packets received : 0
IP header checksum errors : 304618
TTL failures : 1914145
MTU failures : 0
Total packets L3 Switched by all Modules: 167068616897 @ 328278 pps