C6500 High Interrupt Load caused by ARP


Holger Amberg 20 years ago

Hello,

I have a Cisco 6509 with a Sup720 running IOS (tm) s72033_rp Software (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(18)SXE4. Over the last 5 days I have noticed a strange problem: the CPU load rose from 5-10% to 50-90% at interrupt level, although the traffic has not changed much. If I run clear arp-cache, the load immediately drops back to 5-10% for a short time (at most 1 hour). Once the ARP table grows above 8000 entries, the load climbs to as much as 90%. What could be the reason for that? As far as I know, the Cisco 6500 can handle more than 64k ARP entries.

c6500#sh ip arp sum

9136 IP ARP entries, with 532 of them incomplete

If I change the ARP timeout to 60 seconds, for example, everything works fine.

All line cards are equipped with DFC modules. The Sup720 is equipped with a PFC3B card.

Hopefully one of you has a hint or a solution for me. Thank you in advance. If you need additional information, please let me know.

Best regards,

Holger Amberg


Merv 20 years ago

Your 6500 could be the victim of an ARP DoS attack.

Because ARP requests are broadcast traffic, they are very easy to capture with a sniffer (Ethereal) to see whether a high volume of ARPs is being sourced from the same MAC address.

What is particularly difficult to track down are DoS programs that change the source MAC address.
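
The per-MAC check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the ARP requests have already been exported from the capture as (source MAC, target IP) pairs, and the function name and the sample data are made up. As noted above, a tool that rotates its source MAC would not stand out in this count.

```python
from collections import Counter

def top_arp_talkers(requests, limit=3):
    """Return the `limit` source MACs that sent the most ARP requests.

    `requests` is an iterable of (source_mac, target_ip) pairs. That layout
    is an assumption of this sketch, not a format produced by any tool.
    """
    # Normalise case so the same MAC is not counted under two spellings.
    counts = Counter(mac.lower() for mac, _target in requests)
    return counts.most_common(limit)

# Made-up sample data, for illustration only.
sample = [
    ("0000.5e00.5301", "192.0.2.7"),
    ("0000.5E00.5301", "192.0.2.8"),
    ("0000.5e00.5302", "192.0.2.1"),
    ("0000.5e00.5301", "192.0.2.9"),
]
print(top_arp_talkers(sample))
```

A single MAC that dominates the list is the pattern to look for; an even spread across many MACs points elsewhere.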


Holger Amberg 20 years ago

Hello,

Thanks for your reply. Unfortunately, this doesn't seem to be the reason. Most of the ARP requests (around 90%) are sourced by the 6500's VLAN interface. The remaining requests come from some servers in the datacenter, but there are very few of them.

Best regards,

Holger Amberg


anybody43 20 years ago

At first sight this appears to be unrelated to ARP. The only thing that the device does at Interrupt Level (as I understand it) is "Fast Switch" packets.

I think that a 6500 with a Sup720 normally does CEF switching (routing, that is) in hardware; however, some packets cannot be hardware switched, and in that case they are passed to the processor to be routed.

I would proceed as for CPU load troubleshooting on a traditional router.

sh int switching
sh int statistics

Do you have 8000 nodes in your network?

As Merv says consider a DOS attack.

I am not saying that the issue has nothing to do with the 8000 ARP entries, just that I doubt that the traffic that is causing the high CPU is ARP traffic.

Could the large number of ARP entries be related to proxy ARP?

Post more detail. ARP tables, what are your address ranges?

I wonder if you can find some way to examine the software routing process's routing-cache?

e.g. Can you turn off software CEF but leave hardware CEF on and then do sh ip route-cache

This would be an out of hours job 'cos it could all melt.


Merv 20 years ago

The 6500 will ARP for packets it is trying to deliver onto a VLAN segment.

An IP scanner running through your subnets would generate that type of traffic and result in a large number of incompletes if your IP subnets are sparsely populated.


jay 20 years ago

Hi,

What does 'show int stats' show ?

#show int stats
Vlan1
          Switching path    Pkts In    Chars In    Pkts Out   Chars Out
               Processor   24483518  1744541863    13997696  1450102892
             Route cache        273       44304      770707   173521523
       Distributed cache          0           0           0           0
                   Total   24483791  1744586167    14768403  1623624415
Vlan2
          Switching path    Pkts In    Chars In    Pkts Out   Chars Out
               Processor   69662642   457846948    30465060   435446238
             Route cache   78161980  1497793252   100308501  1929661234
       Distributed cache          0           0           0           0
                   Total  147824622  1955640200   130773561  2365107472
Vlan3
          Switching path    Pkts In    Chars In    Pkts Out   Chars Out
               Processor    6879745   358345705       22815     2486835
             Route cache          0           0           0           0
       Distributed cache          0           0           0           0
                   Total    6879745   358345705       22815     2486835
Vlan4
          Switching path    Pkts In    Chars In    Pkts Out   Chars Out
               Processor    6879166   358286507       22815     2486835
             Route cache          0           0           0           0
       Distributed cache          0           0           0           0
                   Total    6879166   358286507       22815     2486835

Is it actually ARP input ?

#show processes cpu | ex 0.00%
CPU utilization for five seconds: 0%/0%; one minute: 1%; five minutes: 0%
 PID Runtime(ms)    Invoked  uSecs   5Sec   1Min   5Min TTY Process
   9     9879256   29456820    335  0.08%  0.03%  0.05%   0 ARP Input
  31    19741468  442585601     44  0.24%  0.25%  0.16%   0 IP Input


Merv 20 years ago

Clear the interface counters, wait 5 minutes, then capture and post the output of "sh int acc" for the vlan 1 through vlan 4 interfaces.


Merv 20 years ago

The other thing you might consider is enabling NetFlow accounting on each of the VLAN interfaces, so that you can see the source of the packets being sent to non-existent destination IP addresses in each of the VLAN IP subnets.


Holger Amberg 20 years ago

Hi,

Below is the requested data:

c6500#sh int vlan1 acc
Vlan1 VLAN1
                Protocol      Pkts In        Chars In     Pkts Out       Chars Out
                   Other        42986         4318002            0               0
                      IP  67839362074  32993319263061  67276120863  32858452676179
                 DEC MOP          330           25410          330           42570
                     ARP      5293273       318627286     37479315      4197683280

c6500#show processes cpu | ex 0.00%
CPU utilization for five seconds: 54%/50%; one minute: 57%; five minutes: 58%
 PID Runtime(ms)    Invoked  uSecs   5Sec   1Min   5Min TTY Process
   8     3848836    3499370   1099  0.71%  1.17%  1.02%   0 ARP Input
 118     4563888   11286276    404  1.43%  1.87%  2.01%   0 IP Input
 159      826800      40730  20299  0.31%  0.42%  0.42%   0 Adj Manager
 164      813396      81268  10008  0.79%  0.36%  0.32%   0 IPC LC Message H
 167      339552     353729    959  0.31%  0.26%  0.24%   0 CEF process
 245      183132     364740    502  0.15%  0.04%  0.01%   0 RPC c6k_rp_envir
 262     1078256    7232968    149  0.71%  0.33%  0.34%   0 Port manager per

We have about 2000 servers with around 8000 assigned IP addresses.

c6500#sh ip arp sum

7991 IP ARP entries, with 775 of them incomplete

If I clear the ARP cache, the CPU load drops to 5-10% for a short time. I have no real idea why. I have scanned the network for attacks but was unable to find anything suspicious. Proxy ARP is disabled on all interfaces.

Best regards,

Holger Amberg
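
For readers following along: in the "show processes cpu" summary line, the first five-second figure is the total CPU utilization and the figure after the slash is the share spent at interrupt level, so 54%/50% leaves only about 4% for scheduled processes such as ARP Input and IP Input. A minimal sketch that splits the line this way (the function name and the returned dictionary keys are my own choice, not anything IOS provides):

```python
import re

CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)

def split_cpu_load(line):
    """Split the 5-second CPU figure into interrupt and process level."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a 'show processes cpu' summary line")
    total, interrupt, one_minute, five_minutes = map(int, match.groups())
    return {
        "total": total,
        "interrupt": interrupt,
        # Whatever is not interrupt level is spent in scheduled processes.
        "process": total - interrupt,
        "one_minute": one_minute,
        "five_minutes": five_minutes,
    }

line = "CPU utilization for five seconds: 54%/50%; one minute: 57%; five minutes: 58%"
print(split_cpu_load(line))
```

The 5Sec column of the processes listed in the post adds up to roughly 4.4%, which is in line with the 4% process-level remainder.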


Merv 20 years ago

Put a sniffer on a port that is in VLAN1 and capture ARP requests and replies to see if there is any pattern to the ARP requests.

If, for example, the destination address increments with each ARP request, then those requests would be considered very suspicious.

Also, is the encapsulation failure counter seen in show ip traffic rapidly incrementing?
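
One way to look for the incrementing-destination pattern in a capture is sketched below. It assumes the target IP addresses of the ARP requests have been exported in capture order as plain strings; the function name and the sample addresses are hypothetical.

```python
import ipaddress

def longest_incrementing_run(targets):
    """Length of the longest run in which each target IP is the previous one plus 1."""
    best = 0
    run = 0
    previous = None
    for text in targets:
        address = int(ipaddress.ip_address(text))
        if previous is not None and address == previous + 1:
            run += 1
        else:
            run = 1  # start a new run at this address
        best = max(best, run)
        previous = address
    return best

# Made-up sample: three consecutive targets, then unrelated ones.
sample_targets = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.40", "192.0.2.13"]
print(longest_incrementing_run(sample_targets))
```

A long run suggests a sweep through the subnet; what counts as "long" depends on the network and is left to the reader. ARP requests from several hosts are interleaved in a real capture, so grouping the targets by requester first would make the check more meaningful.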


Merv 20 years ago

Please post the output for "show int vlan 1" so we can see how many packets are being switched at layer 2 and how many at layer 3


Merv 20 years ago

Until you find the cause of your issue, you might want to disable ICMP unreachable on the VLAN interfaces


Merv 20 years ago

It may be that the 6500 keeps trying to resolve incomplete entries. If so, that would explain the lower CPU usage after you clear the ARP cache.

In that case, a large number of incompletes would be quite bad...


Holger Amberg 20 years ago

Hello,

Merv wrote:

Attached the full output:

c6500#sh int vlan1
Vlan1 is up, line protocol is up
  Hardware is EtherSVI, address is 0015.2ccb.3a00 (bia 0015.2ccb.3a00)
  Description: VLAN1
  Internet address is XXX.XX.XXX.XXX/24
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 29/255, rxload 29/255
  Encapsulation ARPA, loopback not set
  ARP type: ARPA, ARP Timeout 01:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Input queue: 1/4096/342883/342883 (size/max/drops/flushes); Total output drops : 0
  Queueing strategy: fifo
  Output queue: 0/4096 (size/max)
  5 minute input rate 1143780000 bits/sec, 322918 packets/sec
  5 minute output rate 1148701000 bits/sec, 324610 packets/sec
  L2 Switched: ucast: 5032784627 pkt, 3875091409210 bytes - mcast: 2690022 pkt, 334426979 bytes
  L3 in Switched: ucast: 117782279980 pkt, 57251998773550 bytes - mcast: 0 pkt, 0 bytes mcast
  L3 out Switched: ucast: 117777279685 pkt, 57260176366988 bytes mcast: 0 pkt, 0 bytes
     120346959816 packets input, 59364744882436 bytes, 0 no buffer
     Received 2259513 broadcasts (367382 IP multicast)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     119435831020 packets output, 59176291086168 bytes, 0 underruns
     0 output errors, 0 interface resets
     0 output buffer failures, 0 output buffers swapped out

Today i found some new log entries:

May 14 22:50:53: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
May 14 22:51:09: %MLS_STAT-SP-4-IP_CSUM_ERR: IP checksum errors
May 14 22:51:53: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
May 14 22:52:54: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
May 14 22:53:56: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt

A Google search did not turn up any useful information about these error messages. Maybe the large number of incompletes is the problem, but then it would be very hard to solve (other than by using static ARP entries?).

The unreachables option is already set:

no ip redirects
no ip unreachables
no ip proxy-arp

Best regards,

Holger Amberg


Merv 20 years ago

By the look of the L2 and L3 counter values, most of your traffic is being handled by the CPU. It would be useful if you cleared the counters first and then reposted the result.

Do the ARP incompletes occur across all of the VLANs or just a few?

To find the source IP addresses of the packets that result in incomplete ARP entries, you could define an extended access list that first permits traffic to a few of the VLAN1 destination IP addresses that show up as incomplete, and then permits any to any. You can then use the ACL match counters to tell whether a particular VLAN is the source of the traffic. This ACL would be applied on all or some of the VLAN interfaces other than VLAN1.

Something like:

ip access-list extended ARP_SOURCE
 ! x.x.x.x is one of the incomplete ARP entries
 permit ip any host x.x.x.x
 permit ip any any
exit

! check the match counters
show access-list ARP_SOURCE


Merv 20 years ago

Also what does "show mls statistics" display ?


anybody43 20 years ago

Merv said:-

Merv,

I am not up to date with the sup720 architecture. Would you mind explaining the above please?

Does the SUP720 use MLS?

I had assumed that it was like the 4500 SUP 4/5 which I think of as "Hardware CEF". There is no MLS, no "first packet" just wire rate forwarding on loads of ports:-) at layer 3 (or 2 clearly).

Also:- Back to the OP's issue.

I am having trouble with the idea that the root cause of this is ARP related. Sure, it may be a symptom of some inappropriate network or end station behaviour, but as I see it the 50% Interrupt Level CPU must be a consequence of switching actual traffic and not anything directly related to ARP packets being processed by the box.

I agree that it is a good idea to track down the sources of the incomplete ARP entries.

I have:-

sw1#sh ip arp sum

414 IP ARP entries, with 3 of them incomplete

sw2#sh ip arp sum

421 IP ARP entries, with 5 of them incomplete

1% vs the OP's 10%.

Thanks.
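
The percentages quoted above can be reproduced directly from the "sh ip arp sum" lines posted in this thread. A small sketch, assuming the summary line always has the wording seen here (the function name is hypothetical):

```python
import re

SUMMARY = re.compile(r"(\d+) IP ARP entries, with (\d+) of them incomplete")

def incomplete_ratio(summary_line):
    """Fraction of ARP entries that are incomplete, from a 'sh ip arp sum' line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a 'show ip arp summary' line")
    total, incomplete = map(int, match.groups())
    return incomplete / total if total else 0.0

# Summary lines exactly as posted in the thread.
posted = {
    "sw1": "414 IP ARP entries, with 3 of them incomplete",
    "sw2": "421 IP ARP entries, with 5 of them incomplete",
    "c6500 (first post)": "9136 IP ARP entries, with 532 of them incomplete",
    "c6500 (later post)": "7991 IP ARP entries, with 775 of them incomplete",
}
for name, text in posted.items():
    print(f"{name}: {incomplete_ratio(text):.1%}")
```

This gives roughly 1% for the two comparison switches, and about 6% and 10% for the two c6500 samples.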


Merv 20 years ago

Cisco reference doc: "Catalyst 6500/6000 Switch High CPU Utilization"


refer to show interface within this document


Holger Amberg 20 years ago

Hello,

Merv wrote:

Attached the cleared statistics:

c6500#sh int vla1
Vlan1 is up, line protocol is up
  Hardware is EtherSVI, address is 0015.2ccb.3a00 (bia 0015.2ccb.3a00)
  Description: VLAN1
  Internet address is 193.22.254.200/24
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 30/255, rxload 31/255
  Encapsulation ARPA, loopback not set
  ARP type: ARPA, ARP Timeout 01:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters 00:05:01
  Input queue: 3/4096/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/4096 (size/max)
  5 minute input rate 1224177000 bits/sec, 251822 packets/sec
  5 minute output rate 1202331000 bits/sec, 246864 packets/sec
  L2 Switched: ucast: 17856288 pkt, 19350134534 bytes - mcast: 2057 pkt, 251767 bytes
  L3 in Switched: ucast: 60307951 pkt, 28623964763 bytes - mcast: 0 pkt, 0 bytes mcast
  L3 out Switched: ucast: 60287765 pkt, 28625838855 bytes mcast: 0 pkt, 0 bytes
     76192988 packets input, 46145818370 bytes, 0 no buffer
     Received 1685 broadcasts (499185 IP multicast)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     77917013 packets output, 46823199043 bytes, 0 underruns
     0 output errors, 0 interface resets
     0 output buffer failures, 0 output buffers swapped out

At the moment we have only the default vlan1.

c6500#show mls statistics

Statistics for Earl in Module 5

L2 Forwarding Engine
  Total packets Switched                : 167069132363

L3 Forwarding Engine
  Total packets L3 Switched             : 167068616897 @ 328278 pps

  Total Packets Bridged                 : 19461984156
  Total Packets FIB Switched            : 147452107429
  Total Packets ACL Routed              : 0
  Total Packets Netflow Switched        : 0
  Total Mcast Packets Switched/Routed   : 26251871
  Total ip packets with TOS changed     : 8256444465
  Total ip packets with COS changed     : 978
  Total non ip packets COS changed      : 0
  Total packets dropped by ACL          : 2049762
  Total packets dropped by Policing     : 12213434

Errors
  MAC/IP length inconsistencies         : 0
  Short IP packets received             : 0
  IP header checksum errors             : 304618
  TTL failures                          : 1914145
  MTU failures                          : 0

Total packets L3 Switched by all Modules: 167068616897 @ 328278 pps

Best regards,

Holger Amberg


Merv 20 years ago

I believe you said the ARPs were being sourced from VLAN1, which implies that there is more than one VLAN.

What else does this switch connect to, i.e. what is the network topology?
