Aging implementation


syuga2012 17 years ago

Hi Folks,

I am trying to understand how ARP table entries (ARP is just an example case) are aged out from an implementation point of view; for example, how the timers are implemented to do the job.

Appreciate your help.

Thanks, syuga


bod43 17 years ago

If I was interested in this I would dig out some linux or similar source.


Thrill5 17 years ago

In IOS, there is an ARP aging process that removes expired entries. I don't know specifically how this is implemented, but generically, IOS implements a scheduler, and processes are scheduled to run at specific intervals.
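As a generic illustration of the "scheduled aging process" idea, here is a minimal Python sketch. It is not IOS code: the class name `AgingTable`, its methods, and the sample addresses are all hypothetical, and time is passed in explicitly so the behaviour is easy to follow. Each entry stores a deadline, and a periodic sweep removes everything past its deadline.

```python
from dataclasses import dataclass, field


@dataclass
class AgingTable:
    """Toy ARP-style cache (hypothetical): entries expire `timeout` seconds after the last refresh."""
    timeout: float
    entries: dict = field(default_factory=dict)  # ip -> (mac, expires_at)

    def learn(self, ip, mac, now):
        # Installing or refreshing an entry restarts its lifetime.
        self.entries[ip] = (mac, now + self.timeout)

    def lookup(self, ip, now):
        # Return the MAC only while the entry is still within its lifetime.
        item = self.entries.get(ip)
        if item is None or item[1] <= now:
            return None
        return item[0]

    def sweep(self, now):
        # The periodic "aging process": drop every entry whose deadline has passed.
        expired = [ip for ip, (_, deadline) in self.entries.items() if deadline <= now]
        for ip in expired:
            del self.entries[ip]
        return expired


table = AgingTable(timeout=60)
table.learn("10.0.0.1", "0000.0c00.0001", now=0)
table.learn("10.0.0.2", "0000.0c00.0002", now=30)
print(table.sweep(now=61))                # ['10.0.0.1']
print(table.lookup("10.0.0.2", now=61))   # 0000.0c00.0002
```

A real scheduler would call something like `sweep` from a timer at a fixed interval; a per-entry timer wheel or a heap ordered by deadline are common alternatives when the table is large.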


John Agosta 17 years ago

ARP (and other) timers are usually adjustable by the admin with an arp timeout or similar command. I think the default is 5 minutes.


bod43 17 years ago

By the way, on a Cisco router I think the default is 14,400 seconds (4 hours), from memory :)

In other words almost forever.


Thrill5 17 years ago

-By the way -
-On a cisco router I think the default is 14,400 seconds
-(4 hours) - from memory:)
-
-In other words almost forever.

Yes, the default is 4 hours, but if you are running HSRP on a pair of switches, you need to change the default ARP timeout to 300 seconds to match the mac-address-table timeout (which is 300 seconds, or 5 minutes) on each VLAN where you are running HSRP.

If switch A is the active HSRP router but switch B is the router used for the return traffic, switch B can flood the traffic out the VLAN as unknown unicast. This can happen because the mac-address-table is only updated when traffic is RECEIVED. Since switch A is the default router (because it is the active HSRP router), switch B will not see any traffic from the client, and its mac-address-table entry will age out.

By setting the ARP timeout to the same value as the mac-address-table timeout, you force switch B to ARP the client every 300 seconds. When switch B receives the ARP reply, it updates the ARP entry AND the mac-address-table, ensuring that the MAC address is known and keeping everything working properly.

I would consider this a bug, with the fix being to set the ARP timeout to match the mac-address-table timeout whenever HSRP is enabled on a layer 3 VLAN. Cisco does not officially consider this a bug. If you are running HSRP and see lots of unknown unicast traffic on a VLAN, you are seeing this issue.
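To make the timer mismatch concrete, here is a rough Python sketch. It assumes the worst case described above: the only frames switch B ever receives from the client are ARP replies, one per ARP timeout. The function name `unknown_unicast_seconds` is made up for illustration, and the model ignores any other traffic that might refresh the MAC entry.

```python
def unknown_unicast_seconds(arp_timeout, mac_aging):
    """Seconds per ARP cycle during which the MAC entry has aged out.

    Assumption: the switch hears from the host only once per `arp_timeout`
    seconds (the ARP reply), so the MAC entry is valid for `mac_aging`
    seconds after each reply and unknown for the rest of the cycle.
    """
    return max(0, arp_timeout - mac_aging)


for arp_timeout, mac_aging in [(14400, 300), (300, 300), (14400, 14400)]:
    gap = unknown_unicast_seconds(arp_timeout, mac_aging)
    print(f"arp={arp_timeout}s mac={mac_aging}s -> flooding for {gap}s per cycle")
```

With the defaults (14400 and 300), return traffic would be flooded for 14100 of every 14400 seconds under this assumption; making the two timers equal, in either direction, closes the gap.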


Phil Harrison 17 years ago

Going somewhat OT, but..

This only happens for traffic that is asymmetrically routed on the outgoing and return paths, typically when active HSRP instances are load-balanced across the switches, which is quite common.

However, it is not a bug at all; it is simply how any switch functions with such designs, and it is well documented (Case Study #8).

The better solution is actually to increase CAM aging to match the ARP timeout rather than reducing the ARP timeout, as the latter incurs more ARP processing time on network and endpoint CPUs.

-Phil


Thrill5 17 years ago

The "well documented" part is why I consider this a bug. It is a known issue and should be addressed. A warning printed when enabling HSRP on a VLAN would also suffice, as is done when you enable port-fast.

ARP processing is so low that it does not make any noticeable dent in CPU time, even when the timeout is set to 5 minutes on a very busy switch. Depending on who you talk to at Cisco, some recommend reducing the ARP timeout, and others recommend increasing the MAC aging timer. I personally am in the reducing-the-ARP-timeout camp.


bod43 17 years ago

I used to design large LANs (10+ years ago) and this behaviour was not understood then. I was *horrified* when I saw it documented. We had been building networks susceptible to this asymmetric flow behaviour for years and no one had noticed :-) :-((

My preferred approach now would be to design the behaviour out completely by choosing the L2/L3 topology so it could not happen.

One problem with the CAM ageing time method is that in the event of a STP Topology Change the aging time is set to a low value temporarily. Oh ... this applies to the arp one too.


Thrill5 17 years ago

-I used to design large LANs (10+ years ago) and this behaviour was
-not understood then. I was *horrified* when I saw it documented. We
-had been building networks susceptible to this asymmetric flow
-behaviour for years and no one had noticed :-) :-((

Ditto!!!

-My preferred approach now would be to design the behaviour out
-completely by choosing the L2/L3 topology so it could not happen.

I've designed many large LANs and data centers, and my standard solution is to set the ARP timeout to 300 on each VLAN interface. I've been doing this for at least 5 years and have not seen any ill effects.

-One problem with the CAM ageing time method is that
-in the event of a STP Topology Change the aging time
-is set to a low value temporarily. Oh ... this applies to
-the arp one too.


Jens Haase 17 years ago

I have seen massive problems with this setting on a pair of distribution routers (Cat 6500 SUP2 MSFC2). They had around 4k phones and 5k workstations connected, and the MSFC was not able to keep up with processing all the ARP replies. This resulted in incomplete ARP entries and ultimately in packet loss.

I am personally of the opinion that if you follow current best practices for network design, you do not need to align the MAC aging and ARP timers.

Jens


bod43 17 years ago

Me too. Design the issue right out of your network.

Surprising that 30 ARPs per sec was too much for msfc though. Of course msfc was a software router, so maybe it was busy with other things.


Jens Haase 17 years ago

To be precise, it is 60 ARPs per sec, because the Cisco implementation sends a unicast ARP request to the end device after reaching 50% of the configured timer.

But you are right there was other traffic that had an impact on the MSFC. The key point however was that the problems vanished after configuring the Cisco default ARP timer of 4 hours.
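The 30 and 60 ARPs per second figures can be checked with simple arithmetic, sketched here in Python. The function name `arp_requests_per_second` and the `refresh_fraction` parameter are made up for illustration, and the 50% refresh point is taken from the claim above rather than from any documentation. The model assumes the re-ARPs for all hosts are spread evenly over time.

```python
def arp_requests_per_second(hosts, arp_timeout, refresh_fraction=1.0):
    # Each host is re-ARPed once every arp_timeout * refresh_fraction seconds,
    # so the average request rate is hosts divided by that interval.
    return hosts / (arp_timeout * refresh_fraction)


hosts = 4000 + 5000  # phones + workstations from the post above

print(arp_requests_per_second(hosts, 300))         # 30.0
print(arp_requests_per_second(hosts, 300, 0.5))    # 60.0
print(arp_requests_per_second(hosts, 14400))       # 0.625
```

At the 4-hour default the average rate drops below one request per second, which is consistent with the problems disappearing once the default timer was restored.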

Jens


bod43 17 years ago

Ah, that's interesting, never noticed that - but never needed to;)

Thanks.


Thrill5 17 years ago

Where did you find this information? That would be a non-standard implementation and extremely unusual. What happens if only 50% of the timer has expired and there is no reply to the ARP? How often does it then keep retrying? Are you sure you're not thinking of DHCP?


Aaron Leonard 17 years ago

~ >> Surprising that 30 ARPs per sec was too much for msfc
~ >> though. Of course msfc was a software router, so maybe it was busy
~ >> with other things.
~ >>
~ > To be precise it is 60 ARPs per sec because the Cisco implementation sends
~ > an unicast arp request to the end device after reaching 50% of the
~ > configured timer.
~ >
~ > But you are right there was other traffic that had an impact on the MSFC.
~ > The key point however was that the problems vanished after configuring the
~ > Cisco default ARP timer of 4 hours.
~ >
~ > Jens
~
~ Where did you find out this information, as that would be a non standard
~ implementation and extremely unusual.

Tis true, tho. Cuts down on the broadcast volume

~ What happens if when after only 50%
~ of timer has expired and there is no reply to the ARP?

The ARP entry remains valid nonetheless. Then when the entry times out, it is removed from the cache, and the IOS device will need to send out a new broadcast ARP the next time it needs to send a unicast IP packet to this destination.

~ How often does it then keep retrying?

You can see how it works (with a 60-second ARP timeout) below.

Cheers,

Aaron


Thrill5 17 years ago

In this example, with the ARP timeout set to 60 seconds, it re-ARPed after 60 seconds, not after 30 seconds (half the ARP timeout) as the other poster specified. This is how I thought it worked until the other poster said that it re-ARPs at half the ARP timeout, but your debug shows that it only re-ARPs after the ARP timeout has expired.


Sam Wilson 17 years ago

The use of the term "standard" for any implementation of ARP seems to be pretty suspect. Even though RFCs 826 and 1122 are both Internet Standards, they're neither complete nor prescriptive about the right way to do ARP. In fact it's amazing that ARP works as well as it does. :-)

Sam


Aaron Leonard 17 years ago

On Thu, 19 Feb 2009 21:35:27 -0500, "Thrill5" wrote:

~ "Aaron Leonard" wrote in message news: snipped-for-privacy@4ax.com...


~ > tucson-3640(config-if)#arp timeout 60
~ > tucson-3640(config-if)#end
~ > tucson-3640#
~ > Feb 19 13:28:29.803 -0700: %SYS-5-CONFIG_I: Configured from console by console
~ > tucson-3640#
~ > tucson-3640#ping 10.95.42.134
~ >
~ > Type escape sequence to abort.
~ > Sending 5, 100-byte ICMP Echos to 10.95.42.134, timeout is 2 seconds:
~ >
~ > Feb 19 13:28:57.751 -0700: IP ARP: creating incomplete entry for IP address: 10.95.42.134 interface FastEthernet0/0
~ > Feb 19 13:28:57.751 -0700: IP ARP: sent req src 10.95.42.135 0001.9696.a240, dst 10.95.42.134 0000.0000.0000 FastEthernet0/0
~ > Feb 19 13:28:57.755 -0700: IP ARP: rcvd rep src 10.95.42.134 0010.7b11.9a41, dst 10.95.42.135 FastEthernet0/0
~ > .!!!!
~ > Success rate is 80 percent (4/5), round-trip min/avg/max = 1/2/4 ms
~ > tucson-3640#show arp
~ > Protocol  Address        Age (min)  Hardware Addr   Type  Interface
~ > Internet  10.95.42.135   -          0001.9696.a240  ARPA  FastEthernet0/0
~ > Internet  10.95.42.134   0          0010.7b11.9a41  ARPA  FastEthernet0/0
~ > Internet  10.95.42.129   0          0014.6a3e.a9d9  ARPA  FastEthernet0/0
~ > tucson-3640#
~ > Feb 19 13:29:42.063 -0700: IP ARP: sent req src 10.95.42.135 0001.9696.a240, dst 10.95.42.134 0010.7b11.9a41 FastEthernet0/0
~ > tucson-3640#show arp
~ > Protocol  Address        Age (min)  Hardware Addr   Type  Interface
~ > Internet  10.95.42.135   -          0001.9696.a240  ARPA  FastEthernet0/0
~ > Internet  10.95.42.134   1          0010.7b11.9a41  ARPA  FastEthernet0/0
~ > Internet  10.95.42.129   0          0014.6a3e.a9d9  ARPA  FastEthernet0/0
~ > tucson-3640#
~ > Feb 19 13:30:42.064 -0700: IP ARP: sent req src 10.95.42.135 0001.9696.a240, dst 10.95.42.134 0010.7b11.9a41 FastEthernet0/0
~ > tucson-3640#show arp
~ > Protocol  Address        Age (min)  Hardware Addr   Type  Interface
~ > Internet  10.95.42.135   -          0001.9696.a240  ARPA  FastEthernet0/0
~ > Internet  10.95.42.129   0          0014.6a3e.a9d9  ARPA  FastEthernet0/0
~
~ In this example with the ARP time out set to 60 seconds, it re-ARPed in 60
~ seconds, not in the 30 seconds (or half the ARP timeout) as specified by the
~ other poster. This is how I thought it worked until the other poster said
~ that it re-ARP in half the ARP timeout, but your debug shows that it only
~ re-ARPs after the ARP timeout has expired.

I wanted to see if anyone was paying attention :-)

Actually, the first unicast ARP is sent ~44 sec (13:29:42.063 - 13:28:57.755) after the ARP entry is installed, or when the ARP entry has about 25% of its lifetime remaining. Then another unicast is sent 60 seconds later (at 13:30:42.064), which is when the entry is removed. I.e., even though the ARP duration is configured as 60 seconds, the entry actually lives for about a minute 45.

This "directed ARP" behavior is quite undocumented ... from what I can find out, the idea is that the IOS device will send out a unicast "directed" ARP 60 seconds before the ARP entry expires, then if no response, another when the entry expires.
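The intervals quoted above can be recomputed from the debug timestamps. This short Python sketch only does the subtraction; the variable names are illustrative, and it takes the "rcvd rep" time as the moment the entry was installed.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"
installed = datetime.strptime("13:28:57.755", FMT)       # ARP reply received
first_refresh = datetime.strptime("13:29:42.063", FMT)   # first unicast ARP
second_refresh = datetime.strptime("13:30:42.064", FMT)  # second unicast ARP

until_first = (first_refresh - installed).total_seconds()
between = (second_refresh - first_refresh).total_seconds()
lifetime = (second_refresh - installed).total_seconds()

print(f"install -> first unicast ARP:  {until_first:.3f} s")   # 44.308 s
print(f"first -> second unicast ARP:   {between:.3f} s")       # 60.001 s
print(f"install -> entry removed:      {lifetime:.3f} s")      # 104.309 s
print(f"lifetime left at first refresh: {1 - until_first / 60:.0%} of 60 s")
```

So the first refresh comes a little before the configured 60 seconds are up, and the entry survives for roughly 104 seconds in total, matching the "about a minute 45" estimate.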
