High EIGRP Pending routes

Most of the documentation I can find on "Pending routes" from show ip eigrp interfaces comes directly from Cisco and is not that helpful. Their definition is "Number of routes in the packets sitting in the transmit queue waiting to be sent." The reason for this question is that we have an EIGRP meltdown on our core routers (6500s - Sup720) about every three months.

We created scripts to collect more data from the router at 5-minute intervals, and the early indication that EIGRP was in trouble was that 52 out of approximately 300 EIGRP interfaces had as many as 8000 pending routes. The remainder of the interfaces had zero pending routes. The IP routing table is approximately 4000 routes. The CPU utilization was 34% at 5 sec with 11% consumed by EIGRP PDM. Five minutes later the CPU was at 100% and Pending routes were as high as 100,000 on some of the EIGRP interfaces. All interfaces with EIGRP neighbors had tens of thousands of pending routes. This core router (Core_01) has another core router attached (Core_02) and it did not record a high number of Pending routes. There was no indication in the syslog of any significant topology change leading up to or during this event. So my questions to the group are:
  1. Does anyone have a better definition of Pending routes?
  2. How often are the counters from show ip eigrp interface updated by the IOS?
  3. Does 8000 Pending routes seem too high considering that there are only 4000 routes in the IP routing table?

Thanks in advance

Reply to
sonic31ss

Caveat - I am not an EIGRP expert. In fact I don't know much about it at all and have basically zero operational experience.

Answers:-

  1. I understand that EIGRP requires that routing updates are acknowledged by the neighbour. Most acknowledged protocols send some information, then wait for it to be acknowledged before sending more. I guess that pending routes are routes waiting on previously sent routes being acknowledged. This could be caused by communications problems with the neighbours or CPU overload on the neighbours.

  2. I don't have a clue about these specific counters. The bytes in/out etc. counters are updated quite infrequently on some platforms - 20 seconds is the longest I have seen. From memory it is either 15 sec or 20 sec on the 6500.

  3. Yes. EIGRP sends *routes*. Perhaps the router has decided to send one set of updates and then, before they can be sent successfully, something triggers another update. It may be no coincidence that 8000 = 2 * 4000.

EIGRP sends every route out of every interface (except for split horizon), so you expect to see (nearly) all routes.

I would approach this as follows:- Check out if there is something in common with the 52 interfaces that get the queues first.
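One way to start on that is to pull the Pending Routes column out of each collected snapshot and list the interfaces that are over some limit. A minimal Python sketch, assuming the classic column layout of show ip eigrp interfaces (Interface, Peers, Xmit Queue, Mean SRTT, Pacing Time, Multicast Flow Timer, Pending Routes) with the pending count as the last field. The sample text, the interface names, the function names and the threshold are all made up for illustration; check the layout against your own IOS output before relying on it.

```python
import re

# Hypothetical snapshot: interface names and numbers are invented for illustration.
SAMPLE = """\
IP-EIGRP interfaces for process 100

                    Xmit Queue   Mean   Pacing Time   Multicast    Pending
Interface    Peers  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Tu101          1        0/12       45       0/10         50          8000
Tu102          1        0/0        38       0/10         50             0
Vl10           1        0/3        2        0/1          50          7950
"""

# A data row: name, peers, "un/rel" queue, SRTT, "un/rel" pacing, flow timer, pending.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)/(\d+)\s+(\d+)\s+\d+/\d+\s+\d+\s+(\d+)\s*$")


def pending_by_interface(text):
    """Return {interface: pending_routes} for every data row in the output."""
    result = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and blank lines do not match and are skipped
            result[m.group(1)] = int(m.group(6))
    return result


def over_threshold(text, threshold):
    """List (sorted by name) the interfaces whose pending count exceeds threshold."""
    return sorted(n for n, p in pending_by_interface(text).items() if p > threshold)


if __name__ == "__main__":
    print(over_threshold(SAMPLE, 4000))
```

Run against successive 5-minute snapshots, the list of names that shows up first is the set to compare for a common link type, neighbour platform or summary configuration.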

Then I would get back to basics.

Check the whole network for Interface errors on infrastructure ports. Zero is good.

sh ip eigrp nei shows SRTT and, I think, also counts missed hellos. Worth a look.

I am not really sure of the significance of this, but I would fancy checking that all routers with more than one link to any destination have a feasible successor to that destination or are load sharing.

Remember that with dynamic routing protocols the key thing is the performance of your feeblest router. It has to be able to deal with all of the requests made of it.

The other thing that might go wrong is that on a slow link updates might not be completed before another is required. Have you any slow links? Work out how much data is required to send the 4000 route update and figure out how long it takes to send it on your slowest link.
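That arithmetic can be sketched in a few lines of Python. The per-route size and the link speed below are assumptions picked for illustration, not measured values, and update_seconds is just a hypothetical helper name. The 0.5 default reflects EIGRP limiting itself to 50% of the configured interface bandwidth unless told otherwise.

```python
def update_seconds(routes, bytes_per_route, link_bps, eigrp_share=0.5):
    """Rough time in seconds to push a full update over one link.

    bytes_per_route is an assumed average on-the-wire cost per route
    (packet headers amortised in); it is not an official figure.
    eigrp_share is the fraction of link bandwidth EIGRP may use.
    """
    usable_bps = link_bps * eigrp_share
    return routes * bytes_per_route * 8 / usable_bps


if __name__ == "__main__":
    # 4000 routes is from the thread; 30 bytes/route and 64 kbit/s are assumptions.
    print(round(update_seconds(4000, 30, 64_000), 1))  # 30.0
```

With those assumed numbers a full update ties up the link's EIGRP share for about half a minute, so a second trigger arriving inside that window would stack a new update on top of the unfinished one.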

Perhaps you could describe the network further. How many routers are in the EIGRP AS? How many EIGRP processes are there? Are you doing summarisation? Do you manage all of the routers? Is the network "well designed" or are new devices and links whacked in as required? (you don't need to answer that:)

300 EIGRP ports sounds like quite a lot to me.

I'll stop there:)

Reply to
bod43

it sounds like you have some sort of EIGRP update storm going on.

EIGRP is more flexible than some other state based protocols (whether you like DUAL is a separate discussion).

The router with all the pending updates is flooding changes to lots of other routers, and some of them cannot keep up.

Once you get in this state you seem to get "waves" of updates bouncing around the network.

The fix is to reduce the ways that updates propagate and multiply around your network - this usually occurs where you have a lot of loops, say with a dual centred star type topology.

You can set limits on which paths are candidates for updates from the core to edge locations and back, which cuts down on the paths along which routing updates can propagate.

There is some EIGRP best practice in here (read the whole thing, but p16 on has ways to control scaling effects in EIGRP):

formatting link
This is design for campus, but the routing topology issues are the same in a WAN, just made even worse by the latencies and lower bandwidths.

EIGRP lets you build arbitrary topologies (much more so than the rigid area structures you need with OSPF and IS-IS).

That is both a blessing when it allows exceptions to a hierarchical design, and a curse when that gets out of control.

a bit about troubleshooting

formatting link
Once you have some more info - have a hunt around the cisco site - most of what you want should be there, but they are good at hiding the woods behind the trees......

Use stubs and route summarisation around your topology and you should see some improvement.

Also, this should improve convergence.

some good stuff for Cat6k

formatting link

Reply to
Stephen

What is your topology that you have 300 interfaces running EIGRP? That seems very excessive!!! If you have two 6500s for redundancy, each with the same 300 VLANs on them, running EIGRP on all of the interfaces is a very bad practice. You should only have 1 or 2 neighbor relationships between the same two routers. In a configuration like that on a 6500, the best practice is to enable "passive-interface default" under the EIGRP process, and then enable EIGRP on one or two of the VLANs using "no passive-interface vlan 100", "no passive-interface vlan 101", etc. Each time a route goes away, the router sends a query to each of its neighbors, and then has to wait for a response from each one. If you have 300 neighbor connections to the same router, it will send 300 queries to it, and wait for 300 responses. Since all the queries are going to the same router, it's going to get back the same answer 300 times.

I'm sure your "melt-downs" are due to SIAs (stuck-in-active routes), which are directly related to the above issue.

Reply to
Thrill5

Hi Thrill5,

No, the 300 interfaces are mostly GRE tunnels to field routers. And the SIAs only begin after the CPU reaches 100%. And we all know that EIGRP does not behave well under CPU load.

Thanks, Jeff

Reply to
sonic31ss

FWIW if you manage to have a partially stable EIGRP with 300 adjacencies, you have proved it is "conditionally" stable under high load......

GRE tunnels can be a pain with any routing protocol, since typically the CPU has to maintain state for each tunnel as well as EIGRP, and it gets worse when key interfaces get hit with congestion.

You also need to see if there can be contention for GRE tunnel bandwidth - since EIGRP will eat up to 50% of bandwidth on an interface, if lots of GRE tunnels contend with each other, you may be generating your own congestion storm.......

Thrill5 is going the right way - what you need to do is

  1. reduce the number of adjacencies if possible,
  2. cut down the info that flows across them,
  3. cut down the number of peers that get probed for alternate routes when the topology changes.

i would add:

  1. control the EIGRP traffic - reduce EIGRP allowed bandwidth, and maybe tweak the timers.

You might want to grab one of the cisco books like "large scale IP network solutions" - this has a good chapter on EIGRP for big hub routers.

Reply to
Stephen

If all of the field offices need to go through this central site then an easy way to fix this is to send only a summary-default to the remotes. If all the remotes need to go through this site, then they don't need a full routing table, only a default route. By doing this, you will solve the pending routes problem because you will be sending only one route instead of 4000. It will also solve the "melt-downs" because the summary default route will also prevent the router from sending a query when a route goes away. (If the route that goes away is within the summary route on the interface, then a query is not sent.) Basically what is happening is that you have 300 adjacencies, and every time a route changes, every router is queried. The router must wait for a response from each router. This will drive up the CPU, causing received queries to be lost. If that happens, the router will then reset the neighbor and need to send the full routing table to that router, again driving up the CPU, causing more replies to be lost. You then have what is commonly referred to as a "cascading failure" and all hell breaks loose.

To enable the summaries, add this command on the interface to the remote router (where <as-number> is your EIGRP AS number): ip summary-address eigrp <as-number> 0.0.0.0 0.0.0.0 255

For a default summary you must set the admin-distance to 255 or weird things will happen if your default route goes away for some reason. For summaries other than a default you should set the admin-distance to 5.

Another thing you could do is change the remotes to a "stub". You can only do this if the remote is not a transit point for other routing destinations.

Reply to
Thrill5

Hi Thrill5,

All good points. I should have pointed out that we are already using ip summary-address eigrp on most of the EIGRP interfaces. I have to admit that we did not have the summaries applied evenly and there were a few neighbors at the time that were receiving the full routing table that did not need it.

Best regards, Jeff

Reply to
sonic31ss

Hi Stephen,

I should have pointed out that none of the EIGRP interfaces were under load before or during this event.

Also, going back to your earlier point about an EIGRP update storm. Why would there be an update storm if there were no changes recorded in syslog or on our management station in the AS?

I appreciate the recommendation for the book. I have several that cover EIGRP, including a Cisco QOS book that has a chapter on EIGRP. It had a slightly better definition of Pending Routes:

"The number of routes that are affected by the queries or updates that are in the queue is displayed as Pending Routes."

Best regards, Jeff

Reply to
sonic31ss

Hey Jeff,

What version of code are you running on the Cats? I ran into an odd issue with IOS withdrawing entries from the forwarding table, but the EIGRP route table seemed OK. My issue stemmed from a default network advertisement from my perimeter being injected into EIGRP. The Cats then peered with our WAN aggregation devices (7206VXRs). The 7200 would flag all internal routes as exterior routes (sh ip eigrp top "network prefix") instead of just the default network. This is a known bug and was resolved with an IOS update.

The symptom I saw was that whenever new routes were added or deleted (turning up a new location, prefix list tuning, new route summarization advertisements, etc.), I'd lose connectivity to certain or all locations on the WAN. The route tables *looked* OK and there was nothing in the logs that indicated a problem. I was able to recover by soft-resetting the EIGRP process and forcing neighbor adjacencies to be rebuilt. I don't have nearly as many routes as you're talking about and my failure scenario was different. Nor do I have the GRE issues to contend with. However, I'd be interested to see if you're seeing this particular bug. I may be totally off, but thought I'd throw this your way.

I'm on 12.4(11)T2 on the 7206VXRs, and 12.2(18)SXF7 on the Cats.

On Sun, 11 Jan 2009 07:59:05 -0800 (PST), sonic31ss

Reply to
fugettaboutit

i think some events need to be explicitly turned on to be logged so you may not see changes to neighbours.

eigrp log-neighbor-changes ???

Reply to
Stephen

If you have a hub and spoke network, especially one where the spokes are dual homed, then have a look at making all the spokes stubs, as described here

formatting link
it greatly increases scalability of EIGRP.

/Jesper

Reply to
Jesper Skriver

We are running 12.2(18)SXF7 on the cats.

Reply to
sonic31ss

We have eigrp logging neighbor changes.

We are also logging line protocol changes on the 6500, which is not on by default. ;-)

Thanks, Jeff

Reply to
sonic31ss

have a look at this old networkers presentation:

formatting link
the comments about F/Relay and reducing the bandwidth also apply to a VPN system.

finally - the update speed depends on reported link speed: below 1.5M all links are "slow WAN"; everything else is LAN speed and gets more frequent hellos and so on.

Reply to
Stephen

FTR, I assume this is tunable?

Reply to
alexd
