EIGRP and its limits

Hello,

we're using EIGRP as the routing protocol in our Cisco networks, especially in one large centralized "xDSL with ISDN backup" branch network. Every branch office (1000+) uses a C83x with EIGRP stub routing (of course).

As we're changing the design and expanding the network, we got news from Cisco that EIGRP itself may not be as scalable as we thought, and as the official whitepaper states:

*quote* There are no limitations on the number of neighbors that EIGRP can support. The actual number of supported neighbors depends on the capability of the device, such as:
  • memory capacity
  • processing power
  • amount of exchanged information, such as the number of routes sent
  • topology complexity
  • network stability
*/quote*

First, we've learned that, e.g. in a DMVPN design, there is a design limit: any router will accept at most 700 EIGRP neighbors, regardless of memory and processing power.

*quote* If a second mGRE interface is set up on the Cisco 7200 Series Router, it can accept a maximum of 350 tunnels per interface (700 total) */quote*
It seems that the second quote stands in stark contrast to the first one. So which one should be considered right, and why? Both are official Cisco statements...

Second, we've learned that even when distributed over a farm of load-balanced VPN endpoints (e.g. 38xx or 28xx routers), each one terminating 300-600 "spokes", EIGRP would not be able to handle such a large network (let's say with more than 1000 EIGRP stubs) as one single AS.
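As a rough sketch of the sizing arithmetic behind such a farm (not a Cisco sizing rule), the snippet below computes how many hubs are needed for a given number of spokes and a per-hub neighbor limit. The function name `hubs_needed` and the `spare_hubs` parameter are made up for illustration; the limits of 350 and 700 are taken from the quote above, and 1000 is the approximate branch count.

```python
def hubs_needed(spokes: int, per_hub_limit: int, spare_hubs: int = 0) -> int:
    """Smallest number of hubs so that all spokes fit under the per-hub limit.

    `spare_hubs` (hypothetical parameter) adds extra hubs so that this many
    hubs can fail while the survivors still stay within the limit.
    """
    if spokes < 0 or per_hub_limit <= 0 or spare_hubs < 0:
        raise ValueError("spokes and spare_hubs must be >= 0, per_hub_limit > 0")
    # Integer ceiling division: -(-a // b) == ceil(a / b) for positive b.
    return -(-spokes // per_hub_limit) + spare_hubs


if __name__ == "__main__":
    print(hubs_needed(1000, 350))     # one mGRE interface per hub
    print(hubs_needed(1000, 700))     # two mGRE interfaces per hub
    print(hubs_needed(1000, 350, 1))  # tolerate the loss of one hub
```

With these numbers, 1000 spokes need three hubs at 350 neighbors each, or four if one hub may fail without pushing the remaining ones over the limit.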

So, can anyone from this group tell me about some 'real' large Cisco networks with such a large number of EIGRP stubs that he or she knows of?

Has anyone ever hit EIGRP limits regarding the number of participating routers or neighbors in reality? What were the symptoms? Was it a problem of CPU, memory, flapping routes, or convergence time?

Does anyone have officially proven "hard" facts about the scaling capability of EIGRP? Until yesterday I thought EIGRP was *the* routing protocol "flagship": highly scalable and performant. But now that no longer seems so certain... :-(

Besides all the theory and discussion with our account team, we are urgently looking for a networker with "real life" experience in this kind of network design who is willing to share some of it. ;)

Thank you very much in advance for any hint or real life experience,

Dennis Breithaupt


The maximum number of neighbors you can support on a single router will be unique to your network. On our network, as we increased the number of neighbors above 600, we started seeing the convergence time increase much faster than linearly. We are currently around 650, and those extra 50 have increased convergence times by about 50%. I know of no hard limit on the number of neighbors; it is just a matter of convergence time.

We have a Frame Relay network in which each branch device has two PVCs, each to a different central site router. We currently have approximately 650 neighbors, and convergence time is longer than we like (about 5 minutes): when we lose one of the T3s, the CPU on the central site routers (7206VXR with NPE-400s) goes to 100% for about 5 or 6 minutes. We have done testing with a 6500 and a FlexWAN for the T3 cards, and we did see an improvement in performance. The FlexWAN has its own CPU for packet processing, while EIGRP is calculated on the Sup card. Neither CPU went to 100% for more than 5 seconds (but CPU on one or the other was over 50% for at least 3 minutes).

In your case, since you will have only a "single PVC" and ISDN as your backup, I don't see why you would not be able to have 1000 EIGRP neighbors. Since I can't imagine that you will have ISDN capacity for all 1000 locations, you should not see a large number of neighbors (500+) and routes move from one router to the other in less than three or four minutes.

For us, 5 to 6 minutes of downtime is not optimal for the business we are in (large financial institution), so we are always looking for ways to reduce this number. We have spent a considerable amount of time looking at this problem, and for us the solution is to move to MPLS. Using MPLS solves our problem because each router (including the central site) will have only a single neighbor (either EIGRP or BGP, depending on the provider). Our other solution is to break out each central site into multiple T3s, so that each router has a smaller number of neighbors.

EIGRP generates a large number of packets in a very short period of time (a lot of it broadcast) when establishing neighbors, so having the FlexWAN CPU doing the interface up/down processing and the Sup dealing with just EIGRP and routing helped quite a bit.

We had to tune the broadcast queues on the T3 interfaces because of the number of neighbors. On a T3 interface, the default values are good for around 200 neighbors.

In a nutshell, for EIGRP the number of neighbors a router can support is dependent on its CPU. It has to do a lot of processing when all those neighbors are lost at the SAME time and this is only a factor when those routes need to move to a different router. I would buy the fastest router possible for your application. For your application, a 7600 with FlexWAN would be my first choice, and a 7200VXR with an NPE-G1 second.

In terms of the number of devices in a single AS, I don't know where you are getting those numbers. We have well over 2000 devices (with well over 15,000 interfaces running EIGRP), and have no scalability issues. The keys to EIGRP scalability are summarizing (edge devices only get a default summary route 0.0.0.0 0.0.0.0), a well-planned IP addressing scheme that allows you to summarize as much as possible, and using EIGRP stub or distribute lists on low-end devices (3600 and below). Stuck-in-actives are the bane of EIGRP; stub, distribute lists and summaries stop routes from going active and let EIGRP scale. Having a stable EIGRP network does not come without time and effort, but we have found it to be easier to implement and maintain than OSPF. We had OSPF and EIGRP running in different parts of the network (don't ask), and converted everything to EIGRP a number of years ago because we found that the EIGRP network was more stable, more resilient and easier to manage. Over the last 3 years or so, all of our critical network outages not due to circuit issues were L2 problems caused by equipment failure or bugs.
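To illustrate why the addressing plan matters for summarization, here is a minimal sketch using Python's standard `ipaddress` module. All prefixes are made-up examples, not taken from any real network, and `summarize` is just a helper name chosen for this sketch. Contiguous, aligned branch prefixes collapse into a single summary, while scattered ones do not collapse at all.

```python
import ipaddress


def summarize(prefixes):
    """Collapse a list of prefix strings into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # Hypothetical well-planned region: eight contiguous /24 branch LANs.
    planned = [f"10.20.{i}.0/24" for i in range(8, 16)]
    print(summarize(planned))    # a single /21 summary

    # Hypothetical unplanned region: prefixes scattered across the space.
    scattered = ["10.20.8.0/24", "10.37.2.0/24", "172.16.5.0/24"]
    print(summarize(scattered))  # nothing can be merged
```

In the first case the central site only has to carry one route for the whole region; in the second it has to carry every branch prefix individually.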

Scott

thrill5

Hello and thank you very much for your detailed answer!

There's much truth in it...

Some short statements and consequences for our new network design:

- Using C72xx as SLB (VPN Server Load Balancing) with a farm of 28xx routers, each holding max. 375 neighbours, which is Cisco's recommendation for large DMVPN networks. [1] At least it seems that there is a limit of 375 neighbors per mGRE interface documented in Cisco's document "Large scale DMVPN". I guess the limit may lie in the nature of the mGRE interface and in how multicast messages are split and encrypted across the different point-to-point tunnels created internally.

- SLB has some more advantages, too. For example, we can use the "slow-start" feature to send load/new peers slowly to the VPN endpoints after a reload.

We think a farm of "small" VPN endpoints could be more cost-effective and more failure-tolerant than two big "core" routers. Besides that, we think that because of the 375-neighbor limit referenced earlier, we _have_ to distribute the "spokes" over more hubs...

- We may adjust the EIGRP timers a bit, too. We don't think it is necessary for us to send a hello every 5 seconds; every 10 seconds may be enough. As it takes about 30-40 seconds for ADSL to rebuild the connection and re-establish the VPN tunnel, that should be sufficient. Furthermore, it will reduce the pps of the routing overhead and thus the load. (375 spokes, one hello per direction, i.e. 2 hellos per spoke every 5 seconds = 150 pps...) Of course, we use hub-spoke communication only, route summarization to the spokes and EIGRP stub.

- Furthermore, we want to do some traffic engineering with some sort of LLQ on the physical interfaces involved, to prioritize routing-protocol traffic over real-time traffic over the rest. As I understand it, the main concern (of our Cisco account team, too) is that the loss of EIGRP hello packets, combined with such a large number of neighbors, will cause the neighborships and routes to flap, which we want to compensate for with that.
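The hello-rate estimate from the timer point above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes one hello per direction per neighbor per interval and ignores updates, queries and acknowledgements. The function name `hello_pps` is made up for illustration.

```python
def hello_pps(neighbors: int, hello_interval_s: float, directions: int = 2) -> float:
    """Steady-state hello packets per second seen by one hub.

    Assumes each neighbor relationship carries one hello per direction
    per hello interval (an assumption of this sketch).
    """
    if hello_interval_s <= 0:
        raise ValueError("hello_interval_s must be positive")
    return neighbors * directions / hello_interval_s


if __name__ == "__main__":
    print(hello_pps(375, 5))   # 5-second hellos
    print(hello_pps(375, 10))  # 10-second hellos
```

For 375 spokes this gives 150 pps at a 5-second interval and 75 pps at a 10-second interval, so doubling the interval halves the steady-state hello load.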

Regards,

Dennis

[1] Cisco, "Large scale DMVPN"

