Best HA switch setup?

aBs0lut30

Wed, Aug 30, 2006 12:58 AM

My company is currently looking at upgrading our network to remove all of our single points of failure. Next on the docket is our switching hardware. None of us have any experience with HA switching, so I am just poking around and doing a bit of research into it. I would like to get some of your opinions on what the best route is.

A chassis switch with multiple CPU cards? If so, what brand?

Or multiple linked switches? Again, what brand?

Also, what are the reasons for your recommendations? Is there anything else we need to consider?

Thanks in advance guys.

silkman425

Sun, Sep 3, 2006 5:23 PM

You could look at Cisco HSRP (Hot Standby Router Protocol).

- Using HSRP for Fault-Tolerant IP Routing: "This case study examines Cisco's Hot Standby Routing Protocol (HSRP), ... For IP hosts that do not support IRDP, Cisco's HSRP provides a way to keep ..."

- Cisco - Hot Standby Router Protocol (HSRP): Frequently Asked Questions: "Can a Cisco 2500 and Cisco 7500 router on the same LAN segment use HSRP, or do I have to replace one of the routers so the platforms are identical? ..."

- Cisco HSRP: Hot Standby Router Protocol Overview (RFC 2281): HSRP overview, header, format and structure.
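The core idea of HSRP is that two or more routers share one virtual gateway address and elect which of them currently owns it. Per RFC 2281, the router with the highest priority becomes active, and a priority tie goes to the router with the higher IP address. The sketch below models only that election; the router names and addresses are hypothetical, and it deliberately ignores preemption (in real HSRP, a higher-priority router does not take over from a running active router unless preemption is configured) and the hello/hold timers.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class HsrpRouter:
    name: str
    ip: str
    priority: int = 100  # HSRP default priority
    alive: bool = True


def elect_active(routers):
    """Pick the router that should own the virtual IP address.

    Highest priority wins; a tie goes to the highest interface IP address
    (compared numerically, not as strings). Preemption and hello/hold
    timers are deliberately left out of this sketch.
    """
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, ipaddress.IPv4Address(r.ip)))


if __name__ == "__main__":
    # Hypothetical pair of core switches sharing one virtual gateway.
    routers = [
        HsrpRouter("core-a", "10.0.0.2", priority=110),
        HsrpRouter("core-b", "10.0.0.3"),
    ]
    print("active:", elect_active(routers).name)
    # Simulate core-a dying: core-b takes over the virtual gateway address.
    routers[0] = HsrpRouter("core-a", "10.0.0.2", priority=110, alive=False)
    print("active after failure:", elect_active(routers).name)
```

Hosts keep pointing at the same virtual gateway throughout, which is why they do not need reconfiguring when the active router fails.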

Silk

aBs0lut30 wrote:

anoop

Sun, Sep 3, 2006 7:26 PM

What level of HA do you have in your network today?

Some things to look for when considering HA are:

- Node redundancy

- Link redundancy

- Time to repair when a failure happens, i.e. how long rerouting and similar recovery take. Meeting your target may require you to tune protocol parameters.
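To put rough numbers on node redundancy, the usual back-of-the-envelope model treats redundant units as parallel (the service is up if any unit is up) and shared dependencies as series (everything in the chain must be up). This is a minimal sketch that assumes failures are independent, which real networks often violate; the 0.999 availability figures are purely hypothetical.

```python
def parallel_availability(unit: float, n: int = 2) -> float:
    """Availability of n redundant units when any one of them is enough.

    Assumes the units fail independently of each other.
    """
    return 1 - (1 - unit) ** n


def series_availability(*parts: float) -> float:
    """Availability of a chain in which every part must be up."""
    total = 1.0
    for part in parts:
        total *= part
    return total


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per (365-day) year."""
    return (1 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    switch = 0.999  # hypothetical per-switch availability
    pair = parallel_availability(switch)
    # The same redundant pair, but both switches hang off one shared power feed
    # that is itself (hypothetically) 0.999 available.
    pair_shared_power = series_availability(pair, 0.999)
    for label, a in [
        ("single switch", switch),
        ("redundant pair", pair),
        ("pair + shared power", pair_shared_power),
    ]:
        print(f"{label}: {a:.6f}, ~{downtime_minutes_per_year(a):.1f} min/year down")
```

The last case shows why redundancy has to extend to everything the boxes depend on: one shared component in series drags the pair back down to roughly single-box availability.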

Multiple CPU cards in a chassis are mostly helpful for things like "in-service upgrades": you upgrade the code on the standby CPU and then switch over to it seamlessly. However, vendor support for this varies. Switches with multiple CPU cards also tend to be significantly more expensive than systems with a single CPU card.

When you evaluate this feature, find out exactly what the setup can do. Is it hot standby or warm standby? In warm standby, the standby CPU is booted and ready to take over, but protocol state is not maintained on it, so when it does take over it has to re-establish adjacencies with all of its peers. If the vendor claims hot standby, find out which protocols will fail over seamlessly; hot standby may not be supported for all of them.

Find a switch vendor that supports VRRP (Virtual Router Redundancy Protocol). Most vendors support it (HP, Foundry, Extreme, Cisco...), but you will still have to find a switch that meets your other needs and also runs that protocol. VRRP protects the "client" side of things: hosts keep a single default-gateway address, so they can always reach a router even if one router fails.
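How quickly VRRP fails over is set by its timers. In VRRP version 2 (RFC 3768), a backup declares the master dead after Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, where Skew_Time = (256 − Priority) / 256 seconds and the default advertisement interval is 1 second. The sketch below simply evaluates that formula; it assumes VRRPv2 (VRRPv3 computes the skew differently) and compares against Cisco HSRP's default 10-second hold time.

```python
def vrrp_master_down_interval(priority: int = 100, advert_interval_s: float = 1.0) -> float:
    """Seconds a VRRPv2 backup waits without hearing the master before it takes over.

    Follows the RFC 3768 timers:
      Skew_Time            = (256 - Priority) / 256
      Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    """
    if not 1 <= priority <= 254:
        # 255 is reserved for the router that owns the virtual IP address;
        # 0 is sent by a master that is giving up the role.
        raise ValueError("a backup's priority must be between 1 and 254")
    skew_time = (256 - priority) / 256
    return 3 * advert_interval_s + skew_time


HSRP_DEFAULT_HOLD_S = 10.0  # Cisco HSRP default timers: hello 3 s, hold 10 s


if __name__ == "__main__":
    print("VRRP backup, default priority 100:", vrrp_master_down_interval(), "s")
    print("VRRP backup, priority 200:", vrrp_master_down_interval(200), "s")
    print("HSRP, default hold time:", HSRP_DEFAULT_HOLD_S, "s")
```

A higher-priority backup has a shorter skew, so when the master dies it times out first and wins the takeover without the backups racing each other.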

The next area is redundancy in the core, to ensure that all of the edge switches always have connectivity through the core of your network. Whether you run bridging or routing there determines what "time to repair" you can achieve. Depending on the type of fault, RSTP may reconverge faster than a routing protocol, but there has been recent work in the area of "IP Fast Reroute", so if you have a routed core, you might want to look into support for that as well.
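One IP Fast Reroute mechanism that came out of that work is Loop-Free Alternates (RFC 5286): each router precomputes a backup next hop so it can switch immediately when its primary link fails, instead of waiting for the routing protocol to reconverge. A neighbour N of router S is a safe alternate toward destination D if dist(N, D) < dist(N, S) + dist(S, D), meaning N's own best path to D does not loop back through S. The sketch below checks only that basic condition on a hypothetical topology with symmetric link costs; it does not cover node protection or the other refinements in the RFC.

```python
import heapq
import math


def shortest_distances(graph, source):
    """Dijkstra: cost of the best path from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def loop_free_alternates(graph, s, dest, primary_next_hop):
    """Neighbours of s that satisfy the basic loop-free condition for dest:

        dist(N, D) < dist(N, S) + dist(S, D)
    """
    dist = {node: shortest_distances(graph, node) for node in graph}
    alternates = []
    for n in graph[s]:
        if n == primary_next_hop:
            continue
        if dist[n].get(dest, math.inf) < dist[n][s] + dist[s][dest]:
            alternates.append(n)
    return alternates


if __name__ == "__main__":
    # Hypothetical routed core: node -> {neighbour: link cost}.
    topology = {
        "S": {"A": 1, "B": 1, "C": 1},
        "A": {"S": 1, "D": 1},
        "B": {"S": 1, "D": 2},
        "C": {"S": 1},  # C's only way anywhere is back through S
        "D": {"A": 1, "B": 2},
    }
    # S normally reaches D via A; which neighbours can take over instantly?
    print(loop_free_alternates(topology, "S", "D", primary_next_hop="A"))
```

Here B qualifies because its direct path to D never touches S, while C does not: handing C a packet for D would just bounce it back to S.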

Hope this helps.

Anoop

stephen

Mon, Sep 4, 2006 8:32 AM

Anoop gives a good checklist, but you need to decide what you mean by "no single point of failure".

One of my definitions of seriously paranoid network design is that the network survives the "sledgehammer" test, i.e. does it still work after someone scraps one of your boxes? (Don't forget this applies not just to the switches, but also to the UPS, the local power substation, the fibre terminations and the big mains transformer outside the building.)
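For the boxes and links themselves, the sledgehammer test can be run on paper: take each box out of the topology in turn and check whether everything that is left can still reach everything else. In graph terms the boxes that fail are articulation points. This brute-force sketch is fine for a campus-sized list of boxes (Tarjan's algorithm does the same in linear time for large graphs); the box names are hypothetical, and shared power feeds or fibre routes are not modelled, so those still need checking separately.

```python
from collections import deque


def build_graph(links):
    """Undirected adjacency sets from a list of (box, box) links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def still_connected_without(graph, smashed):
    """True if every surviving box can still reach every other surviving box."""
    survivors = [n for n in graph if n != smashed]
    if not survivors:
        return True
    seen = {survivors[0]}
    queue = deque([survivors[0]])
    while queue:  # breadth-first search that skips the smashed box
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr != smashed and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(survivors)


def sledgehammer_test(graph):
    """Boxes whose loss splits the network, i.e. single points of failure."""
    return [box for box in graph if not still_connected_without(graph, box)]


if __name__ == "__main__":
    links = [
        ("core1", "core2"),
        ("dist1", "core1"), ("dist1", "core2"),  # dual-homed
        ("dist2", "core1"),                      # single-homed
    ]
    print(sledgehammer_test(build_graph(links)))
    # Dual-home dist2 as well and the list comes back empty.
    print(sledgehammer_test(build_graph(links + [("dist2", "core2")])))
```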

Even then there are network-wide failures. Spanning tree is the common one, but a routing protocol that keeps reconverging continuously is also pretty difficult to work around.

There are very few boxes that are fully resilient internally. Backplanes and power connections inside the box fail very rarely, but when they do, the box is toast.

A much more likely failure is some sort of software problem, and any sort of resilient control processor makes the control software much more complex, and arguably much more likely to fail.

You also need to think about using a routing protocol within a big campus.

Agreed. The question often forgotten about resilience is how fast the network recovers.

For example, IP telephony usually needs the network to recover from a fault in under a second unless you are willing to drop calls. Maybe that is a rare enough event that you don't care, but many designers miss this until it shows up in anger on the live network...
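A quick way to see why the recovery time matters for voice is to count the packets a call loses while the network is down. This sketch assumes a 20 ms packetization interval (a common default, but check your codec settings) and treats the gateway protocol's failure-detection time as the whole outage; real outages also include ARP/MAC relearning and any routing reconvergence, so these counts are a lower bound on the damage. The 300 ms "tuned" figure is hypothetical.

```python
def voice_packets_lost(outage_ms: int, packetization_ms: int = 20) -> int:
    """Worst-case packets lost in one direction of a call during an outage.

    A call sends one packet every packetization_ms, so an outage of
    outage_ms can swallow up to ceil(outage_ms / packetization_ms) packets.
    """
    return -(-outage_ms // packetization_ms)  # integer ceiling division


if __name__ == "__main__":
    outages_ms = {
        "HSRP default hold time (10 s)": 10_000,
        "VRRPv2 default backup (~3.6 s)": 3_609,
        "hypothetical tuned failover (0.3 s)": 300,
    }
    for label, ms in outages_ms.items():
        print(f"{label}: up to {voice_packets_lost(ms)} packets lost per direction")
```

Several seconds of lost audio is usually enough for users to hang up, which is why voice networks tend to push timers down toward the sub-second range.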