I have two sites, let's call them Site A and B. Site A, which has two ISPs, has two PIX 506Es, and site B has one PIX 501.
I am trying to accomplish some kind of failover, and my question is: Would it be possible to create 2 VPN tunnels from B to A using two separate ISP links, but both converging to the same protected site A LAN?
In article , Julian Dragut wrote:
:I have 2 sites, lets call them Site A and B, and on site A which has 2 ISP's
:we have 2 pix 506E, and on site B 1 PIX 501.
:I am trying to accomplish some kind of failover, and my question is: Would
:it be possible to create 2 VPN tunnels from B to A using two separate ISP
:links, but both converging to the same protected site A LAN?
Please see the following recent thread in which I mentioned a number of the issues involved:
It can be done that way or, if you own your own address space, you can advertise this out over both ISPs (being very careful not to become a peering point), which should also provide the resilience you are after.
"It can be done that way" hides a multitude of practical problems involving detecting failure, detecting resumption of normal operation, and getting LAN 'A' to send data to the currently active tunnel.
I pointed out a number of the potential pitfalls in a dual-equipment configuration in the thread I referenced in my earlier posting.
If you advertise the IP space through two different ISPs, then you can have the two ISPs converge at a device outside PIX A, with PIX A's one single outside IP configured as -the- peer IP on PIX B [but now with two ISP paths to reach that IP] -- but if you do this then you have a single point of failure on the link between PIX A and the WAN device that the two ISPs converge at.
If you advertise the IP space through two different IPs and you have the two ISPs converge on different interfaces of the same PIX A, then PIX A becomes a single point of failure -- but more of a problem is that (at least in PIX 6.x) you cannot failover to a different interface [except -maybe- with some fancy OSPF setup.]
If you have two different PIXes at A, then you run into the difficulties I outlined in that aforementioned thread.
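For concreteness, in the single-peer variant (two ISP paths converging outside PIX A on one outside IP), the crypto side of PIX B is the easy part. A minimal PIX 6.x sketch, with placeholder networks, peer address, and key (isakmp policy lines omitted):

```
! PIX B -- PIX 6.x syntax; all addresses and the key are placeholders
access-list to-a permit ip 192.168.2.0 255.255.255.0 192.168.1.0 255.255.255.0
crypto ipsec transform-set strong esp-3des esp-sha-hmac
crypto map site-a 10 ipsec-isakmp
crypto map site-a 10 match address to-a
! the -single- peer IP on PIX A, reachable via either ISP
crypto map site-a 10 set peer 203.0.113.1
crypto map site-a 10 set transform-set strong
crypto map site-a interface outside
isakmp enable outside
isakmp key ******** address 203.0.113.1 netmask 255.255.255.255
sysopt connection permit-ipsec
```

Note that nothing in this config helps PIX B notice that one ISP path has died; that is entirely up to the routing between the two ISPs and the converging WAN device.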
Your reply was so short and nonspecific ("It can be done that way") as to give the impression that this is all easy to configure, but at least in PIX 6.x, it takes a lot of thinking and setup to get a -reliable- automatic failover.
And the PIX can actively participate in which routing protocols? And can advertise what kind of routes with different costs? And can convert between which protocols and which others? Can do what kind of end-to-end testing?
Uh huh. You are expecting someone who obviously doesn't know a lot about redundancy and resiliency considerations, to jump into HSRP and failover PIX configurations. And you are expecting the failover PIXes to be able to detect the case where a router's external routing goes down but the router continues to send a trickle of traffic (such as for those routing protocols you mention). If the PIX gets traffic on an interface, then as far as the failover mechanism is concerned, the interface is up, even if you can't get anything through to further on.
IMHO, "You could do it that way" without further qualification of the problems in doing it "that way", is tantamount to declaring it routine or relatively simple to do it "that way".
There is, to my mind, a noticeable "mood" difference between "You could do it that way" (i.e., "the detail you have given is sufficient for me to judge that your proposed method is workable in your situation") vs e.g., "There are ways that could be made to work" (i.e., "there are some complexities involved that make that unworkable or infeasible in a number of situations, and you haven't given us enough information for us to judge your particular situation, but -generally speaking- if your situation happens to have certain characteristics, and you gather all the right pieces and hook the pieces up the right way, then that approach is one of the options.")
"Would it be possible?" Yes, certainly. Unasked but equally relevant is the question which Walter assumed: "What does it take to do it so it works to improve availability?"
As Walter points out, there is what you might think works based on the marketing materials, what should work based on the documentation, and what actually works in the real world using the shipping HW and SW. As you can imagine, as you work your way closer to reality, your choices tend to get much more limited. I would highly recommend that you take with a huge grain of salt any advice from those who have not done it in a production environment.
Building a redundant network is actually very easy. The challenge is building it so that the redundancy actually results in measurable improvements to availability and robustness. Unfortunately, it is altogether too easy to design a redundant network where you increase the probability of failure, the opposite of the desired impact. Proper design requires careful attention to detail, thorough understanding of the underlying protocols and their limitations, and a healthy dose of paranoia.
Success requires a design that can reliably detect failures, take appropriate action to work around the failures detected, and have a fighting chance that the fallback path selected will be functional when needed. Your network will only be as strong as the weakest of the three.
Testing for proper operation and failover can also be challenging. Many failure modes are not appropriately emulated by simply "pulling the plug" on a cable or box, although that can make a good demo to impress management. Don't forget the management and operational requirements to keep a high availability network functioning with high availability over the long term.
The PIX does not allow bridging of interfaces, nor cloning of information to multiple interfaces, nor automatic switch-over to a different interface within the same PIX. Therefore although you might use HSRP to handle 2 routers, you can only connect the PIX to one of the two routers if you connect directly. If you connect the PIX to a switch and connect the switch to the two routers, then the link between the PIX and the switch becomes the single-point for that PIX.
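For reference, the HSRP half of that picture is the routine part. A minimal IOS sketch for the two routers (interface names and addresses are placeholders, not taken from this thread):

```
! Router 1 -- preferred gateway; placeholder addressing
interface Ethernet0
 ip address 192.168.1.2 255.255.255.0
 standby 1 ip 192.168.1.1
 standby 1 priority 110
 standby 1 preempt
 ! drop priority by 20 if the WAN interface goes down
 standby 1 track Serial0 20
!
! Router 2 -- standby; same virtual IP, lower priority
interface Ethernet0
 ip address 192.168.1.3 255.255.255.0
 standby 1 ip 192.168.1.1
 standby 1 priority 100
 standby 1 preempt
```

But `standby track` only watches interface state, so it is subject to exactly the caveat being discussed here: a WAN link can be useless while its interface stays up.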
I've read that it is possible to get specially wired cables such that you can literally plug one device NIC into more than one switch, but I've never seen any such cables advertised, and never seen any switch variety advertised that would be able to handle the appropriate "only one switch active at a time" failover logic.
So then for resiliency you are relying upon the PIX failover mechanisms to detect that the connection to the switch is a problem. The PIX failover mechanisms are able to look at traffic on an interface, not just whether there are link pulses on the interface, but conversely, -any- traffic on the interface will be taken as evidence that the link is working.
Is relying on interface traffic good enough? In my experience, NO: we have a core switch here that wedges from time to time, and when it does wedge, VLANs stop working -- but the switch keeps sending BPDUs, and anything that happens to be in VLAN 1 anyhow keeps getting switched, so some flows keep working and some don't...
If you are going to use failover PIX pairs, then if both plug in to the same LAN switch, the LAN switch is single-point; if you do not plug both into the same LAN switch, you need redundant links and reliable spanning trees... and you have to watch out for network partitions.
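A minimal PIX 6.x failover-pair sketch for reference (placeholder addresses; assumes a dedicated interface has been nameif'd `stateful` for state replication):

```
! Primary PIX -- PIX 6.x syntax; addresses are placeholders
failover
failover ip address outside 203.0.113.2
failover ip address inside 192.168.1.2
! hello interval in seconds (default 15); interface tests
! begin only after hellos are missed
failover poll 5
! stateful replication over the dedicated interface
failover link stateful
```

The interface tests the PIX then runs (link up/down, network activity, ARP, broadcast ping) are precisely the traffic-based checks whose limits are described above.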
Does connecting to a switch and the switch to multiple routers increase reliability? Not necessarily: each device added in series increases the overall failure rate. The usual solution to that is to add devices in parallel: that's easy to calculate for simple matters such as "does enough electricity get through or not?", but not so easy at the communications level.
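The series/parallel arithmetic can be sketched. With independent per-device availabilities $A_i$:

```latex
% Devices in series: every one must work
A_{\text{series}} = \prod_i A_i
% Devices in parallel: only one must work (assuming perfect failover)
A_{\text{parallel}} = 1 - \prod_i (1 - A_i)
% Example: two devices at 99% availability each
% series:   0.99 \times 0.99 = 0.9801
% parallel: 1 - 0.01 \times 0.01 = 0.9999
```

The parallel figure assumes the failures are independent and the failover is perfect -- and "perfect failover detection" is precisely the part that is not so easy at the communications level.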
Did the physical link go down? Was the device interface able to detect that physical failure? (e.g., no, because the fibre-to-Ethernet media converter is still up) Can the device discover the layer 1 failure by using layer 2 tests, e.g. by looking at packet counts? Can the device trigger some kind of useful layer 1 or layer 2 action when the layer 2 test detects the layer 1 failure, such as sending a link-failure notification to its failover mate, or such as taking down (layer 1) an interface until the other interface comes back up (which might just effectively signal to a different device, or might result in a topology change)?
Is the physical link up but the remote device is non-responsive? Can the device discover the layer 2 failure by using layer 2 tests? Can it trigger a useful layer 1 or layer 2 action in response? Can the device discover the layer 2 failure by using a layer 3 test? Can it trigger a useful layer 1 or layer 2 action in response?
Can the device react to the layer 1, layer 2, or layer 3 test failures by changing the routing protocols? By sending an SNMP trap? A trap to multiple destinations? Can it repeat the trap until acknowledged in case the trap got lost? Does the path to the management device happen to be through the link that broke? Does the path to the management device happen to be through a link that the device is configured to pull down?
Can the device sometimes falsely detect a layer 1 or layer 2 failure, e.g. because the remote device is busy and throws away probes it thinks are low priority? because the local device is busy? because the response got discarded on a busy link due to too many collisions? because the response got discarded on a busy link due to buffers filling up? because the response got discarded on a busy link due to traffic shaping or traffic policing?
Does the broken link happen to be the one used to communicate status to the failover mate? If both members of the failover pair know that their tied WAN links are up, but the communications between the failover pair goes down, then will both members of the pair transition to think they are the "active" member and that the other is down? Will the downstream devices be able to decide between them should that happen? If the two devices communicate via a dedicated failover connection, then that connection can fail (or accidentally get unplugged, or need to be unplugged to untangle cords, or need to get unplugged as part of testing imperfectly-documented wiring harnesses).
Etcetera. Lots of variables, conditional probabilities, no simple answers for the situation where both devices think they should have control...
And that's not even taking into account timeouts, race conditions, flapping, unidirectional link failures, packet storms, ability of the failover mates to take over gracefully without requiring connections to be reset...
Okay, to spell it out: internal routers at each site running a routing protocol (OSPF, EIGRP, RIP, you choose) over point-to-point connections provided by the PIX tunnels to each other. This supplies the failover & can include other benefits such as load balancing, if an appropriate routing protocol is chosen. The PIXes provide the paths between the internal routers across the internet & no more. The external routing environment provides resilience for the internet access only. The whole scenario, although potentially requiring more hardware, is simple to manage, resilient & scalable; each device providing the specific function it was designed for - rather than having a small number of devices with a complex configuration which would be more difficult to manage in the long term.
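A sketch of one of those internal routers (IOS; placeholder addressing; assumes the PIXes already carry the tunnel traffic between the internal routers):

```
! Site B internal router -- placeholder addresses
interface Tunnel0
 ip address 172.16.0.2 255.255.255.252
 ! leave headroom for GRE + IPsec overhead on the path via the PIXes
 ip mtu 1400
 tunnel source Ethernet0
 ! site A internal router, reached through the pix1 VPN
 tunnel destination 192.168.1.10
!
interface Tunnel1
 ip address 172.16.0.6 255.255.255.252
 ip mtu 1400
 ! make this path the backup
 ip ospf cost 200
 tunnel source Ethernet0
 ! site A internal router, reached through the pix2 VPN
 tunnel destination 192.168.1.11
!
router ospf 1
 network 172.16.0.0 0.0.0.7 area 0
 network 192.168.2.0 0.0.0.255 area 0
```

When Tunnel0's OSPF adjacency dies (dead interval defaults to 40 seconds), traffic shifts to Tunnel1; tuning the hello/dead timers trades failover speed against stability.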
I really did not want the discussion to go the way it went, but I learned stuff...thank you both for your time, and knowledge sharing.
To clarify a bit, here's the situation -
This is pretty much the closest setup I was able to find; the only difference from my setup is the fact that each ISP's router terminates in a separate PIX firewall at site A, specifically pix1 and pix2, and they all have inside interfaces in the same subnet. Basically, I am not trying anything clever here, just simple redundancy: since tunnels are created by interesting traffic, the inside DHCP server is going to offer two default gateways (pix1's internal interface and pix2's internal interface) and they will be used based on their priority. If one PIX's inside address is used as the gateway, I would expect the tunnel to be created to the satellite site using that ISP, and if for some reason that ISP fails, the second gateway would be used, and the second PIX would create the tunnel?
Theoretically it sounds pretty simple and straightforward, but I would like your thoughts and expertise as well......
See the white paper on Redundant VPNs on my web site for an example configuration using two PIXes at each end and BGP to avoid the MTU reduction inherent in GRE tunnels. If you only have a single PIX (or a failover pair) at either site, you'll need to use GRE tunnels, in which case you can use any of the example configs on
It was not clear from your original post that you had real routers inside the PIXes at each end to handle the job.
Side note: Your approach of treating the PIX to PIX VPNs as point-to-point links is the correct way to go. You just need to watch out for the shrinking path MTU and the requirement on each PIX that the destination IP be unique for each VPN path (hence the need for GRE tunnels if using a single PIX).