3-site VPN implementation w/Terminal Server

The persistent re-negotiation between the 2 problem sites seems to take place when either phase 1 or phase 2 has expired (I haven't been able to accurately determine the timing, since the expiration happens when I'm at work and I can't make a VPN like my client's network).

I'll try removing the scheduled connections (for IPSec) and the dead peer connections. I did delete and re-create the tunnels once after I created all new IKE profiles, but I'll try it again.

Any advice on timing/duration settings of phase 1/2 configs? Should phase 1 be set to expire after 24 hrs and phase 2 set for something like 8 or 12 hrs?
Reply to
Vince

Dead peer detection is a bit hit or miss. I start with it disabled and then add it in if the connection seems unstable. It only helps if the underlying network has problems though. (ADSL link that goes offline, occasional packetloss, that kind of thing). If you see constant dead peer detected messages in the logs you may try turning it off. If the connection is stable with it disabled then either the dead peer detection settings were wrong or something wasn't responding to keepalive messages as expected.

Scheduled connections do nothing for IPSec. This is for PPP style connections.

Setting the idle to 0 is the correct way to indicate the tunnel should stay "nailed" up at all times regardless of traffic.

So are your tunnels still renegotiating every few seconds? Have you had any luck isolating the problem? In your last post you said that both tunnels from one site were working properly but the connection between the two other sites was still not working. Have you deleted the tunnels between those two problem sites and tried creating all new settings? Have you tried calling Netopia to have them look at the problem?

Reply to
Mike Drechsler - SPAM PROTECTE

Well Mike, I thought I was OK, but I'm still having trouble.

I re-created the tunnels between the 2 problem endpoints (Sites A and B), and things seemed to work nicely. Phase 2 re-negotiations took only a handful of attempts. For the past 5 days or so, the tunnels have been stable, with the phase 2's renegotiating successfully as scheduled (every 4 hours.) Then just this morning, I ran into the same problem again with the A-B tunnel, with phase 2 failing repeatedly (endless "Phase 2 complete" messages) for several hours. I rebooted the router at Site B and the tunnels re-established after about 90 seconds. Connections and IP traffic between sites A and B have been fine for the past 3 hours; hopefully the next phase 2 re-negotiation won't barf.

I'm at my wits' end with this. The tunnels out of Site C have been rock-solid since inception. The A-B tunnel settings at Sites A and B are identical (and different from the A-C and B-C settings). I have done a 'show config' dump and checked everything line by line. Furthermore, the IKE and Connection Profile settings for the A-B tunnel match the A-C and B-C settings (though unique from the other 2 tunnels in name, IKE Profile, and password).
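For anyone repeating the line-by-line check, a diff tool is less error-prone than reading two dumps by eye. Below is a minimal sketch in Python using the standard `difflib` module. The config fragments and setting names are made up for illustration; they are not real Netopia 'show config' output.

```python
import difflib

def diff_configs(a_text, b_text, a_name="site-a", b_name="site-b"):
    """Return unified-diff lines between two config dumps.

    Blank lines and trailing whitespace are ignored so that only
    real setting differences show up.
    """
    a = [ln.rstrip() for ln in a_text.splitlines() if ln.strip()]
    b = [ln.rstrip() for ln in b_text.splitlines() if ln.strip()]
    return list(difflib.unified_diff(a, b, fromfile=a_name,
                                     tofile=b_name, lineterm=""))

# Hypothetical config fragments (not real router syntax).
site_a = """ike profile ab
  phase1-lifetime 86400
  phase2-lifetime 14400
"""
site_b = """ike profile ab
  phase1-lifetime 86400
  phase2-lifetime 28800
"""

for line in diff_configs(site_a, site_b):
    print(line)
```

Some differences between the two ends are expected (each side names the other as its peer), so the point is to spot the ones that should not be there, such as mismatched lifetimes or proposals.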

Netopia online chat help would not offer any VPN configuration assistance; they referred me to their fee-based production support offerings (consistent with their website's advertised support policy regarding VPN's).

The only common issue I can think of at this point is that Sites A and B both have an ISP connection requiring PPPOE underlying encapsulation even though they have fixed IP addresses. Site C (the oldest) for some reason, even though under the same provider (SBC), does not utilize PPPOE at all.

Any thoughts?

Reply to
Vince

PPPoE doesn't exist around here. Every provider where I live is either DHCP or manual hardcoded IP. If there is a problem with the PPPoE side of things I would have never seen it because of this.

You could try playing around with any available MTU settings if PPPoE is involved.

It doesn't seem likely that there is a network problem if these sites can communicate with the other router without problems, but you should still check the network between the two sites. Do ping tests with large packet sizes and the "do not fragment" bit set. Run these tests while transferring increasing amounts of data back and forth and see how it behaves.
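To make the large-packet test concrete: the biggest ping payload that fits in one unfragmented packet is the link MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header. PPPoE takes another 8 bytes out of a 1500-byte Ethernet MTU. The sketch below does that arithmetic and builds the matching ping command line. The helper names are made up for this example; the flags are the standard Windows (`-f -l`) and Linux iputils (`-M do -s`) ones. Traffic sent through the IPSec tunnel carries extra overhead that is not modelled here.

```python
IP_HEADER = 20       # IPv4 header without options, in bytes
ICMP_HEADER = 8      # ICMP echo header, in bytes
PPPOE_OVERHEAD = 8   # PPPoE header (6) + PPP protocol field (2)

def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one packet of size `mtu`."""
    return mtu - IP_HEADER - ICMP_HEADER

def df_ping_command(host, payload, windows=True):
    """Build a ping command with the don't-fragment bit set."""
    if windows:
        return ["ping", "-f", "-l", str(payload), host]
    return ["ping", "-M", "do", "-s", str(payload), host]

print(max_ping_payload(1500))                    # plain Ethernet
print(max_ping_payload(1500 - PPPOE_OVERHEAD))   # Ethernet with PPPoE
print(" ".join(df_ping_command("192.0.2.1", 1464)))
```

If a payload that should fit gets dropped or reported as needing fragmentation, step the size down until pings pass; the size where they start passing hints at the real path MTU.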

If these 2 sites do not communicate with each other frequently or require high bandwidth you could route all traffic through your "site C" location.

You could consider paying Netopia for their VPN setup service and if they find a bug in the router firmware you get a refund. Ask them if they will refund the money if they fail to get a reliable connection. It's not like they charge an excessive amount for the service. (Less than a typical consultant visit)

On the extreme end of things you could configure a test network. There are ways of using Linux to create your very own pppoe server and make a test to determine if it's the routers or the network causing the problem.

Reply to
Mike Drechsler - SPAM PROTECTE

I think I will be calling Netopia this week or next. This is maddening. Sometimes the Phase 2 IPSec renegotiation (at time of expiration) takes 2 minutes, sometimes it takes 2 hours, sometimes it never succeeds until I bounce the routers. This problem is now occurring in all 3 locations. I am going to do a factory reset on all 3 routers this week and try re-building the setups from scratch, but at this point I have my doubts that even that will work.

Even after scouring the 'net, I was unable to find anyone having a similar problem, so woe is me.

If anyone has seen this type of behavior, and could offer any insight, I would greatly appreciate it. I will post my results from the rebuild and Netopia support assistance as I have them.

Reply to
Vince

The only times I have had that kind of trouble were when there was packetloss on the connection between two sites. A good example: recently I had trouble maintaining VPN links from home. I found that my cable modem would drop large packets. As the size of a packet increased, the likelihood of it being lost increased. When doing a ping with the default 32 bytes the packetloss was almost 0%. At the maximum ethernet packet size (ping with 1472 bytes) the loss was often nearly 90%. The cable company tech came out, we hooked up the modem to the pedestal out on the street and found the same trouble there, so he booked a call to have network maintenance come out and fix the amplifiers in the area. A week later, everything is great again. I get no packetloss with large packet sizes and my VPN connections are solid again.

It's possible you have some lower level problem with your internet connections. Packetloss is an enemy to a stable VPN connection.

Reply to
Mike Drechsler - SPAM PROTECTE

Boy, if that's the case, I would be ecstatic if I can get this resolved. OK, say the scenario you described is applicable in my case. How would I best explain this to my ADSL provider (SBC) to convince them of an actual problem that falls within the scope of their service guarantees? I'm worried that the conversation will end with "Well, you can surf the internet, right? Then it's working as it should."

routers for 2-3 minutes w/1427 bytes when the tunnels were working, wasn't losing anything:

Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=216ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=218ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=218ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254
Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254

Reply from 192.168.2.1: bytes=1427 time=173ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=171ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=171ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=171ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=171ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=171ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=170ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=171ms TTL=254
Reply from 192.168.2.1: bytes=1427 time=172ms TTL=254

Does this mean my problems don't match with what you had?
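A 2-3 minute run is easy to eyeball, but a longer capture is easier to judge once it is reduced to loss and round-trip numbers. Here is a rough Python sketch that summarizes Windows-style ping output saved to a file or pasted into a string. It assumes the "Reply from ..." and "Request timed out." line formats seen in this thread; other ping versions print different text and would need a different pattern.

```python
import re

# Matches lines such as:
#   Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254
REPLY_RE = re.compile(r"Reply from (\S+): bytes=(\d+) time[=<](\d+)ms TTL=(\d+)")

def summarize_ping(text):
    """Count replies and timeouts, and compute loss % and RTT min/avg/max."""
    times = [int(m.group(3)) for m in REPLY_RE.finditer(text)]
    timeouts = text.count("Request timed out.")
    sent = len(times) + timeouts
    return {
        "sent": sent,
        "received": len(times),
        "loss_pct": 100.0 * timeouts / sent if sent else 0.0,
        "min_ms": min(times) if times else None,
        "avg_ms": sum(times) / len(times) if times else None,
        "max_ms": max(times) if times else None,
    }

sample = (
    "Reply from 192.168.1.1: bytes=1427 time=217ms TTL=254\n"
    "Reply from 192.168.1.1: bytes=1427 time=216ms TTL=254\n"
    "Request timed out.\n"
    "Reply from 192.168.1.1: bytes=1427 time=218ms TTL=254\n"
)
print(summarize_ping(sample))
```

Running the same summary on captures taken during a good period and during a failing renegotiation makes the two directly comparable.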

Reply to
Vince

Well, mine was particularly bad. But it did have its good moments where I could use the connection, and other times when it was horrible and my internet was basically down. I felt fortunate that the tech came when it was very bad. The problem was very evident and it continued for the entire length of the service call, so we were able to do the tests inside the house and out on the street and find that it was an unbalanced amplifier (the low channels, where the return path is transmitted, were very low compared to the rest of the channel range).

You can basically just run a continuous ping to the other site you are having problems with, as well as a ping to a default gateway or other appropriate router on the ISP side, which would show whether the problem is local to your connection or more toward the remote end. There are plenty of third party programs for monitoring your network connection. Something like Advanced Hostmonitor is a good start, but probably not perfect for this purpose.

The hardest problems to troubleshoot are intermittent as everyone knows.

As for your test: you should ping a public interface to check for packetloss. The VPN can recover from some level of packetloss and hide this from your ping session. You may see it as higher ping times. Compare the results of the ping to the local ISP default gateway, the remote site's public interface, and the tunneled connection to the remote router's private network. The larger packet sizes just seem better at exposing some problems but are not always necessary.
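The three-way comparison can be written down as a simple rule of thumb. The Python sketch below is only a heuristic under assumed inputs: the function name, the 1% threshold, and the wording of the results are made up for illustration, and intermittent faults can easily fool it.

```python
def locate_loss(gateway_loss, remote_public_loss, tunnel_loss, threshold=1.0):
    """Guess where packet loss originates from three loss percentages.

    gateway_loss       -- loss pinging the local ISP default gateway
    remote_public_loss -- loss pinging the remote site's public (WAN) address
    tunnel_loss        -- loss pinging the remote router's private address
                          through the VPN tunnel
    """
    if gateway_loss > threshold:
        return "local link"                  # loss starts at the first hop
    if remote_public_loss > threshold:
        return "internet path or remote link"
    if tunnel_loss > threshold:
        return "tunnel only"                 # network clean, VPN misbehaving
    return "no loss observed"

print(locate_loss(0.0, 0.0, 0.0))
print(locate_loss(0.0, 12.5, 40.0))
```

A "tunnel only" result points back at the routers or their IPSec settings, while either of the first two results is something to take to the ISP.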

Reply to
Mike Drechsler - SPAM PROTECTE

OK, during a period where the Phase 2's were just thrashing each other from both ends (central office and problem remote office), I tried ping testing to a public address (easynews.com), the local DNS server, and the remote router's WAN IP address, all with no packet loss from either side after about 2-3 minutes of continuous pings at min and max packet sizes.

I've gotten to the point where I have pushed the Phase 1 timeout to 24hrs and the Phase 2 timeout to 12hrs, just to keep my clients from major downtime. That only worked for a day and a half. Yet again, the remote office had to hard-boot the router and wait 5 minutes before the tunnel was restored.

I have come across 2 spare Linksys BEFVP41's which I am about to try, just for comparison. If they appear to work better, I will most likely junk the Netopias. If the problems remain the same, I may just nuke the site for morbid. This has been unbelievably frustrating.

Reply to
Vince

Well good luck. I have had my share of phase 2 problems, but they always seemed to be something with the settings or the network link.

Reply to
Mike Drechsler - SPAM PROTECTE

Cabling-Design.com Forums website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.