Absurd problem with HSRP

Hello

I have this scenario:

R1 with DSL line
R2 with DSL line

Two HSRP groups on both routers:
group 1: virtual IP 172.16.0.1
group 3: virtual IP 172.16.0.3

Each router has two tracking systems:
a) tracks ATM0
b) tracks an ICMP ping to a remote host (next-hop ISP IP)

Currently I have two Cisco 877 routers:

R1:

172.16.0.241/23, IOS 12.4(24)T4, 128 MB DRAM

R3:

172.16.0.243/23, IOS 15.1(4)M2, 256 MB DRAM

Here are the confs:

R1:

interface Vlan1
 ip address 172.16.0.241 255.255.254.0
 no ip redirects
 no ip proxy-arp
 ip nat inside
 standby delay minimum 30 reload 20
 standby version 2
 standby 1 ip 172.16.0.1
 standby 1 timers 5 15
 standby 1 preempt
 standby 1 name R1A+R3P
 standby 1 track 1 decrement 10
 standby 1 track 2 decrement 10
 standby 3 ip 172.16.0.3
 standby 3 timers 5 15
 standby 3 priority 95
 standby 3 preempt delay minimum 20 reload 20 sync 10
 standby 3 name R1P+R3A
 standby 3 track 1 decrement 10

R3:

interface Vlan1
 ip address 172.16.0.243 255.255.254.0
 no ip redirects
 no ip proxy-arp
 ip nat inside
 no ip virtual-reassembly in
 standby delay minimum 30 reload 20
 standby version 2
 standby 1 ip 172.16.0.1
 standby 1 timers 5 15
 standby 1 priority 95
 standby 1 preempt delay minimum 20 reload 20 sync 10
 standby 1 name R1A+R3P
 standby 1 track 1 decrement 10
 standby 3 ip 172.16.0.3
 standby 3 timers 5 15
 standby 3 preempt
 standby 3 name R1P+R3A
 standby 3 track 1 decrement 10
 standby 3 track 2 decrement 10

Both routers work the same way. I also run OSPF on the network.

Occasionally, and without warning, R3 loses contact with its HSRP partner and drops its OSPF adjacencies, yet no network outage occurs. I replaced R1, first a Cisco 837, then a 2610XM, then an 877, but the problem remains.

This is what appears in the syslog of R3:

000167: Nov 18 16:11:03.962 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
000168: Nov 18 16:11:20.758 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
000169: Nov 18 16:17:19.275 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
000170: Nov 18 16:17:34.482 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1 on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000171: Nov 18 16:17:42.232 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241 on Vlan1 from 2WAY to DOWN, Neighbor Down: Dead timer expired
000172: Nov 18 16:18:12.019 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1 on Vlan1 from LOADING to FULL, Loading Done
000173: Nov 18 16:18:19.088 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241 on Vlan1 from LOADING to FULL, Loading Done
000174: Nov 18 16:31:43.488 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241 on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000175: Nov 18 16:31:47.073 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1 on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000176: Nov 18 16:33:37.094 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1 on Vlan1 from LOADING to FULL, Loading Done
000177: Nov 18 16:33:42.651 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
000178: Nov 18 16:33:50.989 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241 on Vlan1 from LOADING to FULL, Loading Done
000179: Nov 18 16:33:58.423 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
000180: Nov 18 16:40:03.902 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
000181: Nov 18 16:40:06.606 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
000182: Nov 18 16:40:23.954 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
000183: Nov 18 16:45:55.761 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
000184: Nov 18 16:45:56.293 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
000185: Nov 18 16:46:12.905 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
000186: Nov 18 17:24:26.103 cet: %SYS-5-CONFIG_I: Configured from console by maggiore on vty0 (172.16.0.19)
000187: Nov 18 17:31:27.119 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1 on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000188: Nov 18 17:31:27.159 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1 on Vlan1 from LOADING to FULL, Loading Done
000189: Nov 18 18:00:11.150 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
000190: Nov 18 18:00:12.907 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
000191: Nov 18 18:00:27.934 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby
000192: Nov 18 18:38:00.799 cet: %SEC_LOGIN-4-LOGIN_FAILED: Login failed [user: maggiore] [Source: 172.16.0.19] [localport: 22] [Reason: Login Authentication Failed] at 18:38:00 cet Fri Nov 18 2011
000193: Nov 18 18:41:25.708 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
000194: Nov 18 18:41:27.304 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
000195: Nov 18 18:41:42.771 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Speak -> Standby

I have read a lot of documentation and troubleshooting guides. I have STP disabled. R3 and R1 are linked via a transparent wi-fi bridge. In the center of my network there is a Cisco 3550 switch, but both routers are in the same VLAN.

Here is the show standby output:

R1:

gw1.wisp#sh stand
Vlan1 - Group 1 (version 2)
  State is Active
    1 state change, last state change 02:28:27
  Virtual IP address is 172.16.0.1
  Active virtual MAC address is 0000.0c9f.f001
    Local virtual MAC address is 0000.0c9f.f001 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 1.600 secs
  Preemption enabled
  Active router is local
  Standby router is 172.16.0.243, priority 95 (expires in 13.440 sec)
  Priority 100 (default 100)
    Track object 1 state Up decrement 10
    Track object 2 state Up decrement 10
  Group name is "R1A+R3P" (cfgd)
Vlan1 - Group 3 (version 2)
  State is Standby
    1 state change, last state change 02:27:54
  Virtual IP address is 172.16.0.3
  Active virtual MAC address is 0000.0c9f.f003
    Local virtual MAC address is 0000.0c9f.f003 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 3.792 secs
  Preemption enabled, delay min 20 secs, reload 20 secs, sync 10 secs
  Active router is 172.16.0.243, priority 100 (expires in 13.392 sec)
    MAC address is 001c.b1ed.c2b7
  Standby router is local
  Priority 95 (configured 95)
    Track object 1 state Up decrement 10
  Group name is "R1P+R3A" (cfgd)

R3:

gw3.wisp#sh stand
Vlan1 - Group 1 (version 2)
  State is Standby
    37 state changes, last state change 00:08:31
  Virtual IP address is 172.16.0.1
  Active virtual MAC address is 0000.0c9f.f001
    Local virtual MAC address is 0000.0c9f.f001 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 0.016 secs
  Preemption enabled, delay min 20 secs, reload 20 secs, sync 10 secs
  Active router is 172.16.0.241, priority 100 (expires in 14.720 sec)
    MAC address is ecc8.82cd.0806
  Standby router is local
  Priority 95 (configured 95)
    Track object 1 state Up decrement 10
  Group name is "R1A+R3P" (cfgd)
Vlan1 - Group 3 (version 2)
  State is Active
    1 state change, last state change 06:27:44
  Virtual IP address is 172.16.0.3
  Active virtual MAC address is 0000.0c9f.f003
    Local virtual MAC address is 0000.0c9f.f003 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 4.112 secs
  Preemption enabled
  Active router is local
  Standby router is 172.16.0.241, priority 95 (expires in 10.560 sec)
  Priority 100 (default 100)
    Track object 1 state Up decrement 10
    Track object 2 state Up decrement 10
  Group name is "R1P+R3A" (cfgd)

How can I keep R3 from losing its partner? When it happens, it reports that the standby router for group 1 is unknown...

Elia S.
002702: Nov 19 09:46:57.178 cet: HSRP: Vl1 Grp 100 Hello in 172.16.0.241 Standby pri 95 vIP 172.16.0.10
002703: Nov 19 09:46:57.378 cet: HSRP: Vl1 Grp 3 Hello out 172.16.0.243 Active pri 100 vIP 172.16.0.3
002704: Nov 19 09:46:58.674 cet: HSRP: Vl1 Grp 1 Hello out 172.16.0.243 Standby pri 95 vIP 172.16.0.1
002705: Nov 19 09:46:59.102 cet: HSRP: Vl1 Grp 100 Hello in 172.16.0.11 Active pri 100 vIP 172.16.0.10
002706: Nov 19 09:46:59.899 cet: HSRP: Vl1 Grp 100 Hello in 172.16.0.241 Standby pri 95 vIP 172.16.0.10
002707: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Standby: c/Active timer expired (172.16.0.241)
002708: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Active router is local, was 172.16.0.241
002709: Nov 19 09:47:00.355 cet: HSRP: Vl1 Nbr 172.16.0.241 no longer active for group 1 (Standby)
002710: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Standby router is unknown, was local
002711: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Standby -> Active
002712: Nov 19 09:47:00.355 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Standby -> Active
002713: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Redundancy "R1A+R3P" state Standby -> Active
002714: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Hello out 172.16.0.243 Active pri 95 vIP 172.16.0.1
002715: Nov 19 09:47:00.359 cet: HSRP: Vl1 Grp 1 Added 172.16.0.1 to ARP (001c.b1ed.c2b7)
002716: Nov 19 09:47:00.359 cet: HSRP: Vl1 Grp 1 Activating MAC 001c.b1ed.c2b7
002717: Nov 19 09:47:00.359 cet: HSRP: Vl1 IP Redundancy "R1A+R3P" standby, local -> unknown
002718: Nov 19 09:47:00.359 cet: HSRP: Vl1 IP Redundancy "R1A+R3P" update, Standby -> Active
002719: Nov 19 09:47:00.367 cet: HSRP: Vl1 Grp 1 Hello in 172.16.0.241 Active pri 100 vIP 172.16.0.1
002720: Nov 19 09:47:00.367 cet: HSRP: Vl1 Grp 1 Active router is 172.16.0.241, was local
002721: Nov 19 09:47:00.367 cet: HSRP: Vl1 Nbr 172.16.0.241 active for group 1
002722: Nov 19 09:47:00.367 cet: HSRP: Vl1 Grp 1 Active: g/Hello rcvd from higher pri Active router (100/172.16.0.241)
002723: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Active -> Speak
002724: Nov 19 09:47:00.371 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state Active -> Speak
002725: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Redundancy "R1A+R3P" state Active -> Speak
002726: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Resign out 172.16.0.243 Speak pri 95 vIP 172.16.0.1
002727: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Removed 172.16.0.1 from ARP
002728: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Deactivating MAC 001c.b1ed.c2b7
002729: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Hello out 172.16.0.243 Speak pri 95 vIP 172.16.0.1
002730: Nov 19 09:47:00.375 cet: HSRP: Vl1 IP Redundancy "R1A+R3P" update, Active -> Speak
002731: Nov 19 09:47:00.383 cet: HSRP: Vl1 Grp 1 Coup in 172.16.0.241 Active pri 100 vIP 172.16.0.1
002732: Nov 19 09:47:00.383 cet: HSRP: Vl1 Grp 1 Hello in 172.16.0.241 Active pri 100 vIP 172.16.0.1
002733: Nov 19 09:47:01.819 cet: HSRP: Vl1 Grp 3 Hello out 172.16.0.243 Active pri 100 vIP 172.16.0.3
Elia S.

Is there really an R2? Or did you mean R3?

Not sure what the answer to the question you asked is, but generally speaking I would Keep It Simple, Stupid and only bother with a track on an ICMP ping to something reliable on the internet, eg 8.8.8.8. And why have you got two HSRP groups [or even three in your second message]? Again, simplify things when trying to troubleshoot.

In my experience, Cisco DSL interfaces can get to a point where the noise margin is negligible or even negative but don't resync, so you end up with unusable levels of packet loss, yet the ATM interface is still up according to the router. A quick shut/no shut on the ATM forces it to resync, and away you go for a few more days/weeks. So in this case tracking the state of ATM0 wouldn't be much use, as it can be up but not usable.

Something else I've seen is where there is a connection failure upstream of the exchange [eg where your provider breaks out to the internet], in which case the ISP next hop will be pingable, but you won't get to the internet. These are rarer occurrences, but again tracking the next-hop doesn't tell you much about the usability of the connection, unless you're only interested in destinations inside the ISP's network [eg site-to-site VPN tunnel that's on-net].

Back on topic, my guess would be that there is a small amount of packet loss on the wireless bridge between the routers, and it's enough to cause HSRP to think its peer has gone away. I expect you can tweak the HSRP parameters to account for this. You could also create a track on each router to track each other's LAN IPs and see what sort of packet loss you get.
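
A minimal sketch of that peer-tracking idea, written for R3 and pointing at R1's LAN address from the configs earlier in the thread. The operation and track number (20) are arbitrary picks, not part of the original setup, and the syntax assumes a release that has `ip sla` and `track ... ip sla` (older IOS uses `ip sla monitor` and `track ... rtr` instead):

```
! On R3: ping R1's LAN address every 5 seconds with a 1-second timeout
! (operation number 20 is hypothetical)
ip sla 20
 icmp-echo 172.16.0.241 source-interface Vlan1
 threshold 1000
 timeout 1000
 frequency 5
ip sla schedule 20 life forever start-time now
!
! Track object that follows the reachability of that probe
track 20 ip sla 20 reachability
```

`show ip sla statistics 20` should then list successes and failures, and `show track 20` the number of state changes, which gives a rough picture of the loss across the bridge. The same thing can be mirrored on R1 with 172.16.0.243 as the target.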

alexd

Hello alexd, thank you for your answer. Sorry for the confusion: I have R1 and R3,

group 1 and group 3.

I think I figured it out. The OS of the wireless bridges has some trouble handling multicast traffic, so it may be dropping packets.

For now I have set the HSRP timers to 5 60 (hello 5 s, hold 60 s) and it seems stable.

"alexd" wrote in message news: snipped-for-privacy@ale.cx...

Back on topic, my guess would be that there is a small amount of packet loss on the wireless bridge between the routers, and it's enough to cause HSRP to think its peer has gone away. I expect you can tweak the HSRP parameters to account for this. You could also create a track on each router to track each other's LAN IPs and see what sort of packet loss you get.
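
For reference, a minimal sketch of the timer change described in this post. It assumes the new values were applied to both groups on both routers, which the post does not actually say:

```
! On R1 and on R3 (assumption: same timers for groups 1 and 3)
interface Vlan1
 ! hello every 5 seconds, hold time 60 seconds
 standby 1 timers 5 60
 standby 3 timers 5 60
```

The trade-off is that with a 60-second hold time a real failure of the active router can go unnoticed for up to a minute before the standby takes over.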

Elia S.

After a careful analysis and an afternoon spent in front of a terminal with debug enabled...

R3 sends and receives hellos for HSRP group 3
R3 sends but does NOT receive hellos for HSRP group 1

R1 sends and receives all hellos for HSRP groups 1 and 3

...

I rebooted the core switch (c3550-24-EMI) and everything was OK again.

Another question:

How can I track a remote IP using an IP SLA? I know the basic setup (it is included in the previous messages), but how can I configure it to:

ping the remote host and, if 4 pings in a row fail, assume that the remote host is down? Can I set that?

I have seen that occasionally a single ping gets lost in the network.

Elia S.

You create an ip sla monitor to measure something:

formatting link

and then you can do something based on the results, eg change the default gateway:

ip route 0.0.0.0 0.0.0.0 192.168.1.3 20 track 10

If you don't actually want to do anything, you can just have it log changes, or you can even poll the status with SNMP.

You said a wireless bridge; you're much more likely to see packet loss on wireless than on wired.

alexd

Hello Alex.

I would like to monitor a remote IP address (via the ATM0.1 line, not the wi-fi), but mark it as down only after 4 lost pings. I have not been able to work out this config. Can you help me?

Elia S.
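
One way to get the "four missed pings" behaviour is to keep the probe simple and put the damping on the track object. A rough sketch, assuming a release with the `ip sla` and `track ... ip sla` syntax (older ones use `ip sla monitor` and `track ... rtr`); the number 10 and the address 192.0.2.1 are placeholders, not values from this thread:

```
! Probe the remote host every 5 seconds with a 1-second timeout
! (192.0.2.1 is a placeholder for the real remote host)
ip sla 10
 icmp-echo 192.0.2.1
 threshold 1000
 timeout 1000
 frequency 5
ip sla schedule 10 life forever start-time now
!
! Report Down only after the probe has stayed failed for 18 seconds,
! i.e. roughly four consecutive misses at a 5-second interval;
! a good reply in the meantime cancels the pending change
track 10 ip sla 10 reachability
 delay down 18
```

The track object can then be referenced the same way as the existing ones, for example with `standby 1 track 10 decrement 10`. If the probe has to leave via the DSL line, `source-interface` on the `icmp-echo` line can pin it to whichever interface holds the WAN address. The 18-second figure is only a starting point; it has to stay between three and four probe intervals for the "4 in a row" rule to hold.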
