Absurd problem with HSRP

Hello

I have this scenario:

R1 with DSL line
R2 with DSL line

Two HSRP Groups in both routers.
group 1: virtual IP 172.16.0.1
group 3: virtual IP 172.16.0.3

Each router has two tracking systems,
a) tracks ATM0
b) tracks icmp ping to a remote host (next-hop ISP IP)

Actually I have two Cisco 877 routers:

R1:
172.16.0.241/23
IOS 12.4(24)T4 with 128mb DRAM


R3:
172.16.0.243/23
IOS 15.1(4)M2 with 256mb DRAM

Here are the confs:

R1:
INT VLAN1:
ip address 172.16.0.241 255.255.254.0
no ip redirects
no ip proxy-arp
ip nat inside
standby delay minimum 30 reload 20
standby version 2
standby 1 ip 172.16.0.1
standby 1 timers 5 15
standby 1 preempt
standby 1 name R1A+R3P
standby 1 track 1 decrement 10
standby 1 track 2 decrement 10

standby 3 ip 172.16.0.3
standby 3 timers 5 15
standby 3 priority 95
standby 3 preempt delay minimum 20 reload 20 sync 10
standby 3 name R1P+R3A
standby 3 track 1 decrement 10

R3:
INT VLAN1:
ip address 172.16.0.243 255.255.254.0
no ip redirects
no ip proxy-arp
ip nat inside
no ip virtual-reassembly in
standby delay minimum 30 reload 20
standby version 2
standby 1 ip 172.16.0.1
standby 1 timers 5 15
standby 1 priority 95
standby 1 preempt delay minimum 20 reload 20 sync 10
standby 1 name R1A+R3P
standby 1 track 1 decrement 10
standby 3 ip 172.16.0.3
standby 3 timers 5 15
standby 3 preempt
standby 3 name R1P+R3A
standby 3 track 1 decrement 10
standby 3 track 2 decrement 10


Both routers are set up the same way.
I also run OSPF on the network.

Every so often, R3 suddenly loses its connection to its HSRP partner and
drops OSPF, but no network outage occurs.
I replaced R1, originally a Cisco 837, with a 2610XM and then with an 877,
but the problem remains.

This is what shows up in the syslog of R3:

000167: Nov 18 16:11:03.962 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Active -> Speak
000168: Nov 18 16:11:20.758 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Speak -> Standby
000169: Nov 18 16:17:19.275 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Standby -> Active
000170: Nov 18 16:17:34.482 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1
on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000171: Nov 18 16:17:42.232 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241
on Vlan1 from 2WAY to DOWN, Neighbor Down: Dead timer expired
000172: Nov 18 16:18:12.019 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1
on Vlan1 from LOADING to FULL, Loading Done
000173: Nov 18 16:18:19.088 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241
on Vlan1 from LOADING to FULL, Loading Done
000174: Nov 18 16:31:43.488 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241
on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000175: Nov 18 16:31:47.073 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1
on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000176: Nov 18 16:33:37.094 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1
on Vlan1 from LOADING to FULL, Loading Done
000177: Nov 18 16:33:42.651 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Active -> Speak
000178: Nov 18 16:33:50.989 cet: %OSPF-5-ADJCHG: Process 1, Nbr 172.16.0.241
on Vlan1 from LOADING to FULL, Loading Done
000179: Nov 18 16:33:58.423 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Speak -> Standby
000180: Nov 18 16:40:03.902 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Standby -> Active
000181: Nov 18 16:40:06.606 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Active -> Speak
000182: Nov 18 16:40:23.954 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Speak -> Standby
000183: Nov 18 16:45:55.761 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Standby -> Active
000184: Nov 18 16:45:56.293 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Active -> Speak
000185: Nov 18 16:46:12.905 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Speak -> Standby
000186: Nov 18 17:24:26.103 cet: %SYS-5-CONFIG_I: Configured from console by
maggiore on vty0 (172.16.0.19)
000187: Nov 18 17:31:27.119 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1
on Vlan1 from FULL to DOWN, Neighbor Down: Dead timer expired
000188: Nov 18 17:31:27.159 cet: %OSPF-5-ADJCHG: Process 1, Nbr 192.168.1.1
on Vlan1 from LOADING to FULL, Loading Done
000189: Nov 18 18:00:11.150 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Standby -> Active
000190: Nov 18 18:00:12.907 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Active -> Speak
000191: Nov 18 18:00:27.934 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Speak -> Standby
000192: Nov 18 18:38:00.799 cet: %SEC_LOGIN-4-LOGIN_FAILED: Login failed
[user: maggiore] [Source: 172.16.0.19] [localport: 22] [Reason: Login
Authentication Failed] at 18:38:00 cet Fri Nov 18 2011
000193: Nov 18 18:41:25.708 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Standby -> Active
000194: Nov 18 18:41:27.304 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Active -> Speak
000195: Nov 18 18:41:42.771 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Speak -> Standby

I have read a lot of documentation, including the troubleshooting guides.
I have STP disabled.
R3 and R1 are linked via a transparent Wi-Fi bridge.
At the center of my network there is a Cisco 3550 switch, but both routers
are in the same VLAN.

Here is the standby output:

R1:

gw1.wisp#sh stand
Vlan1 - Group 1 (version 2)
  State is Active
    1 state change, last state change 02:28:27
  Virtual IP address is 172.16.0.1
  Active virtual MAC address is 0000.0c9f.f001
    Local virtual MAC address is 0000.0c9f.f001 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 1.600 secs
  Preemption enabled
  Active router is local
  Standby router is 172.16.0.243, priority 95 (expires in 13.440 sec)
  Priority 100 (default 100)
    Track object 1 state Up decrement 10
    Track object 2 state Up decrement 10
  Group name is "R1A+R3P" (cfgd)
Vlan1 - Group 3 (version 2)
  State is Standby
    1 state change, last state change 02:27:54
  Virtual IP address is 172.16.0.3
  Active virtual MAC address is 0000.0c9f.f003
    Local virtual MAC address is 0000.0c9f.f003 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 3.792 secs
  Preemption enabled, delay min 20 secs, reload 20 secs, sync 10 secs
  Active router is 172.16.0.243, priority 100 (expires in 13.392 sec)
    MAC address is 001c.b1ed.c2b7
  Standby router is local
  Priority 95 (configured 95)
    Track object 1 state Up decrement 10
  Group name is "R1P+R3A" (cfgd)


R3:

gw3.wisp#sh stand
Vlan1 - Group 1 (version 2)
  State is Standby
    37 state changes, last state change 00:08:31
  Virtual IP address is 172.16.0.1
  Active virtual MAC address is 0000.0c9f.f001
    Local virtual MAC address is 0000.0c9f.f001 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 0.016 secs
  Preemption enabled, delay min 20 secs, reload 20 secs, sync 10 secs
  Active router is 172.16.0.241, priority 100 (expires in 14.720 sec)
    MAC address is ecc8.82cd.0806
  Standby router is local
  Priority 95 (configured 95)
    Track object 1 state Up decrement 10
  Group name is "R1A+R3P" (cfgd)
Vlan1 - Group 3 (version 2)
  State is Active
    1 state change, last state change 06:27:44
  Virtual IP address is 172.16.0.3
  Active virtual MAC address is 0000.0c9f.f003
    Local virtual MAC address is 0000.0c9f.f003 (v2 default)
  Hello time 5 sec, hold time 15 sec
    Next hello sent in 4.112 secs
  Preemption enabled
  Active router is local
  Standby router is 172.16.0.241, priority 95 (expires in 10.560 sec)
  Priority 100 (default 100)
    Track object 1 state Up decrement 10
    Track object 2 state Up decrement 10
  Group name is "R1P+R3A" (cfgd)


How can I prevent R3 from losing its partner? When it happens, it says that
in Group 1 the standby is unknown...


Re: Absurd problem with HSRP
002702: Nov 19 09:46:57.178 cet: HSRP: Vl1 Grp 100 Hello  in  172.16.0.241
Standby pri 95 vIP 172.16.0.10
002703: Nov 19 09:46:57.378 cet: HSRP: Vl1 Grp 3 Hello  out 172.16.0.243
Active  pri 100 vIP 172.16.0.3
002704: Nov 19 09:46:58.674 cet: HSRP: Vl1 Grp 1 Hello  out 172.16.0.243
Standby pri 95 vIP 172.16.0.1
002705: Nov 19 09:46:59.102 cet: HSRP: Vl1 Grp 100 Hello  in  172.16.0.11
Active  pri 100 vIP 172.16.0.10
002706: Nov 19 09:46:59.899 cet: HSRP: Vl1 Grp 100 Hello  in  172.16.0.241
Standby pri 95 vIP 172.16.0.10
002707: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Standby: c/Active timer
expired (172.16.0.241)
002708: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Active router is local, was
172.16.0.241
002709: Nov 19 09:47:00.355 cet: HSRP: Vl1 Nbr 172.16.0.241 no longer active
for group 1 (Standby)
002710: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Standby router is unknown,
was local
002711: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Standby -> Active
002712: Nov 19 09:47:00.355 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Standby -> Active
002713: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Redundancy "R1A+R3P" state
Standby -> Active
002714: Nov 19 09:47:00.355 cet: HSRP: Vl1 Grp 1 Hello  out 172.16.0.243
Active  pri 95 vIP 172.16.0.1
002715: Nov 19 09:47:00.359 cet: HSRP: Vl1 Grp 1 Added 172.16.0.1 to ARP
(001c.b1ed.c2b7)
002716: Nov 19 09:47:00.359 cet: HSRP: Vl1 Grp 1 Activating MAC
001c.b1ed.c2b7
002717: Nov 19 09:47:00.359 cet: HSRP: Vl1 IP Redundancy "R1A+R3P" standby,
local -> unknown
002718: Nov 19 09:47:00.359 cet: HSRP: Vl1 IP Redundancy "R1A+R3P" update,
Standby -> Active
002719: Nov 19 09:47:00.367 cet: HSRP: Vl1 Grp 1 Hello  in  172.16.0.241
Active  pri 100 vIP 172.16.0.1
002720: Nov 19 09:47:00.367 cet: HSRP: Vl1 Grp 1 Active router is
172.16.0.241, was local
002721: Nov 19 09:47:00.367 cet: HSRP: Vl1 Nbr 172.16.0.241 active for group
1
002722: Nov 19 09:47:00.367 cet: HSRP: Vl1 Grp 1 Active: g/Hello rcvd from
higher pri Active router (100/172.16.0.241)
002723: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Active -> Speak
002724: Nov 19 09:47:00.371 cet: %HSRP-5-STATECHANGE: Vlan1 Grp 1 state
Active -> Speak
002725: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Redundancy "R1A+R3P" state
Active -> Speak
002726: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Resign out 172.16.0.243
Speak   pri 95 vIP 172.16.0.1
002727: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Removed 172.16.0.1 from ARP
002728: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Deactivating MAC
001c.b1ed.c2b7
002729: Nov 19 09:47:00.371 cet: HSRP: Vl1 Grp 1 Hello  out 172.16.0.243
Speak   pri 95 vIP 172.16.0.1
002730: Nov 19 09:47:00.375 cet: HSRP: Vl1 IP Redundancy "R1A+R3P" update,
Active -> Speak
002731: Nov 19 09:47:00.383 cet: HSRP: Vl1 Grp 1 Coup   in  172.16.0.241
Active  pri 100 vIP 172.16.0.1
002732: Nov 19 09:47:00.383 cet: HSRP: Vl1 Grp 1 Hello  in  172.16.0.241
Active  pri 100 vIP 172.16.0.1
002733: Nov 19 09:47:01.819 cet: HSRP: Vl1 Grp 3 Hello  out 172.16.0.243
Active  pri 100 vIP 172.16.0.3


Re: Absurd problem with HSRP
Elia S. (for it is he) wrote:

[quoted text omitted]

Is there really an R2? Or did you mean R3?

[quoted text omitted]

Not sure what the answer to the question you asked is, but generally
speaking I would Keep It Simple, Stupid and only bother with a track icmp
ping to something reliable on the internet, eg 8.8.8.8. And why have you got
two HSRP groups [or even three in your second message]? Again, simplify
things when trying to troubleshoot.

In my experience, Cisco DSL interfaces can get to a point where the noise
margin is negligible or even negative but don't resync, so you end up with
unusable levels of packet loss, yet the ATM interface is still up according
to the router. A quick shut/no shut on the ATM forces it to resync, and away
you go for a few more days/weeks. So in this case tracking the state of ATM0
wouldn't be much use, as it can be up but not usable.

Something else I've seen is where there is a connection failure upstream of
the exchange [eg where your provider breaks out to the internet], in which
case the ISP next hop will be pingable, but you won't get to the internet.
These are rarer occurrences, but again tracking the next-hop doesn't tell
you much about the usability of the connection, unless you're only
interested in destinations inside the ISPs network [eg site-to-site VPN
tunnel that's on-net].

Back on topic, my guess would be that there is a small amount of packet loss
on the wireless bridge between the routers, and it's enough to cause HSRP to
think its peer has gone away. I expect you can tweak the HSRP parameters to
account for this. You could also create a track on each router to track each
other's LAN IPs and see what sort of packet loss you get.
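
A minimal sketch of such a peer track on R3, assuming operation and track number 20 (both arbitrary) and the newer "ip sla" / "track ... ip sla" syntax; older 12.4 images spell these commands differently. Note that it only measures unicast ICMP loss, whereas HSRP hellos are multicast, so the two can behave differently on the bridge.

```
! Hypothetical numbering: SLA operation 20 and track object 20 are arbitrary
ip sla 20
 icmp-echo 172.16.0.241 source-interface Vlan1
 threshold 1000
 timeout 1000
 frequency 5
ip sla schedule 20 life forever start-time now
!
! Not attached to any standby group; it only records state changes
track 20 ip sla 20 reachability
```

"show ip sla statistics 20" should then give success and failure counts, and "show track 20" the number of state changes. On R1 the same would apply with 172.16.0.243 as the target.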

--
 <http://ale.cx/ (AIM:troffasky) (UnSoEsNpEaTm@ale.cx)
 10:25:40 up 4 days, 11:40,  4 users,  load average: 0.09, 0.53, 0.61
 "People believe any quote they read on the internet
  if it fits their preconceived notions." - Martin Luther King


Re: Absurd problem with HSRP
Hello alexd, thank you for your answer.
Sorry for the confusion.
I have R1 and R3,

group 1 and group 3.

I think I figured it out. The OS of the wireless bridges has some trouble
handling multicast traffic, so it could be dropping packets.

At the moment I have set the HSRP timers to 5 60 and it seems stable.
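
For reference, a timer change like this would look roughly as follows on each router. This is a sketch of the setting described, not a copy of the running config, and it assumes both groups are changed on both routers:

```
interface Vlan1
 ! hello every 5 s, declare the peer dead after 60 s without hellos
 standby 1 timers 5 60
 standby 3 timers 5 60
```

The trade-off is that a real failure of the active router can now take up to 60 seconds to be noticed.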



Re: Absurd problem with HSRP
After careful analysis and an afternoon spent in front of a terminal
with debug enabled.............

R3 receives and sends hellos for HSRP group 3
R3 sends but does NOT receive hellos for HSRP group 1

R1 sends and receives all hellos for HSRP groups 1 and 3

.......

I rebooted the core switch (c3550-24-EMI) and it was ok.


Another question:

How can I track a remote IP using an IP SLA? I know the basic setup (it is
included in the previous messages), but how can I configure it to:

ping the remote host and, only if 4 pings in a row fail, assume that the
remote host is down? Is there a way to set that?

I have seen that occasionally a single ping gets lost in the network.



Re: Absurd problem with HSRP
Elia S. (for it is he) wrote:

[quoted text omitted]

You create an ip sla monitor to measure something:

http://www.cisco.com/en/US/docs/ios/12_4/ip_sla/configuration/guide/hsicmp.html

[quoted text omitted]

and then you can do something based on the results, eg change the default
gateway:

ip route 0.0.0.0 0.0.0.0 192.168.1.3 20 track 10

If you don't actually want to do anything, you can just have it log changes,
or you can even poll the status with SNMP.
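
A rough sketch of how the pieces could fit together, using the newer "ip sla" syntax rather than the "ip sla monitor" form in the linked 12.4 guide. The target 192.0.2.1 is a placeholder, and operation/track number 10 is chosen only to match the route above. The "delay down" value is one way to approximate "four failed pings in a row": with a probe every 5 seconds, the track object has to stay down for 18 seconds before the change is reported, so a single lost ping is ignored.

```
! Placeholder target; replace with the host to monitor
ip sla 10
 icmp-echo 192.0.2.1
 threshold 1000
 timeout 1000
 frequency 5
ip sla schedule 10 life forever start-time now
!
track 10 ip sla 10 reachability
 ! hold the down state for 18 s before reporting it
 delay down 18
!
ip route 0.0.0.0 0.0.0.0 192.168.1.3 20 track 10
```

The exact number of probes that must fail depends on timing, so it is worth checking the behaviour with "show track 10" while blocking the target.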

[quoted text omitted]

You said a wireless bridge, you're much more likely to see packet loss on
wireless than wired.

--
 <http://ale.cx/ (AIM:troffasky) (UnSoEsNpEaTm@ale.cx)
 09:44:23 up 16:44,  3 users,  load average: 1.56, 1.11, 0.65
 "People believe any quote they read on the internet
  if it fits their preconceived notions." - Martin Luther King


Re: Absurd problem with HSRP
Hello Alex.

I would like to monitor a remote IP address (via the ATM0.1 line, not
Wi-Fi), but mark it as down only after 4 lost pings. I have not been able to
work out this configuration.
Can you help me?


