Up->Down messages related to ip sla icmp-echo when there are no apparent network issues

- ttripp
  
posted
14 years ago

Wed, Apr 29, 2009 3:58 PM

I am trying to use ip sla with icmp-echo to control the entry of a static route in the routing table. Currently I am using ip sla to control two routes, with more to come. My ip sla configuration looks like this:

ip sla monitor 5
 type echo protocol ipIcmpEcho 10.120.26.5 source-ipaddr 10.120.26.60
 frequency 15
ip sla monitor reaction-configuration 5 threshold-falling 5000 threshold-type consecutive 3
ip sla monitor schedule 5 life forever start-time now
ip sla monitor 12
 type echo protocol ipIcmpEcho 10.120.26.12 source-ipaddr 10.120.26.60
 frequency 15
ip sla monitor reaction-configuration 12 threshold-falling 5000 threshold-type consecutive 3
ip sla monitor schedule 12 life forever start-time now

! Lines omitted

track 5 rtr 5 reachability
!
track 12 rtr 12 reachability

! Lines omitted

ip route 10.100.30.0 255.255.255.0 10.120.26.5 track 5
ip route 10.1.3.12 255.255.255.255 10.120.26.12 track 12

As I understand it, with this configuration the router should ping the two configured IP addresses every 15 seconds, and if the 5000 msec threshold is exceeded three consecutive times, reachability is considered "down" and the route is removed from my routing table.
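The reading described above can be sketched in a few lines of Python. This is only a model of that understanding (one probe per interval, state goes Down after three consecutive round-trip times over 5000 msec), not a description of what IOS actually does; the function name and the sample RTT values are made up for illustration.

```python
def track_states(rtts_ms, threshold_ms=5000, consecutive=3):
    """Return the Up/Down state after each probe under the reading above.

    rtts_ms: round-trip time of each probe in msec, or None for a lost probe.
    Hypothetical model only; this is not IOS's actual algorithm.
    """
    states = []
    violations = 0  # consecutive probes over the threshold (or lost)
    for rtt in rtts_ms:
        if rtt is None or rtt > threshold_ms:
            violations += 1
        else:
            violations = 0
        states.append("Down" if violations >= consecutive else "Up")
    return states


# Illustrative RTTs in the 100-200 msec range never trip the threshold.
print(track_states([120, 180, 150, 200]))
# Three slow probes in a row are needed before the state changes.
print(track_states([120, 6000, 6000, 6000, 150]))
```

Under this model, replies in the 100-200 msec range would never change the tracked state, which is why the log messages are surprising.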

The problem is that I get the following log messages at random times:

04-29-2009 10:00:23 Paris_-_WAN-PARIS-1_(10.60.10.1) 182: *Apr 29 14:13:00: %TRACKING-5-STATE: 12 rtr 12 reachability Down->Up
04-29-2009 10:00:13 Paris_-_WAN-PARIS-1_(10.60.10.1) 181: *Apr 29 14:12:50: %TRACKING-5-STATE: 12 rtr 12 reachability Up->Down

However, monitoring from PRTG shows that the ping response times for both of these IP addresses are in the 100-200 msec range, which shouldn't be nearly enough to trigger a Down->Up.

Also, each Down->Up is followed exactly 10 seconds later by an Up->Down. This happens each and every time. I have four other routers configured the same way and they also have the same issue, the same 100-200 msec ping times, and the same 10-second Down->Up/Up->Down cycle.
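The 10-second gap can be checked from the router timestamps in the two log messages quoted earlier. A minimal Python sketch; the year is an assumption taken from the syslog server's date, since the router timestamp itself omits it, and the helper name is made up for illustration.

```python
from datetime import datetime


def router_time(stamp, year=2009):
    # The router timestamp has no year, so it is supplied here
    # (assumed from the syslog server's date).
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


first = router_time("Apr 29 14:12:50")   # logged with sequence number 181
second = router_time("Apr 29 14:13:00")  # logged with sequence number 182
gap = (second - first).total_seconds()
print(gap)  # 10.0
```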

Bandwidth utilization on the circuits these pings go out on varies, but the error is generated even when traffic is nil.

So, has anyone seen anything like this before? Thanks.

- bod43
  
posted
14 years ago

Thu, Apr 30, 2009 4:59 AM

The Down->Up seems back to front to me? Surely it is up normally and needs to go Up->Down first?

You might consider

debug ip sla event
debug ip sla error

to see what the router thinks it is doing.

These will generate quite a bit of output over time.

- Thrill5
  
posted
14 years ago

Thu, Apr 30, 2009 5:30 AM

I believe the problem is that you have configured a 5 second timeout, and you're only polling every 15 seconds. The threshold-falling value of 5000 msec (5 seconds) means that if the last successful ping occurred more than 5 seconds ago, the sla is down. This number is NOT how long it should wait for a response.
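Taking that reading at face value (it is the interpretation given in this reply, not a description of IOS internals), the arithmetic is simple: with a 15-second polling interval, the most recent ping result can be up to 15 seconds old just before the next poll, which is well past a 5-second limit. A small Python sketch with hypothetical function names; the second call uses an illustrative value only, not a recommended setting.

```python
def max_result_age_ms(frequency_s):
    # Just before the next probe runs, the last result is one full
    # polling interval old.
    return frequency_s * 1000


def exceeds_threshold(frequency_s, threshold_ms):
    # True if the last result can age past the threshold between probes.
    return max_result_age_ms(frequency_s) > threshold_ms


print(exceeds_threshold(15, 5000))   # True: 15000 msec > 5000 msec
print(exceeds_threshold(15, 20000))  # False: illustrative value only
```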

- ttripp
  
posted
14 years ago

Fri, May 1, 2009 12:58 PM

Thanks. I opened a ticket with TAC and they said the same thing, basically. It's fixed now.