OSPF Retransmits - symptom or normal?

Hi all,

I'm seeing a number of ospfTxRetransmit SNMP traps from a 4506 and I was wondering if they're a symptom of an underlying problem, or just part of normal operation?

There's a pair of 4506s (call them 'A' and 'B'), each equipped with a Sup-V-10GE. They're connected together via 10Gb X2 modules (single 10Gb link between them). 'A' is also connected via a 1Gb link to a corporate network via 'C' (a 3550-12T).

I'm seeing ospfTxRetransmit traps (OID: mib-2.14.16.0.10) from 'A', but I don't understand why.

I've checked the physical status of the interfaces and they're clean (100% clean and utilization < 1/255 as you'd expect). The retransmits are always from 'A' to 'B'. I've switched on "debug ip ospf retransmits" on 'A', but am not seeing any debug output in the log.

IOS release on the 4506s is 12.2(25)EWA3.

Are these a symptom of something (and if so, any suggestions where to look next)? Or, are they simply a normal part of OSPF operation?

Many TIA Steve

Reply to
StivH

Steve,

How are your OSPF areas configured, and what is their relationship with the corporate network? It would help if we understood a bit more about the relationship between "A", "B" and "C". Are they in different areas?

Complete configs from both would be a good start.

Simon

Reply to
simonychang

Post the output of show ip ospf neighbor detail for the device that is experiencing OSPF retransmits, along with show version and show ip traffic.

Configure "logging buffer xxxxx debug" to see the debug output in the internal syslogging buffer.

Reply to
Merv

Hi Simon,

All in OSPF Area 0. Excerpts from complete configs follow (I've snipped the irrelevant bits to save space - let me know if there are any other bits you'd like to see).

TIA
Steve

Switch A Config
===============
version 12.2
hw-module uplink select tengigabitethernet
udld enable
!
vlan 144
 name Vlan_144
!
vlan 145
 name Vlan_145
!
interface Loopback0
 ip address 172.31.52.11 255.255.255.255
!
interface TenGigabitEthernet1/2
 description ** Switch B (4506) 10Gb link **
 no switchport
 bandwidth 10000000
 ip address 172.31.1.13 255.255.255.252
 ip ospf cost 1
 udld port
!
interface GigabitEthernet2/6
 description ** link to legacy network (3550-12T) **
 no switchport
 ip address 172.31.248.137 255.255.255.252
 media-type rj45
 speed 1000
 duplex full
!
interface Vlan1
 no ip address
!
interface Vlan144
 ip address 172.31.144.3 255.255.255.0
 standby 144 ip 172.31.144.1
 standby 144 timers msec 50 msec 150
 standby 144 priority 110
 standby 144 preempt delay minimum 180
!
interface Vlan145
 ip address 172.31.145.3 255.255.255.0
 standby 145 ip 172.31.145.1
 standby 144 timers msec 50 msec 150
 standby 145 priority 90
 standby 145 preempt delay minimum 180
!
router ospf 1
 log-adjacency-changes
 passive-interface default
 no passive-interface TenGigabitEthernet1/2
 no passive-interface GigabitEthernet2/6
 network 172.31.0.0 0.0.255.255 area 0
!
ip ospf name-lookup
!
logging trap debugging
!
snmp-server enable traps ospf state-change
snmp-server enable traps ospf errors
snmp-server enable traps ospf retransmit
snmp-server enable traps ospf lsa
snmp-server enable traps ospf cisco-specific state-change
snmp-server enable traps ospf cisco-specific errors
snmp-server enable traps ospf cisco-specific retransmit
snmp-server enable traps ospf cisco-specific lsa

Switch B config
===============
version 12.2
hw-module uplink select tengigabitethernet
udld enable
!
vlan 144
 name Vlan_144
!
vlan 145
 name Vlan_145
!
interface Loopback0
 ip address 172.31.52.12 255.255.255.255
!
interface TenGigabitEthernet1/2
 description ** Switch A (4506) **
 no switchport
 bandwidth 10000000
 ip address 172.31.1.14 255.255.255.252
 ip ospf cost 1
 udld port
!
interface Vlan1
 no ip address
!
interface Vlan144
 ip address 172.31.144.2 255.255.255.0
 standby 144 ip 172.31.144.1
 standby 144 timers msec 50 msec 150
 standby 144 priority 90
!
interface Vlan145
 ip address 172.31.145.2 255.255.255.0
 standby 145 ip 172.31.145.1
 standby 145 timers msec 50 msec 150
 standby 145 priority 110
 standby 145 preempt delay minimum 180
!
router ospf 1
 log-adjacency-changes
 passive-interface default
 no passive-interface TenGigabitEthernet1/2
 network 172.31.0.0 0.0.255.255 area 0
!
ip ospf name-lookup
!
logging trap debugging
!
snmp-server enable traps ospf state-change
snmp-server enable traps ospf errors
snmp-server enable traps ospf retransmit
snmp-server enable traps ospf lsa
snmp-server enable traps ospf cisco-specific state-change
snmp-server enable traps ospf cisco-specific errors
snmp-server enable traps ospf cisco-specific retransmit
snmp-server enable traps ospf cisco-specific lsa

Reply to
StivH

Hi Merv,

Here's the output. I have got 'logging buffer 16384 debug' configured, but there's nothing appearing in the log.

Cheers Steve

Switch_A>sh ip ospf neigh 172.31.52.12 detail
 Neighbor Switch_B, interface address 172.31.1.14
    In the area 0 via interface TenGigabitEthernet1/2
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 172.31.1.14 BDR is 172.31.1.13
    Options is 0x52
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:33
    Neighbor is up for 1d15h
    Index 2/2, retransmission queue length 0, number of retransmission 13
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 3, maximum is 6
    Last retransmission scan time is 0 msec, maximum is 0 msec

Switch_A>sho ver
Cisco IOS Software, Catalyst 4000 L3 Switch Software (cat4000-I5K91S-M), Version 12.2(25)EWA3, RELEASE SOFTWARE (fc1)
Technical Support:
formatting link
(c) 1986-2005 by Cisco Systems, Inc.
Compiled Tue 23-Aug-05 13:41 by dchih
Image text-base: 0x10000000, data-base: 0x115ECC90

ROM: 12.2(25r)EW
Pod Revision 14, Force Revision 31, Tie Revision 29

Switch_A uptime is 2 weeks, 5 days, 17 hours, 53 minutes
System returned to ROM by power-on
System restarted at 17:11:26 bst Fri Oct 21 2005
System image file is "bootflash:cat4000-i5k91s-mz.122-25.EWA3.bin"

This product contains cryptographic features and is subject to United States and local country laws governing import, export, transfer and use. Delivery of Cisco cryptographic products does not imply third-party authority to import, export, distribute or use encryption. Importers, exporters, distributors and users are responsible for compliance with U.S. and local country laws. By using this product you agree to comply with applicable laws and regulations. If you are unable to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:

formatting link
If you require further assistance please contact us by sending email to snipped-for-privacy@cisco.com.

cisco WS-C4506 (MPC8540) processor (revision 10) with 524288K bytes of memory.
Processor board ID FOX0932054R
MPC8540 CPU at 800Mhz, Supervisor V-10GE
Last reset from PowerUp

7 Virtual Ethernet interfaces
22 Gigabit Ethernet interfaces
2 Ten Gigabit Ethernet interfaces
511K bytes of non-volatile configuration memory.

Configuration register is 0x2101

Switch_A>sh ip traffic
IP statistics:
  Rcvd:  6324826 total, 5724822 local destination
         0 format errors, 0 checksum errors, 0 bad hop count
         0 unknown protocol, 3 not a gateway
         0 security failures, 0 bad options, 0 with options
  Opts:  0 end, 0 nop, 0 basic security, 0 loose source route
         0 timestamp, 0 extended security, 0 record route
         0 stream ID, 0 strict source route, 0 alert, 0 cipso, 0 ump
         0 other
  Frags: 600000 reassembled, 0 timeouts, 0 couldn't reassemble
         1200000 fragmented, 0 couldn't fragment
  Bcast: 18 received, 6 sent
  Mcast: 4593250 received, 1853258 sent
  Sent:  3893740 generated, 25 forwarded
  Drop:  7 encapsulation failed, 0 unresolved, 0 no adjacency
         0 no route, 0 unicast RPF, 0 forced drop
         0 options denied, 0 source IP address zero

ICMP statistics:
  Rcvd: 0 format errors, 0 checksum errors, 0 redirects, 1859 unreachable
        23 echo, 621032 echo reply, 0 mask requests, 0 mask replies, 0 quench
        0 parameter, 0 timestamp, 0 info request, 0 other
        0 irdp solicitations, 0 irdp advertisements
  Sent: 0 redirects, 126245 unreachable, 621033 echo, 23 echo reply
        0 mask requests, 0 mask replies, 0 quench, 0 timestamp
        0 info reply, 0 time exceeded, 0 parameter problem
        0 irdp solicitations, 0 irdp advertisements

TCP statistics:
  Rcvd: 351172 total, 0 checksum errors, 6 no port
  Sent: 662226 total

UDP statistics:
  Rcvd: 3157437 total, 0 checksum errors, 127671 no port
  Sent: 270368 total, 0 forwarded broadcasts

Probe statistics:
  Rcvd: 0 address requests, 0 address replies
        0 proxy name requests, 0 where-is requests, 0 other
  Sent: 0 address requests, 0 address replies (0 proxy)
        0 proxy name replies, 0 where-is replies

BGP statistics:
  Rcvd: 0 total, 0 opens, 0 notifications, 0 updates
        0 keepalives, 0 route-refresh, 0 unrecognized
  Sent: 0 total, 0 opens, 0 notifications, 0 updates
        0 keepalives, 0 route-refresh

EGP statistics:
  Rcvd: 0 total, 0 format errors, 0 checksum errors, 0 no listener
  Sent: 0 total

OSPF statistics:
  Rcvd: 1746193 total, 0 checksum errors
        1440165 hello, 1067 database desc, 886 link state req
        96708 link state updates, 54233 link state acks
  Sent: 1613852 total

IP-EIGRP statistics:
  Rcvd: 0 total
  Sent: 0 total

PIMv2 statistics: Sent/Received
  Total: 0/0, 0 checksum errors, 0 format errors
  Registers: 0/0 (0 non-rp, 0 non-sm-group), Register Stops: 0/0, Hellos: 0/0
  Join/Prunes: 0/0, Asserts: 0/0, grafts: 0/0
  Bootstraps: 0/0, Candidate_RP_Advertisements: 0/0
  State-Refresh: 0/0

IGMP statistics: Sent/Received
  Total: 0/0, Format errors: 0/0, Checksum errors: 0/0
  Host Queries: 0/0, Host Reports: 0/0, Host Leaves: 0/0
  DVMRP: 0/0, PIM: 0/0

ARP statistics:
  Rcvd: 137 requests, 295 replies, 0 reverse, 0 other
  Sent: 156 requests, 349 replies (18 proxy), 0 reverse
  Drop due to input queue full: 0
Switch_A>

Reply to
StivH

I would not worry about 13 retransmissions over 2 weeks 5 days.

Reply to
Merv

This is part of what's bothering me, though. The show ip ospf neighbor detail output says 13 retransmissions, and the sysUptime is almost 3 weeks, but I've actually received ~60 SNMP ospfTxRetransmit traps in the last week, and that's after I delete any traps that I can explain (e.g. changing passive-interface settings, or adjusting the hello and dead interval timers, both of which disrupt OSPF). It would be nice to know which to believe (and the reason for the discrepancy).

Thanks for your input, though

Steve

Reply to
StivH

I would also want to investigate the discrepancy.

How many OSPF neighbours does Switch_A have?

Please post sh ip ospf neigh detail for each neighbor

What information is contained in the SNMP trap message - please post an example.

Also post the output of show snmp

Reply to
Merv

Hi Merv,

Switch_A has 2 OSPF neighbours. Detail for Switch_B (other 4506) follows:

sh ip ospf nei 172.31.52.12 detail
 Neighbor Switch_B, interface address 172.31.1.14
    In the area 0 via interface TenGigabitEthernet1/2
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 172.31.1.14 BDR is 172.31.1.13
    Options is 0x52
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:31
    Neighbor is up for 18:55:58
    Index 2/2, retransmission queue length 0, number of retransmission 21
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 1, maximum is 30
    Last retransmission scan time is 0 msec, maximum is 0 msec

Detail for Switch_C (3550-12T):

sh ip ospf nei Switch_C det
 Neighbor Switch_C, interface address 172.31.248.138
    In the area 0 via interface GigabitEthernet2/6
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 172.31.248.138 BDR is 172.31.248.137
    Options is 0x42
    Dead timer due in 00:00:31
    Neighbor is up for 2d15h
    Index 1/1, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec

show snmp output:

sh snmp
Chassis: FOX0932054R
Contact: x3176
Location: A
98 SNMP packets input
    0 Bad SNMP version errors
    0 Unknown community name
    0 Illegal operation for community name supplied
    0 Encoding errors
    98 Number of requested variables
    0 Number of altered variables
    0 Get-request PDUs
    98 Get-next PDUs
    0 Set-request PDUs
7608 SNMP packets output
    0 Too big errors (Maximum packet size 1500)
    0 No such name errors
    0 Bad values errors
    0 General errors
    98 Response PDUs
    7510 Trap PDUs
SNMP agent enabled

SNMP logging: enabled
    Logging to 172.31.63.10.162, 0/10, 1502 sent, 0 dropped.
    Logging to 172.31.63.11.162, 0/10, 1502 sent, 0 dropped.
    Logging to 172.31.63.12.162, 0/10, 1502 sent, 0 dropped.
    Logging to 172.31.63.21.162, 0/10, 1502 sent, 0 dropped.
    Logging to 172.31.63.22.162, 0/10, 1502 sent, 0 dropped.

Typical SNMP trap (captured by Net-SNMP 5.2.1 and piped into a PHP script which just dumps stdin to email):

Switch_A.bayeruk.net
172.31.52.11
system.sysUpTime 6:22:32:52.15
.iso.org.dod.internet.snmpV2.snmpModules.snmpMIB.snmpMIBObjects.snmpTrap.snmpTrapOID 14.16.0.10
14.1.1 -1407241205
14.7.1.1 -1407254259
14.7.1.2 0
14.10.1.3 -1407241204
14.16.1.3 4
14.4.1.2 3
14.4.1.3 168036736
14.4.1.4 -1407190895
.iso.org.dod.internet.snmpV2.snmpModules.snmpMIB.snmpMIBObjects.snmpTrap.snmpTrapEnterprise 14.16

And finally, here's a typical trap that I wrote some decoding for, as the numeric OIDs aren't very friendly:

OSPF RouterID: 172.31.52.11
OSPF Interface IP Address: 172.31.1.13
OSPF Neighbour RouterID: 172.31.52.12
OSPF Packet Type: lsUpdate
OSPF Link Type: asExternalLink
OSPF Link ID: 10.23.81.0
OSPF Originating Router ID: 172.31.248.241

This email brought to you by trap.ospfTxRetransmit.php
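For anyone else reading the raw trap: the negative numbers in it are IPv4 addresses printed as signed 32-bit integers. A minimal Python sketch of the conversion follows. The function name is made up for illustration, and the object names in the comments are my reading of the OSPF MIB rather than anything the trap itself prints.

```python
import ipaddress


def int32_to_ipv4(value: int) -> str:
    """Convert a (possibly negative) 32-bit integer to dotted-quad notation."""
    # Masking with 0xFFFFFFFF reinterprets a signed value as unsigned.
    return str(ipaddress.IPv4Address(value & 0xFFFFFFFF))


# Values copied from the raw trap; the names in brackets are assumed
# from the OSPF MIB and are not part of the trap output.
varbinds = {
    "14.1.1 (ospfRouterId)": -1407241205,
    "14.7.1.1 (ospfIfIpAddress)": -1407254259,
    "14.10.1.3 (ospfNbrRtrId)": -1407241204,
    "14.4.1.3 (ospfLsdbLsid)": 168036736,
    "14.4.1.4 (ospfLsdbRouterId)": -1407190895,
}

for name, raw in varbinds.items():
    print(f"{name}: {int32_to_ipv4(raw)}")
```

The first three come out as 172.31.52.11, 172.31.1.13 and 172.31.52.12, which match Switch_A's router ID, its 10Gb interface address and Switch_B's router ID.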

btw, although the retransmit count from the show ip ospf neighbor detail command only appears to have gone up by 8 since yesterday, I've received 18 new notifications
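To compare the counter against the trap count without reading it by eye, something like the following Python sketch could pull the retransmission count out of two saved copies of the neighbor detail output. The function name is hypothetical, and the delta only means something if the counter was not reset between the two snapshots (which I am assuming here, not asserting).

```python
import re

# Matches the counter even when the number has wrapped onto a later line.
RETRANS_RE = re.compile(r"number of retransmission\s+(\d+)")


def retransmission_count(neighbor_detail: str) -> int:
    """Extract the retransmission counter from 'show ip ospf neighbor detail' text."""
    match = RETRANS_RE.search(neighbor_detail)
    if match is None:
        raise ValueError("no retransmission counter found in output")
    return int(match.group(1))


# Two snapshots, trimmed to the relevant line of each output.
earlier = "Index 2/2, retransmission queue length 0, number of retransmission 13"
later = "Index 2/2, retransmission queue length 0, number of retransmission\n\n21"

delta = retransmission_count(later) - retransmission_count(earlier)
print(f"counter increased by {delta}")
```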

Cheers Steve

Reply to
StivH

Switch A has lost adjacency with Switch B 6 times in the last 2 weeks. I would suggest focusing on finding the cause of this issue first.

I see that you have enabled OSPF neighbour loss logging. Please post these log messages for Switch A and Switch B.

Please also post show process cpu and show interface switching for both Switch A and Switch B

Reply to
Merv

Neighbor uptime is an issue !!!

sh ip ospf nei 172.31.52.12 detail
 Neighbor Switch_B, interface address 172.31.1.14
    In the area 0 via interface TenGigabitEthernet1/2
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 172.31.1.14 BDR is 172.31.1.13
    Options is 0x52
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:31
    Neighbor is up for 18:55:58

Reply to
Merv

Hi Merv,

Pretty sure that the neighbour uptime is a red herring. It's because of actions that I took whilst troubleshooting resulting in dropping and re-establishing the adjacency (initially I set the hello and dead timers quite aggressively - after I started seeing these retransmits, I returned them to defaults).

I did another sh ip ospf nei det this morning, having received another 30 traps since yesterday evening (output follows):

Switch_A>sh ip ospf nei 172.31.52.12 det
 Neighbor 172.31.52.12, interface address 172.31.1.14
    In the area 0 via interface TenGigabitEthernet1/2
    Neighbor priority is 1, State is FULL, 6 state changes
    DR is 172.31.1.14 BDR is 172.31.1.13
    Options is 0x52
    LLS Options is 0x1 (LR)
    Dead timer due in 00:00:33
    Neighbor is up for 4d17h
    Index 2/2, retransmission queue length 0, number of retransmission 69
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 1, maximum is 30
    Last retransmission scan time is 0 msec, maximum is 0 msec

As you can see, the neighbour uptime is consistent with what it should be (previous post was 11th November 10:07 am and the neighbour uptime was 19 hours or so, indicating that the last transition would have been 10th November 15:12).
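The date arithmetic can be checked with a few lines of Python. This is only a sketch: the year 2005 is assumed from the show version output, the 18:55:58 uptime is taken from the earlier neighbor detail output, and the post time stands in for the moment the command was actually run.

```python
from datetime import datetime, timedelta

# Time of the previous post (year assumed to be 2005, per the show version output).
observed_at = datetime(2005, 11, 11, 10, 7)

# "Neighbor is up for 18:55:58" from the earlier neighbor detail output.
neighbor_uptime = timedelta(hours=18, minutes=55, seconds=58)

last_transition = observed_at - neighbor_uptime
print(last_transition)
```

That lands at about 15:11 on 10th November, within a minute of the time quoted above.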

At least the number of retransmissions is also increasing now (albeit faster than the number of traps received now!)

As these switches are not in production, I'm going to reload them so that this misleading information is no longer there.

Thanks for your help so far

Steve

Reply to
StivH

Had a thought about the "lost adjacency ... 6 times" bit, and I don't think that's true. OSPF adjacencies don't only have 2 states. When a new adjacency is formed (e.g. switching on a new router), the adjacency doesn't just go from "down" to "full", but transits through Init, TwoWay, ExStart, Exchange, then Full (I might have got the exact states wrong - it's been a long time since I touched OSPF). So, 6 state changes don't indicate 6 transitions from "down" to "full", rather 1 transition through each of 6 states.
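As a rough check of that count, here is a small Python sketch that walks the neighbour states listed in RFC 2328 for a single bring-up on a broadcast link. Whether IOS increments its "state changes" counter on exactly these transitions is an assumption on my part.

```python
# Neighbour states in the order RFC 2328 lists them.
NEIGHBOR_STATES = [
    "Down", "Attempt", "Init", "2-Way",
    "ExStart", "Exchange", "Loading", "Full",
]

# Attempt is only used on NBMA networks, so it is skipped for an
# Ethernet link like the one between the two 4506s.
bring_up_path = [state for state in NEIGHBOR_STATES if state != "Attempt"]

# Pair each state with its successor to list the individual transitions.
transitions = list(zip(bring_up_path, bring_up_path[1:]))

for before, after in transitions:
    print(f"{before} -> {after}")
print(f"{len(transitions)} state changes for one bring-up")
```

On that reading, one clean bring-up from Down to Full already accounts for 6 state changes.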

Steve

Reply to
StivH

After you reload the switches, if retransmissions are still occurring from switch A to B, then I would think that indicates that there is a problem on switch B.

That being the case, troubleshooting should focus on why switch B is not acking some of A's transmissions.

Reply to
Merv

OK, I've done some more thinking about the discrepancy between the OSPF retransmit count and the SNMP ospfTxRetransmit traps received. An OSPF LS Update packet can carry LSAs for multiple LSIDs, whereas the SNMP trap only tells you that an LSA for 1 LSID was retransmitted. So, when an update is not Acked, it will be retransmitted once, but there will be an SNMP trap for each LSID it contains. I think that makes sense, anyway :-\
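The counting argument can be sketched in Python like this. The retransmission events and LSIDs below are invented purely for illustration, and the idea that the neighbour counter goes up once per retransmitted packet while one trap is raised per LSID is my hypothesis, not something confirmed for this IOS release.

```python
# Hypothetical retransmission events: each inner list holds the LSIDs
# carried by one retransmitted packet (values invented for illustration).
retransmitted_updates = [
    ["10.23.81.0", "10.23.82.0", "10.23.83.0"],
    ["10.4.9.128"],
    ["10.23.81.0", "10.4.9.128"],
]

# Under the hypothesis, the neighbour counter goes up once per packet...
packet_retransmissions = len(retransmitted_updates)

# ...while one trap is sent for every LSID inside each packet.
traps_sent = sum(len(lsids) for lsids in retransmitted_updates)

print(f"counter: {packet_retransmissions}, traps: {traps_sent}")
```

With these made-up numbers the counter would show 3 while 6 traps arrive, which is the kind of gap I was seeing.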

So, the discrepancy is not such an issue to me. However, this is LSAs not being Acked when transmitted across an empty 10Gbps link between two idle 4506s with SupV-10GEs in them. I can't understand what the 4506s are doing that's causing this.

I've swapped the SupV-10GE from Switch_B with a spare, so will see what impact that has.

Reply to
StivH

It will be interesting to see what you find out on switch B

Reply to
Merv
