input error - type overrun

Cisco states the following about input errors of the overrun variety:

Description: The number of times the receiver hardware was unable to hand received data to a hardware buffer.
Common Cause: The input rate of traffic exceeded the receiver's ability to handle the data.

Are there other causes that people have seen for overrun errors in general?

Now I'll get more specific. I have a POS port adapter installed (alone) in a VIP2-50 (128MB SDRAM, 8MB SRAM) in a 7513 with an RSP8 (256MB SDRAM). I'm running IOS (tm) RSP Software (RSP-PV-M), Version 12.3(6), RELEASE SOFTWARE (fc3).

This router has a twin and they are providing internet access for a small hosting company. They each connect to a pair of 6509s with a GEIP+ and a GEIP. Both routers currently take full route views from our upstream providers. They then BGP peer with each other across a GEIP link. The current high-water mark has us using roughly 35% to 40% of a single Cbus.

This POS port adapter is accumulating overrun errors. It seems to happen in bursts. These are not tied to excessively high utilization of the OC3 circuit. They are not accompanied by any other type of error or drop.

A little history. Not too long ago I had an RSP4 and only 4MB SRAM in the VIP2-50 that houses the POS card. I upgraded the packet memory first, largely because I misunderstood where MEMD actually lived, but the packet memory I figure is helpful for RX side buffering anyway. When that didn't solve the problem I got the RSP8. Those were installed today. While I had the other router down for the RSP upgrade, this router ran at almost line speed on the OC3 circuit without any errors that I'd consider abnormal for that kind of load.

My current theory. Since I'm using dCEF on these routers, each VIP has to store the FIB. The FIB of two full route views is too much for the VIP2-50 to handle in addition to its packet processing duties. High memory utilization in the VIP occasionally creates a situation where the receiver hardware is unable to hand received data to a hardware buffer.

Does my theory hold water? I'm really new to this and don't have the benefit of past experience to draw on. I really hope that somebody has some insight into this.

Reply to
nunyo.dambidness

Unless you are using BGP max-paths = 2, the RIB will only contain the best path for each BGP prefix. The RP FIB is populated from the RIB. The line card FIB is populated from the RP FIB.

  1. Determine the size of the RP FIB table with the "show ip cef summary" command.

  2. Determine whether the VIP has sufficient available DRAM to store the FIB table. Issue the "show controller vip [slot#] tech" command, and check the output of the "show memory summary" command.

Reply to
Merv

Thanks for the reply. Would BGP max-paths be something that shows up in the output of show running-config? I'd imagine it would only appear if it were set to a non-default value; at any rate, I don't see it. Please excuse my ignorance, I've inherited this from someone else and I'm admittedly in over my head a bit.
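For what it's worth, if maximum-paths were configured it would show up under the BGP stanza of the running config, something like this (the AS number here is made up for illustration):

```
router bgp 65000
 maximum-paths 2
```

If that line is absent, BGP is installing only the single best path per prefix, which is the default.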

Based on the output below it looks to be about 35MB.

IP Distributed CEF with switching (Table Version 649929), flags=0x0
  178865 routes, 0 reresolve, 0 unresolved (0 old, 0 new), peak 41044
  178868 leaves, 10480 nodes, 35233480 bytes, 649866 inserts, 470998 invalidations
  51 load sharing elements, 17136 bytes, 51 references
  universal per-destination load sharing algorithm, id 047B1A38
  2(0) CEF resets, 64 revisions of existing leaves
  Resolution Timer: Exponential (currently 1s, peak 4s)
  51 in-place/0 aborted modifications
  refcounts: 3030442 leaf, 2683136 node

Table epoch: 0 (178868 entries at this epoch)

Adjacency Table has 8 adjacencies
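A quick back-of-the-envelope check on that 35MB figure, just arithmetic on the numbers in the "show ip cef summary" output above:

```python
# Figures taken from the "show ip cef summary" output above.
leaves = 178868          # FIB leaf entries
fib_bytes = 35233480     # total FIB table size in bytes

per_entry = fib_bytes / leaves            # rough per-entry cost
fib_mib = fib_bytes / 2**20               # table size in MiB
print(f"FIB size ~{fib_mib:.1f} MiB, ~{per_entry:.0f} bytes per entry")
```

So a full-table FIB costs on the order of 200 bytes per prefix, which is why a full internet view lands in the tens of megabytes on every dCEF line card.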

It looks like the largest contiguous block of available memory is greater in size than the FIB, meaning that my theory doesn't stand by itself. Any other thoughts or ideas? I do appreciate the help.

------------------ show memory statistics ------------------

                Head      Total(b)   Used(b)    Free(b)    Lowest(b)  Largest(b)
Processor   60C75C80   121152384   70530436   50621948   48536356   48135456
      PCI   30000000     8388608    8388528         80         80         48
Reply to
nunyo.dambidness

Your RP FIB table size seems correct based on having full BGP routes.

For the BGP max paths question, pick a prefix that you know you receive from both upstream providers:

! What is in the BGP RIB (routing information base) for a specific prefix?
sh bgp x.x.x.x/length

! See how many paths for that route have been installed in the
! RIB (Routing Information Base), i.e. the main routing table.
! If only one path, then maximum paths is not configured.
sh ip route x.x.x.x/length

Do you have a support contract with Cisco?

Reply to
Merv

BTW, apparently you can log into the VIP by using "if-con " and do a "show memory".

Reply to
Merv

I do have a support contract with Cisco and it seems that I'm only keeping one route per prefix as you suspected.

Reply to
nunyo.dambidness

If you have a Cisco support contract, I would open a case with the TAC.

BTW you may want to search this group for past postings on the VIP2-50 and the OC3 port adapter you have. Looks like the VIP2-50 has had quite a few problems in the past.

Reply to
Merv

You seem to have enough memory on the line cards, but there is sometimes the possibility of FIB inconsistency. Also check 'sh cef linecard' to see if all cards are in sync and have the same table version. Though in your case it's likely to be ok.

Kind regards, iLya

Reply to
Charlie Root

Thanks for the reply. As you thought, CEF is synced across all line cards.

I did stumble into an interesting piece of information as a result of Merv's post about if-con . The output of sh vip statistics is listed below.

VIP-Slot9#sh vip stat
VIP2 Network IO Interrupt Throttling:
  throttle count=14063632, timer count=2857195
  active=0, configured=1
  netint usec=4000, netint mask usec=200
VIP Cbus error statistics:
  bus_stall=6, bus_stall_read_events=0, bus_stall_write_events=0

That last line, bus_stall=6, concerns me. To my thinking, it seems reasonable that the VIP would be unable to send packets to MEMD during an event called a bus_stall (I still haven't been able to find documentation that indicates what this is exactly). If the VIP is unable to move packets to MEMD, then it follows that it can only RX-side buffer so many packets before "the input rate of traffic exceeded the receiver's ability to handle the data", at which point I get an overrun error. So that is my new theory. I'm still awaiting return communication from Cisco. Can anyone help me tear down or prove the new theory?
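As a rough sanity check on that theory, here is a sketch of how long the VIP could absorb line-rate OC3 traffic into local packet SRAM if it couldn't drain anything to MEMD. It optimistically assumes the whole 8MB of SRAM is available for RX-side buffering; in practice it's shared, so the real window would be shorter:

```python
# Sketch: how long can the VIP buffer line-rate OC3 traffic locally if it
# cannot hand packets off to MEMD (e.g. during a bus stall)?
# Assumption (optimistic): all 8 MB of packet SRAM usable for RX buffering.
oc3_bps = 155.52e6             # OC3 line rate in bits per second
sram_bits = 8 * 2**20 * 8      # 8 MB of packet SRAM, in bits

seconds_to_fill = sram_bits / oc3_bps
print(f"~{seconds_to_fill * 1000:.0f} ms of line-rate traffic fills the SRAM")
```

So even a stall well under a second could exhaust RX buffering at line rate, which at least makes the bus_stall-to-overrun chain plausible timing-wise.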

Reply to
nunyo.dambidness

This sounds plausible. That said, the bus stalls should get attention on their own. I haven't seen these errors myself, but here is some info from CCO:

------
%C6KERRDETECT-4-SWBUSSTALL: The switching bus is experiencing stall for [dec] seconds

Explanation: The switching bus is stalled and data traffic is stopped. This condition can indicate that a line card is not properly seated or that card hardware has failed on the switching bus.

Recommended Action: If this message recurs, copy the message exactly as it appears on the console or in the system log. Research and attempt to resolve the issue using the tools and utilities provided at

formatting link
With some messages, these tools and utilities will supply clarifying information. Search for resolved software issues using the Bug Toolkit at
formatting link
If you still require assistance, open a case with the Technical Assistance Center via the Internet at
formatting link
or contact your Cisco technical support representative and provide the representative with the information you have gathered. Attach the following information to your case in nonzipped, plain-text (.txt) format: the output of the show logging and show tech-support commands and your pertinent troubleshooting logs.

------

See if you have such messages in the logs. Looks like if you're getting these errors regularly and re-inserting the linecard doesn't help, then RMA is the only solution.

Kind regards, iLya

Reply to
Charlie Root

Does this command work on your router?

VIP-Slot0# show vip hqf

!--- Output suppressed.

qsize 1525 txcount 46810 drops 0 qdrops 0 nobuffers 0
aggregate limit 2628 individual limit 657 availbuffers 2628
weight 1 perc 0 ready 1 shape_ready 1 wfq_clitype 0

Reply to
Merv

Do you have "service slave-log" configured so you get important VIP error messages?

Reply to
Merv

VIP-Slot9#sh vip ?
  PCI                  Show recorded spurious PCI errors
  accumulators         Show VIP MEMD accumulators & Rx buffering stats
  command-queue        Show VIP Malu Attention command queue
  dma-engine           Show VIP DMA engine
  drq                  Show VIP DMA receive queue
  dtq                  Show VIP DMA transmit queue
  ecc                  Show recorded single bit ECC errors
  full-qos             Show VIP HQF statistics and config
  hqf                  Show queueing hierarchy
  memd                 Show VIP MEMD buffers/queues
  nbar                 dNBAR related show CLI
  packet-memory-drops  Show # of VIP drops due to insufficient packet memory
  pas-txrings          Show registered driver TxRing information
  qos                  Show VIP HQF statistics
  statistics           Show VIP network level statistics
  tx-polling-high      Show/set tx-polling-high on one ifc
  wred-monitor         WRED performance monitoring

VIP-Slot9#sh vip hqf

VIP-Slot9#

I have the command, but it doesn't seem to return any output.

I have since configured service slave-log, I'll have a look at syslog this afternoon. I'll be looking for the bus stall errors for sure. Thanks guys, I feel like I may finally be onto something worth chasing down now.

Reply to
nunyo.dambidness

Well so much for that. I checked the other line cards. They all report the same with the exception of the GEIP in slot 10. This card has not been coming up after the router is power cycled, and I've had to re-insert it with the router running. It reports bus_stall=0. I'm inclined to think that the counter was incremented on the other line cards when this GEIP was reinserted.

Reply to
nunyo.dambidness

Which other cards do you have in this router, and in which slots are they inserted? The 7513 has two buses of 1 Gbps each. Could it be that the linecards are placed suboptimally considering your traffic patterns (i.e., from which interface to which interface)?

Kind regards, iLya

Reply to
Charlie Root

In this router, I have a POS card, a GEIP+, and two GEIPs. The POS card is my WAN port. One GEIP connects me to another 7513. Traffic on that link is limited to iBGP and any traffic that arrives at one router but whose preferred route is out the other. The GEIP+ connects me to a core layer 6509 and the remaining GEIP connects to another core layer 6509. In the router that I'm working on, all cards are installed on the 2nd bus and CBUS utilization never exceeds 40%, with normal utilization around 27%.
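To put that bus question in perspective, here is a small sketch comparing the nominal line rates of the cards described above against the roughly 1 Gbps bus they share. The interface mix comes from the description in this thread; actual traffic obviously runs far below these maxima:

```python
# Nominal line rates (Mbps) of the cards sharing the 2nd bus, per the
# description above; the ~1 Gbps figure for one CyBus is from this thread.
interfaces_mbps = {"POS OC3": 155, "GEIP+": 1000, "GEIP #1": 1000, "GEIP #2": 1000}
bus_mbps = 1000

total = sum(interfaces_mbps.values())
print(f"aggregate line rate: {total} Mbps vs one bus: {bus_mbps} Mbps")
print(f"observed peak ~40% of bus = {0.40 * bus_mbps:.0f} Mbps")
```

So the cards are heavily oversubscribed on paper even though observed utilization stays modest, which is why card placement across the two buses can still matter during bursts.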
Reply to
nunyo.dambidness

There is a field notice for GEIP bootup problems.

see

formatting link

Reply to
Merv
