Occasional high RTT through 2950T

In the process of debugging some seemingly random connectivity errors, I noticed the fast ethernet ports on the 2950 seemed to simply stop forwarding frames now and then. Replacing the switch removed the problem from the customer network, but I'm still curious if I can figure out what might be wrong with the switch. The network consists of only one switch.

Pulling it into my lab and tinkering a bit, I noticed weird patterns in the RTT when leaving it running overnight (second column is the time since last RTT > 2ms):

00:51:23
02:27:55  +2:36
02:49:02  +0:22
03:10:09  +0:19
03:31:15  +0:21
04:40:54  +1:09
06:02:04  +1:38
06:56:23  +0:54
08:11:47  +1:16
08:54:01  +0:43
09:12:20  +0:18
09:37:38  +0:25
09:48:18  +0:11
11:24:51  +1:36
13:01:22  +1:37
13:34:34  +0:33
13:55:42  +0:21
14:16:48  +0:21
15:11:06  +0:55
15:32:12  +0:21
16:26:31  +0:56
16:47:37  +0:21
18:03:02  +1:16

As you can see, it's not very random. There's a high frequency of ~20m intervals, and there's also a noticeable pattern at ~55m and 1h37m.
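One rough way to check how strongly the intervals cluster is to bucket the second column of the list above. This is only a minimal sketch in Python: the deltas are copied as posted and read as h:mm, and the function names and tolerances (21m ± 3, 55m ± 2, 97m ± 2) are arbitrary choices for illustration.

```python
from typing import List

# Second column from the list above, as posted (h:mm since the previous RTT > 2 ms).
DELTAS = ("2:36 0:22 0:19 0:21 1:09 1:38 0:54 1:16 0:43 0:18 0:25 "
          "0:11 1:36 1:37 0:33 0:21 0:21 0:55 0:21 0:56 0:21 1:16").split()


def to_minutes(delta: str) -> int:
    """Convert an 'h:mm' string into whole minutes."""
    hours, minutes = delta.split(":")
    return int(hours) * 60 + int(minutes)


def count_near(deltas: List[str], center: int, tolerance: int) -> int:
    """Count intervals within +/- tolerance minutes of center."""
    return sum(1 for d in deltas if abs(to_minutes(d) - center) <= tolerance)


# Tolerances are an assumption; widen or narrow them to taste.
for label, center, tol in (("~20m", 21, 3), ("~55m", 55, 2), ("~1h37m", 97, 2)):
    print(label, count_near(DELTAS, center, tol), "of", len(DELTAS))
```

With these tolerances the counts depend entirely on where the bucket edges are drawn, so treat them as a rough indication rather than a statistic.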

I saw no debug information that seemed relevant, and posting it here would be spamming a lot...

Suggestions? The OS is c2950-i6q4l2-mz.121-22.EA5a.bin. I found spanning-tree running, but all ports are configured with the portfast feature (so I switched it off).

Reply to
Def

What are you actually pinging? The switch's IP address or a connected host?

I'd hardly call 3 out of 22 samples (~55m) or 2 out of 22 (1h37m) "noticeable patterns".

Not sure what you were implying with the comment "I found spanning-tree running, but all ports are configured with the portfast feature". Do you consider that unusual or "bad"?

Portfast for access ports causes the port to bypass the "listening" and "learning" stages of spanning-tree and immediately start forwarding traffic. Unless there's another switch attached and a 'loop' formed, that (forwarding traffic immediately) is a good thing for hosts.

Are there any port errors? Are any ports getting 'disabled'? Are there duplex errors? Is there a spanning-tree instability, causing periodic reconvergence?

BernieM

Reply to
BernieM

I am pinging a connected host.

The samples are the odd packets in a total of some ~62k packets, and the intervals were similar enough for me to call it a pattern, but you're free to call it what you want of course :)
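For anyone wanting to repeat the test, the slow replies can be pulled out of a saved ping log with something like the following. It is only a sketch: the line format is assumed to be the Linux iputils style (`icmp_seq=... time=... ms`), other ping implementations print differently, and `slow_replies` is a made-up helper name.

```python
import re
from typing import Iterable, List, Tuple

# Assumed reply format (Linux iputils ping): "... icmp_seq=12 ttl=64 time=2.41 ms"
REPLY = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")


def slow_replies(lines: Iterable[str], threshold_ms: float = 2.0) -> List[Tuple[int, float]]:
    """Return (sequence number, RTT in ms) for every reply slower than threshold_ms."""
    slow = []
    for line in lines:
        match = REPLY.search(line)
        if match and float(match.group(2)) > threshold_ms:
            slow.append((int(match.group(1)), float(match.group(2))))
    return slow


# Illustrative lines only, not output from the switch in question.
sample = [
    "64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.312 ms",
    "64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=2.41 ms",
    "64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.298 ms",
]
print(slow_replies(sample))
```

Running several of these against different hosts at the same time would also show whether a slow reply hits all ports at once or only one host.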

What I implied with the spanning-tree comment was that spanning-tree was running in a network with only one switch in total. The network exists only between a few hosts, and there are no users or network admins involved other than me, so I can say for sure that spanning-tree is not needed. The configuration I found (when taking over the responsibility) had the described portfast settings.

There are no port errors, no packets other than what the counter calls just plain "packets", and no err-disabled. Duplex and speed are set in both switch and hosts, and the cables are short and tested ok. All patch cables go directly to the hosts, there's no patch panel, and no cable runs I cannot see. Spanning-tree instability was what I suspected at first, but wouldn't 'debug all' yield something about this?

My primary suspect right now is the hardware, but I'm generally not all that impressed with the quality of 29xx-series hardware so I might be inclined to draw that conclusion too fast... The POST is ok, but I might do some more interesting diagnostics in ROMMON?

Reply to
Def

Hi,

20 pings > 2ms out of 62,000 !!!!!!! You are supposed to post these on 1 April. Why didn't you save it for next year?

My first assumption would be that the ping target or receiver was the cause of the variation.

To get meaningful results directly from tests such as this you would need to get a true hardware based tester such as a Smartbits.

The one place where the switch might be falling down could be the mac address learning process. Read about 802.1d bridges and the learning process and the implementation options and tradeoffs.
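As a rough picture of what the learning process does, here is a toy learning table in Python. It is a sketch of the general 802.1D idea only, not of how the 2950 implements it in hardware; the class and method names are invented, and the 300 second value is the 802.1D default aging time, which a given switch may have configured differently.

```python
from typing import Dict, Optional, Tuple

AGING_SECONDS = 300.0  # 802.1D default aging time; assumed, check the switch config


class LearningTable:
    """Toy MAC table: learn on source address, look up on destination address."""

    def __init__(self, aging: float = AGING_SECONDS) -> None:
        self.aging = aging
        self.entries: Dict[str, Tuple[int, float]] = {}  # mac -> (port, last_seen)

    def learn(self, src_mac: str, port: int, now: float) -> None:
        # Every received frame refreshes the entry for its source address.
        self.entries[src_mac] = (port, now)

    def lookup(self, dst_mac: str, now: float) -> Optional[int]:
        # Returns the egress port, or None meaning "unknown, flood to all ports".
        entry = self.entries.get(dst_mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.aging:
            del self.entries[dst_mac]  # entry aged out
            return None
        return port


table = LearningTable()
table.learn("00:11:22:33:44:55", port=3, now=0.0)
print(table.lookup("00:11:22:33:44:55", now=100.0))  # 3
print(table.lookup("00:11:22:33:44:55", now=301.0))  # None, aged out
```

The point of interest is the aged-out case: a frame to an address that has just expired is flooded until the host is heard from again.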

What result do you get when you ping 127.0.0.1?

I have been doing this for > 10 years and your result is not likely to get me out of my chair.

If I was building a real time control system where these times really were critical I would of course be interested; however, on normal data networks it is completely irrelevant.

20 * 2ms = 40ms of "lost" time.

17 hours = 17 * 3600 or about 60000 seconds

Lost time as a proportion of elapsed time

0.04 / 60,000 or about 1/1,500,000.

Consequences of this are that Microsoft word takes on average an extra 30 microseconds to open.
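The back-of-envelope figures above can be checked in a few lines of Python. The inputs are the ones used in the post (20 slow pings, 2 ms each, 17 hours); 2 ms is really the reporting threshold rather than a measured delay, so this is the same simplification, not a measurement.

```python
slow_pings = 20
extra_per_ping_s = 0.002        # 2 ms per slow ping, as assumed in the post
elapsed_s = 17 * 3600           # 61200 s, rounded to "about 60000" above

lost_s = slow_pings * extra_per_ping_s   # about 0.04 s
one_in = round(elapsed_s / lost_s)       # lost time is 1 part in this many

print(f"lost {lost_s:.2f} s of {elapsed_s} s, about 1/{one_in:,}")
```

Using the unrounded 61200 seconds gives about 1/1,530,000, which agrees with the "about 1/1,500,000" figure.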

Reply to
anybody43

Sorry I took the sample output as the entire run ... bit slow today. Yes, 'debug spanning-tree all' would have shown you something. Pinging a connected host? Just a thought, but what's to say fluctuations in the response times aren't due to variations in the host's own processing load? Have you got multiple pings going that show a delay across the entire switch? You said the switch stops forwarding frames, but is there actual packet loss? Have you checked Cisco for bug alerts for the IOS version?

Sorry I don't have any more positive suggestions.

BernieM

Reply to
BernieM

I just noticed that other post and they've made a good point ... 2 ms delay is 'nothing'. That doesn't equate to 'stops forwarding traffic'.

BernieM

Reply to
BernieM

Ok, but apparently there is something about this switch that causes the hosts (or application at least) communicating through it to lose connectivity. Being a critical backend for something that Must Work, it is limited how much testing I can do while in the production environment.

The ping target and receiver were dedicated and stripped unix hosts and I doubt they would introduce the delay... but what do I know, I've only been doing this for 10 years :)

I would love to blame one of the hosts (one of which is a special device with limited configuration options) or the application (which is proprietary, not very well programmed and seemingly very sensitive to network delays. I haven't heard anyone call it a real time system, and from what I know it isn't anywhere near, but sometimes I wonder...), but when replacing the switch while keeping the config and software, it Just Works, which tells me the switch was acting up. The question is how I can figure out _what_.

I'll look more into the learning process for .1d and how the mac address table is maintained, thanks.

Reply to
Def

Interesting problem. From reading the thread, I wonder if it was not a physical layer issue that was temporarily fixed by replacing the switch. From the result of your RTT test, it appears the switch itself is fine, unless testing with a real stress tester or in a simulated environment shows otherwise. I have seen connectors cause problems similar to what you are describing above.

Reply to
Dana
