How is a break/reconfiguration propagated in an RSTP network?

I'm seeing reeeeeeeeeeally long recovery times when I break a link in a simple 4-switch ring managed by RSTP and I don't know what's not working. I've got 802.1D-2004 open on my desk but it's, shall we say, opaque. I hope someone here understands it enough to give me a pointer.

I've got a ring like this:

+-------+           +-------+
|  Sw1  | --------- |  Sw2  |
+-------+           +-------+
    |                   |
    |                   |
+-------+           +-------+
|  Sw3  | --------- |  Sw4  |
+-------+           +-------+
    |                   |
    |                   |
+-------+           +-------+
|  PC1  |           |  PC2  |
+-------+           +-------+

Sw1 is the root. PC2 constantly pings PC1. With everything connected, the link Sw3/Sw4 is blocking and the normal path of these pings is PC2 -> Sw4 -> Sw2 -> Sw1 -> Sw3 -> PC1 (and back, of course).
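The topology and the two ping paths can be checked with a short Python sketch. The link set mirrors the diagram above. The helper name `shortest_path` and the use of breadth-first search are assumptions made for illustration only: RSTP selects paths by comparing priority vectors, not by running BFS, and the blocked Sw3/Sw4 link is modelled here simply by leaving it out of the active set.

```python
from collections import deque

# Physical links taken from the diagram (the PCs hang off Sw3 and Sw4).
LINKS = {
    ("Sw1", "Sw2"), ("Sw1", "Sw3"), ("Sw2", "Sw4"), ("Sw3", "Sw4"),
    ("PC1", "Sw3"), ("PC2", "Sw4"),
}


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Steady state: RSTP blocks Sw3/Sw4, so that link carries no traffic.
steady = LINKS - {("Sw3", "Sw4")}
# During the outage: Sw1/Sw2 is broken and Sw3/Sw4 is not forwarding yet.
outage = LINKS - {("Sw1", "Sw2"), ("Sw3", "Sw4")}
# After reconvergence: Sw3/Sw4 has started forwarding.
recovered = LINKS - {("Sw1", "Sw2")}

print(shortest_path(steady, "PC2", "PC1"))
print(shortest_path(outage, "PC2", "PC1"))
print(shortest_path(recovered, "PC2", "PC1"))
```

The middle case has no path at all, which is the window in which pings are dropped; the question in this thread is how long that window should last.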

If I break the link between Sw1 and Sw2, I lose connectivity (drop pings) for nearly 2x the HelloTime of the network (3-6s for Hello = 2s, 1.7-2.3s for HelloTime = 1s).
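To make "nearly 2x the HelloTime" concrete, the measured outages can be expressed as multiples of the configured Hello Time. This is plain arithmetic on the figures quoted above; the function name `outage_in_hello_times` is made up for the example.

```python
# Observed outage ranges from the post, keyed by configured Hello Time (seconds).
observed = {
    2.0: (3.0, 6.0),
    1.0: (1.7, 2.3),
}


def outage_in_hello_times(hello, outage_range):
    """Express an outage range (seconds) as multiples of the Hello Time."""
    low, high = outage_range
    return (low / hello, high / hello)


for hello, outage in observed.items():
    low, high = outage_in_hello_times(hello, outage)
    print(f"Hello = {hello:g}s: outage is {low:.2f}x to {high:.2f}x Hello Time")
```

By this measure the outages span roughly 1.5 to 3 Hello Times, so the recovery time scales with the Hello Time rather than staying constant.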

When I break the link, shouldn't the port state transition on Sw1 and Sw2 cause them to tell Sw3 and Sw4, respectively, that "I'm not in the path you want anymore?" Maybe Sw1 would think, "I'm still root, that's cool," but shouldn't Sw2 tell Sw4, "I lost my path to root" _immediately_ on loss of link on the port leading to Sw1? What I see is Sw4 continuing to forward pings to Sw2 long after the Sw1/Sw2 link is broken.

TIA.

Christopher Nelson

[link] has an explanation that anyone can understand.

[link] [...] be more difficult; however, it should be more rewarding too.

I have read much of 802.1d and found it difficult. The purpose of these documents is to create an unambiguous specification and not necessarily to be a tutorial.

anybody43

[link]

That seems to lead to an introduction with no document following. If there's a link to get to the next section, I can't find it. I thought at first it was my creaky old version of Mozilla but Firefox shows the same thing.

While some of the explanations there were clearer than others I'd seen, it didn't tell me much I didn't know.

My biggest remaining question is what's "less time" or "much faster". The paper says that STP took 30 seconds to a minute or more to recover and that RSTP is faster. I'm seeing 2-3 second recovery when the primary path is broken. Is that expected and acceptable? When I read the description in the document referenced above, it seems that the network should be able to converge in 10s of milliseconds but I'm not seeing that.

Chris

Christopher Nelson

The 10s of msec for recovery is not generally true. It happens only for certain failures that are "easy" to repair based on network topology -- cases where it is obvious to the bridge experiencing failure that the root has not failed and that the probability of causing a loop by opening one of its blocked ports is very small. Even for these cases the repair time is usually ~100 msec. Further, the actual repair time also depends on the quality of the implementation and the way a port failure is communicated to the software running STP on the switch.

For the kind of failure you have, 3-6s sounds reasonable. I suspect you would do a bit better if you had Sw1 directly connected to Sw4 and repeated the test, but it still won't be on the order of 10s of msec.

Anoop

anoop

Thank you for the reassurance but while I accept your answer as being well-informed and likely to reflect reality, I don't get why the latter is true. If Sw2 detects loss of link on its connection to Sw1, the root, my incomplete understanding of RSTP suggests that the following will happen in fairly short order:

- In response to the dropped link, Sw2 will immediately tell Sw4, "I think I'm root"

- In response to Sw2's claim to be root, Sw4 will immediately tell Sw2, "Nope, Sw1 is still reachable via Sw3. You can get there through me."

- At essentially the same time, Sw4 will make the port leading to Sw3 its root port.

- Upon receipt of the notification from Sw4, Sw2 will make its connection to Sw4 a root port and no longer believe it is root.

Where are there 3 seconds worth of processing in there? There are no timers involved, are there? Isn't that the point of RSTP? And why do I see that the recovery time changes when I change Hello Time?
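The four steps listed above can be sketched as a comparison of priority vectors. This is a simplified model under stated assumptions, not the 802.1D-2004 state machines: the vector holds only root ID, root path cost and sending bridge ID (the standard's vector also carries port identifiers), the numeric bridge IDs are hypothetical, every link is assumed to cost 200000, and the names `PriorityVector`, `select_root_port` and `advertise` are invented for the example.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class PriorityVector(NamedTuple):
    """Simplified priority vector; tuples compare field by field, lower is better."""
    root_id: int
    root_path_cost: int
    bridge_id: int  # bridge that sent (or would send) this information


# Hypothetical bridge IDs: Sw1 has the lowest, so it wins the root election.
SW1, SW2, SW3, SW4 = 1, 2, 3, 4
LINK_COST = 200000  # assumed cost of every link in the ring


def select_root_port(
    own_id: int, port_info: Dict[str, PriorityVector]
) -> Tuple[Optional[str], PriorityVector]:
    """Return (root port, root vector); the port is None if this bridge is root."""
    best_port: Optional[str] = None
    best = PriorityVector(own_id, 0, own_id)  # fallback: "I think I'm root"
    for port, info in sorted(port_info.items()):
        via_port = info._replace(root_path_cost=info.root_path_cost + LINK_COST)
        if via_port < best:
            best_port, best = port, via_port
    return best_port, best


def advertise(own_id: int, root_vector: PriorityVector) -> PriorityVector:
    """Information a bridge sends on its designated ports."""
    return PriorityVector(root_vector.root_id, root_vector.root_path_cost, own_id)


# Steady state at Sw4: both neighbours report Sw1 as root, one link away.
sw4_ports = {
    "to_Sw2": PriorityVector(SW1, LINK_COST, SW2),
    "to_Sw3": PriorityVector(SW1, LINK_COST, SW3),
}
steady_port, _ = select_root_port(SW4, sw4_ports)

# Step 1: Sw2 loses its only path to Sw1 and claims to be root.
_, sw2_root = select_root_port(SW2, {})
sw4_ports["to_Sw2"] = advertise(SW2, sw2_root)

# Steps 2 and 3: Sw4 re-evaluates, moves its root port to Sw3, and answers Sw2.
sw4_port, sw4_root = select_root_port(SW4, sw4_ports)
reply_to_sw2 = advertise(SW4, sw4_root)

# Step 4: Sw2 accepts the better information and stops claiming to be root.
sw2_port, sw2_root = select_root_port(SW2, {"to_Sw4": reply_to_sw2})

print("Sw4 root port before:", steady_port)
print("Sw4 root port after: ", sw4_port)
print("Sw2 root port after: ", sw2_port, "cost", sw2_root.root_path_cost)
```

The sketch contains no timers at all. It says nothing about how long a real implementation takes to notice the link loss or to work through these steps, which is exactly the part the measurements call into question.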

Sorry if I'm dense but this is making me crazy.

Christopher Nelson

I've dug further with Ethereal and what I see seems really wrong but I'm far from having RSTP be intuitive so I'm hoping for some feedback.

At steady state with no connections being broken or established, Sw2 is sending Sw4 BPDUs like this:

Flags: Agreement, Forwarding, Learning, Port Role: Designated, Proposal
Root ID: Sw1
Root path cost: 200000
Bridge ID: Sw2
Message Age: 1

If I then break the connection between Sw1 and Sw2, Sw2 sends Sw4:

Flags: Agreement, Forwarding, Learning, Port Role: Designated
Root ID: Sw2
Root path cost: 0
Bridge ID: Sw2
Message Age: 0

What doesn't make sense to me is that once things have settled out in steady state, I'd expect Proposal to be _cleared_ and when Sw2 tries to assume the root role, I'd expect it to be _set_.

Is there something I don't understand or is the state of Proposal backwards in these traces?
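For anyone checking traces like these by hand, here is a minimal sketch of decoding the flags octet of an RST BPDU, using the bit layout from 802.1D-2004 (Topology Change in the least significant bit, then Proposal, a two-bit Port Role, Learning, Forwarding, Agreement, and Topology Change Acknowledgment). The function name `decode_flags` is made up, and the two byte values are reconstructed from the flag names listed in the traces above, not copied from a capture.

```python
# Port Role values carried in bits 2-3 of the flags octet.
PORT_ROLES = {0: "Unknown", 1: "Alternate/Backup", 2: "Root", 3: "Designated"}


def decode_flags(flags: int) -> dict:
    """Split an RST BPDU flags octet into its named fields."""
    return {
        "topology_change": bool(flags & 0x01),
        "proposal": bool(flags & 0x02),
        "port_role": PORT_ROLES[(flags >> 2) & 0x03],
        "learning": bool(flags & 0x10),
        "forwarding": bool(flags & 0x20),
        "agreement": bool(flags & 0x40),
        "topology_change_ack": bool(flags & 0x80),
    }


# Reconstructed from the trace descriptions (assumed values, not captured bytes):
STEADY_STATE = 0x7E  # Agreement, Forwarding, Learning, Designated, Proposal
AFTER_BREAK = 0x7C   # Agreement, Forwarding, Learning, Designated

for label, value in (("steady state", STEADY_STATE), ("after break", AFTER_BREAK)):
    fields = decode_flags(value)
    print(f"{label}: 0x{value:02X} role={fields['port_role']} "
          f"proposal={fields['proposal']} agreement={fields['agreement']}")
```

Comparing the raw octet in the capture against a decoder like this is one way to rule out a display problem in the analyzer before concluding that the switch itself has Proposal backwards.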

Chris

Christopher Nelson
