I don't think it was a bad crossover cable. Got the cable tester on it and it reported all fine. I suspect it was probably a mismatch between the NIC and switch speed/duplex settings. The switch was set to 100Mb/Full; the NICs auto-set themselves to 100Mb/Half.
Strapping the NICs to 100Mb/Full killed the connection. I suspect that's because the NIC could handle having the wrong cable at half duplex, but not at full duplex.
Some old boilerplate I trot out from time to time:
How 100Base-T Autoneg is supposed to work:
When both sides of the link are set to autoneg, they will "negotiate" the duplex setting and select full duplex if both sides can do full-duplex.
If one side is hardcoded and not using autoneg, the autoneg process will "fail" and the side trying to autoneg is required by spec to use half-duplex mode.
If one side is using half-duplex, and the other is using full-duplex, sorrow and woe is the usual result.
So, the following table shows what will happen given various settings on each side:
Side 1 \ Side 2   Auto        Half        Full
Auto              Happiness   Lucky       Sorrow
Half              Lucky       Happiness   Sorrow
Full              Sorrow      Sorrow      Happiness
Happiness means that there is a good shot of everything going well. Lucky means that things will likely go well, but not because you did anything correctly :) Sorrow means that there _will_ be a duplex mis-match.
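The rules above are mechanical enough to reproduce the table in a few lines. A minimal Python sketch, assuming both ports already agree on 100 Mb/s, both are full-duplex capable, and an autonegotiating port facing a hard-coded partner falls back to half duplex as described above. The function names are hypothetical, for illustration only:

```python
SETTINGS = ("Auto", "Half", "Full")


def resolved_duplex(mine: str, theirs: str) -> str:
    """Duplex this port actually runs, given its own setting and its partner's.
    Assumes both ends are full-duplex capable and speed is already agreed."""
    if mine != "Auto":
        return mine      # a hard-coded port does exactly what it is told
    if theirs == "Auto":
        return "Full"    # both sides negotiate and both can do full duplex
    return "Half"        # partner isn't negotiating: autoneg "fails", fall back to half


def outcome(a: str, b: str) -> str:
    """Classify a pair of port settings the same way the table does."""
    if resolved_duplex(a, b) != resolved_duplex(b, a):
        return "Sorrow"      # guaranteed duplex mismatch
    if "Auto" in (a, b) and a != b:
        return "Lucky"       # matched only because the fallback happened to be half
    return "Happiness"


if __name__ == "__main__":
    # Print the same grid as the table above.
    print(f"{'':6}" + "".join(f"{s:11}" for s in SETTINGS))
    for row in SETTINGS:
        print(f"{row:6}" + "".join(f"{outcome(row, col):11}" for col in SETTINGS))
```

Note that "Lucky" falls out of the same rule that produces "Sorrow": the auto side always drops to half duplex against a hard-coded partner, and it only works if the partner was hard coded to half.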
When there is a duplex mismatch, on the side running half-duplex you will see various errors and probably a number of _LATE_ collisions ("normal" collisions don't count here). On the side running full-duplex you will see things like FCS errors. Note that those errors are not necessarily conclusive; they are simply indicators.
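To make the per-side symptoms concrete, here is a toy Python model of a single overlap on a mismatched 10/100 link. It assumes the 64-byte (512 bit time) slot time of 10/100 Ethernet and, as a simplification, ignores preamble and the jam sequence. `mismatch_symptoms` and the counter labels are hypothetical names, not real MIB objects or vendor counters:

```python
from typing import Set, Tuple

SLOT_TIME_BYTES = 64  # 512 bit times: the collision window on 10/100 Ethernet


def mismatch_symptoms(hd_bytes_sent: int, hd_frame_len: int) -> Tuple[Set[str], Set[str]]:
    """Counters that tick on each end when the full-duplex side starts sending
    while the half-duplex side is hd_bytes_sent bytes into an hd_frame_len-byte
    frame. Returns (half-duplex side, full-duplex side).

    Simplified model: ignores preamble and jam; counter names are generic labels.
    """
    if hd_bytes_sent >= hd_frame_len:
        return set(), set()  # the half-duplex frame had already finished: no overlap

    late = hd_bytes_sent >= SLOT_TIME_BYTES

    # Half-duplex side: receiving while transmitting is, by definition, a collision.
    # Past the slot time it is a *late* collision, which a correctly configured link
    # should never produce.
    hd_side = {"late_collision" if late else "collision"}

    # The half-duplex side aborts its frame, so the full-duplex side receives a
    # truncated frame with a bad checksum. It never counts a collision, because
    # receiving while transmitting is normal in full-duplex mode.
    fd_side = {"fcs_error"} if late else {"fcs_error", "runt"}
    return hd_side, fd_side
```

Because a full-duplex partner transmits whenever it likes, most overlaps land well past the first 64 bytes of a full-size frame, which is why late collisions are the telltale sign.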
Further, it is important to keep in mind that a "clean" ping test result (or the like, e.g. "linkloop" or a default netperf TCP_RR) is inconclusive here: a duplex mismatch causes lost traffic _only_ when both sides of the link try to speak at the same time. A typical ping test is synchronous, one request and one response at a time, so it never has both sides talking at once.
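The ping point can be shown with a toy timeline model in Python: treat each frame as a time interval on the wire and count how often the two sides transmit at once. The traffic shapes and function names below are hypothetical and deliberately idealized (no propagation delay, fixed frame times, arbitrary time units):

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one frame on the wire


def overlapping_frames(side_a: List[Interval], side_b: List[Interval]) -> int:
    """Count side_a frames that are on the wire while side_b is also transmitting.
    On a duplex-mismatched link, each of these is a potential lost frame."""
    return sum(
        1
        for a_start, a_end in side_a
        if any(b_start < a_end and a_start < b_end for b_start, b_end in side_b)
    )


def ping_like(count: int, frame: float = 1.0, gap: float = 10.0) -> Tuple[List[Interval], List[Interval]]:
    """Strict request/response: B replies only after A's request has ended,
    and A sends the next request only after B's reply has ended."""
    a: List[Interval] = []
    b: List[Interval] = []
    t = 0.0
    for _ in range(count):
        a.append((t, t + frame))
        t += frame + gap
        b.append((t, t + frame))
        t += frame + gap
    return a, b


def both_talking(count: int, frame: float = 1.0, offset: float = 0.25) -> Tuple[List[Interval], List[Interval]]:
    """Both sides send back-to-back frames at will, B's stream slightly offset."""
    a = [(i * frame, (i + 1) * frame) for i in range(count)]
    b = [(offset + i * frame, offset + (i + 1) * frame) for i in range(count)]
    return a, b


print(overlapping_frames(*ping_like(100)))     # 0: request/response never overlaps
print(overlapping_frames(*both_talking(100)))  # 100: every frame overlaps the other side
```

Any test that only ever has one side talking at a time scores zero here, which is why a clean ping says nothing about duplex; traffic flowing in both directions at once is what exposes the mismatch.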
Finally, when/if you migrate to 1000Base-T, everything has to be set to auto-neg anyway.
I've seen many a case of NetWare reporting a 'negotiating' NIC as operating in FD mode while the Cisco switchport had negotiated FD but was still showing collisions. As a general rule, we have hard coded all core switch ports (200 servers connected) for the last two years. Once the 'general rule of thumb' becomes policy, it's easy to manage.
It has carried over to our migration to Gbit and we've never looked back. It's not so much that we think autonegotiation is totally unreliable, but hard coding both ends significantly reduces the possibility of problems.
Having the option to hard code Gbit switchports (which generally means you can also hard code them down) allowed us to replace all 10/100 cards with 10/100/1000 cards and progressively migrate non-Gbit devices to Gbit.
Your Cisco switchport claims to be running full-duplex mode, but is also logging collisions?
This is very surprising behavior. Did you log a bug with Cisco? What did they say?
Okay, but you might be introducing a problem. It's very easy to force all of your switch ports, especially since they're likely all under the control of one group, and the ports live on a small subset of equipment (the switches)... You didn't say that you positively know the operating mode of every last host transceiver in your environment.
Perhaps in your experience it reduces the possibility of problems. I explained in the OP that in my experience it greatly increases the possibility of problems. I'm guessing from Rick's post that he's seen the same thing:
rj> c) people (network administrators among them) who didn't fully
rj> understand how autoneg was supposed to work and ass-u-me-d that
rj> they could leave one side at auto and hardcode the other to
rj> "force" the mode they wanted.
In my work, I regularly deploy equipment in other people's datacenters. I can't tell you how many times I've set up systems where the customer's switchports were running auto (often the switch had just come out of the carton), only to be called back a few months later to discover that some admin had blindly forced all port operating modes without knowing or caring what was on the other end of each cable.
When I deployed the equipment, I checked the switchports and set up a good match. When the admin changed the port settings without coordination, he broke things. He was likely applying a similar "rule of thumb".
I don't understand what you're trying to say. You can plug 10/100 cards into 10/100/1000 ports (and vice versa) regardless of your ability to administratively restrict operating modes. It seems like you're just creating work for yourself. "Having the option to hard code" certainly did not "allow you to replace".
And you missed the point of my original post: Autonegotiation is mandatory with gigabit. The silent force mode Cisco employs on its FastEthernet switchports cannot work in a gigabit environment.
The option you describe doesn't exist. You're still autonegotiating, just perhaps with a limited subset of operational modes. This is very different from the FastEthernet model where autonegotiation can be truly eliminated.
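The "limited subset of operational modes" point can be sketched in Python: a gigabit port "hard coded" to 1000/Full is modeled here as one that still negotiates but advertises only that mode, and the link resolves to the highest-priority mode both ends advertise. The priority order follows IEEE 802.3 Annex 28B, trimmed to the common 10/100/1000 twisted-pair modes; the mode labels and `negotiate` are hypothetical names:

```python
from typing import Optional, Set

# Highest priority first (IEEE 802.3 Annex 28B order, with 100BASE-T2/T4 and
# multi-gigabit modes omitted for brevity).
PRIORITY = ["1000FD", "1000HD", "100FD", "100HD", "10FD", "10HD"]


def negotiate(local: Set[str], partner: Set[str]) -> Optional[str]:
    """Return the highest common mode both ports advertise, or None if they
    share no mode (no link comes up)."""
    common = local & partner
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None


switch_port = {"1000FD"}                   # "hard coded" gigabit: one advertised mode
gig_nic = set(PRIORITY)                    # NIC advertising everything it can do
old_nic = {"100FD", "100HD", "10FD", "10HD"}

print(negotiate(switch_port, gig_nic))     # 1000FD
print(negotiate(switch_port, old_nic))     # None: no common mode, no link
```

In this model there is no branch where negotiation is skipped; restricting the port only shrinks the set it advertises.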
That's completely normal behaviour when there is a duplex mismatch. The switchport was operating in FD but collisions were occurring on the Ethernet... hence we discovered the other end was not operating in FD although the OS said it was.
We only hard code 'server' switchports. Interesting how people take things out of context. I specifically mentioned 'core' switch ports and never stated 'every single host'.
Yes, and I'm presenting my experience, which indicates that migrating to Gbit doesn't mean 'everything has to be set to autonegotiation' and that the hard coding of speed and duplex settings can be managed. It's horses for courses, as they say.
Communication during the initial setup? Wouldn't you discuss the state of the switchports with the site's support person/s? Were you made aware of the rules? Did you ask? Were the details of your installation documented?
I'm not having a go at you; I'm just conveying the fact that mismatches can be avoided when all parties 'communicate'.
Sorry for missing the 'point' of your post but I was making comments related to my experience with Gbit and 'negotiation'.
I don't understand what you're saying. Our server ports do not autosense. They are hard coded at both ends. How can autonegotiation be 'mandatory' when you can still hard code the NIC/switchport? What am I not understanding?
If your switch is operating in full-duplex mode, how can it possibly detect collisions? A collision (on twisted pair or fiber media) in half-duplex mode is defined as frame-reception-while-transmitting. However, in full-duplex mode, frame reception while transmitting is perfectly normal and expected. The question is, what "event" is being counted by the collision counter for a switch port in full-duplex mode?
Now, the collision counter on the *half-duplex* end of a mismatched pair may very well see collisions; in fact, it may see more than usual, since its full-duplex link partner considers it perfectly acceptable to transmit while receiving frames.
You cannot disable Auto-Negotiation for 1000BASE-T. What you *can* do is limit the Auto-Negotiation advertisements to reflect a desire to operate as "gigabit or nothing," but the device will still execute the Auto-Negotiation algorithm. At minimum, it needs to select a clock master for the link, and place the other device in slave status.
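As for picking the clock master, here is a simplified Python sketch of the resolution order as I understand IEEE 802.3 Clause 40: a manual setting wins, then a multiport device (e.g. a switch) becomes master over a single-port device (e.g. a NIC), then the higher random seed wins. `PhyConfig` and `resolve_master_slave` are hypothetical names, and the retry and fault-counting details of the real state machine are left out:

```python
import random
from dataclasses import dataclass, field
from typing import Optional, Tuple


def _opposite(role: str) -> str:
    return "slave" if role == "master" else "master"


@dataclass
class PhyConfig:
    manual: Optional[str] = None  # "master" or "slave" if forced by the admin, else None
    multiport: bool = False       # True for a switch/hub port, False for a single-port NIC
    seed: int = field(default_factory=lambda: random.getrandbits(11))  # 11-bit random seed


def resolve_master_slave(a: PhyConfig, b: PhyConfig) -> Optional[Tuple[str, str]]:
    """Return (role of a, role of b), or None when resolution fails
    (conflicting manual settings, or equal seeds that would force a retry)."""
    if a.manual and b.manual:
        return (a.manual, b.manual) if a.manual != b.manual else None
    if a.manual:
        return a.manual, _opposite(a.manual)
    if b.manual:
        return _opposite(b.manual), b.manual
    if a.multiport != b.multiport:
        return ("master", "slave") if a.multiport else ("slave", "master")
    if a.seed != b.seed:
        return ("master", "slave") if a.seed > b.seed else ("slave", "master")
    return None


# A switch port facing a NIC, neither manually configured: the switch becomes master.
print(resolve_master_slave(PhyConfig(multiport=True), PhyConfig()))  # ('master', 'slave')
```

Even a port an admin considers "hard coded" still has to go through this exchange with its partner, since the roles are only settled once both ends have compared what they sent.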
While I have heard numerous anecdotal accounts of Auto-Negotiation failures (in this newsgroup, and similar fora), I have never personally seen or experienced a single instance of such on any production (non-prototype) equipment.
If you're dying to see it for yourself, find a switch with a late-90s-era Broadcom PHY (e.g. Cisco 5000) and a PC with a 3Com NIC using _original_ Windows drivers. It'll fail every time.
Those of us who dealt with these things in the wild saw it all the time back when it first came out. Particularly problematic is that folks tend to buy the same server/PC models in bulk, so if one fails, odds are you're looking at several hundred/thousand simultaneous failures. Or you might luck out and never see any, depending on what IT was buying.
It's definitely not as bad now as it was then, but the problem isn't gone yet. However, I'd agree that the rate is likely much lower than admins believe it is.