Company network slowdown

D

DanR 20 years ago

Question about typical company network. We are looking at going gigabit mainly because of a perceived network slowdown in the past 6 months or so. But... some of use are not sure that the 100 Mb T1 current network is really the fault. Question is: We have some really speedy computers on the network and some not so speedy. Can slow clock speed computers drag down the entire network? We have B / G Wi-Fi on both sides of the firewall. Can they drag down overall speed of the network? We have hubs / switches that feed other hubs / switches. How bad a practice is that? There are about 50 wired drops around the building and around 8 wi-fi hot spots. Previous IT guy set the wi-fi up with all different SSIDs. We don't care about lap top roaming so maybe that's not a big deal. Or not? Any suggestions?

Vote

J

Jeff Liebermann 20 years ago

Is "typical" a good reason not to itemize any of the hardware or operating systems involved?

Gigabit is great for taking the load off servers. For example, if someone is doing regular backups or huge file transfers, running that traffic through a single 100baseTX port on a server will cause traffic constipation at the server. You would probably be better off installing a 2nd ethernet card in the server, but gigabit will help.

However, once the traffic hits the ethernet switch, the only place it goes is to the destination machine. Other users, using other ports, such as to/from your T1 internet connection, will not be affected by the heavy traffic in the slightest. Therefore, based on your limited description of the topology, I doubt that gigabit is going to do anything useful.

A T1 (DS1) is 1.544Mbits/sec. You'll get about 1.3Mbits/sec thruput in both directions. Have you benchmarked this connection? I suggest:

formatting link

may disclose some setup and buffer issues. The CSU/DSU for the T1 probably has a 10/100Mbit/sec ethernet port. No sense in making that gigabit as you only have 1.5Mbits/sec to move through it. The T1 speed is the limiting factor.

Please note that a T1 is no better than a common DSL connection except that it has far superior outgoing bandwidth. A 3 or 6Mbit/sec DSL line, or 6Mbit/sec cable modem, will outperform a T1 for incoming connections. If the T1 is clogged with junk, then perhaps some QoS will suffice to delay a bandwidth upgrade.

With switches instead of hubs on a wired network, generally no. I can create some kind of science fiction situation where a slow machine will cause problems, but the ability of the switch to isolate traffic generally prevents interaction. However, if there is a common bottleneck for all the machines, such as the T1, then there will certainly be problems.

Generally no, but it's possible. What wireless does is create common network (air) path for all the wireless users. You no longer have the benefits of separate switched paths as in a wired ethernet switch. Only one radio may transmit in a given air space. The result is consider mutual interaction and interference among wireless users.

It sucks. See the 5-4-3 rule for hubs.

formatting link

that a hub is a repeater and that many texts use the terms interchangeably. Basically, it says not to put more than 3 hubs in series. I've had so much trouble with spaghetti LAN's using hubs that I replace them with switches as soon as I find them. That includes

10/100 hubs which are actually worse than single speed hubs.

Ideal is a central stackable and SNMP managed switch in a star topology. That never happens as "workgroups" tend to add switches where clusters of ethernet devices come together. As long as they use switches, I don't have much of a problem. I make sure that the collision domains do not become excessive and track the end to end wire lengths. Dig out your drafting pad or Visio network topology scribbler, and make a drawing of your network. It's impossible to troubleshoot network constipation problems without a road map.

That's not a huge system. However, there are plenty of places where things can break.

Leave it alone. The only thing a common SSID gives you is the ability to roam around. Having different SSID's gives the user the ability to chose which access point they want to use. Using a common SSID leaves it to the flaky driver software, which never seems to get it right.

Nope. Do you expect a mechanic to fix your car without telling him the make and model? Do you go to a doctor and not expound on where it hurts or how much? So, you get only general advice and sympathy.

Get some bandwidth and traffic monitoring going. Your CSU/DSU and router probably support SNMP. I suggest MRTG or RRDTool.
formatting link

formatting link
can easily tell if your T1 is constipated. If so, then optimize, add QoS, or add more bandwidth. You may have bottlenecks or high error rates elsewhere.
Replace the hubs and dual speed hubs with switches. Don't bother with gigabit unless you're bottlenecked at the server(s).
Do some sniffing and see what *TYPE* of traffic is causing problems. I suggest Ethereal:
formatting link
is tricky with a switch so be prepared to do some hardware juggling or managed switch configuration for a monitor port. Be prepared to "discover" virus, worm, and streaming traffic. One Bittorrent filesharing user will bring your network to a stop.
Draw a network map so you can ask for help. This is not a trivial exercise. It usually takes me about a week to do properly on a large and complex systems. Just finding all the devices, servers, and bootleg attachments are a major challenge. That includes noting MAC and IP addresses for identification.
Get help from someone experienced in network analysis and troubleshooting.

Vote

A

atec 20 years ago

Have you run a sniffer over the network to determine where the consumption and waste is ?

Vote

B

Bigguy 20 years ago

You must run a network traffic analysis prog to see where the bottlenecks are and how the bandwidth is being used/shared.

Consider putting high bandwidth 'power' users on their own network if possible... give them a fibre spine if required.

Someone should be managing your network - reliabilty, usability and security will be compromised if you let benign (?) anarchy rule ;-)

Have fun

Guy

Vote

J

Jeff Liebermann 20 years ago

Fine. However you should have some clue who's got performance problems.

That's Suite's, not sweets.

I can't tell for sure but if you have 50 boxes, you really should get someone qualified to do the troubleshooting. It's easy enough to plan and setup a new network. It's requires experience to troubleshoot an existing network.

Well, ok.

Ok, so it's an *INTERNET* slowdown, not a server to client or render farm slowdown. That's not going to change at all by going to gigabit. You're bottlenecked at 1.5Mbits/sec at the T1 and that's your limit. Do the traffic monitoring to see what and how much is moving in and out of the T1. Don't be surprised if you see worms, file sharing, and garbage.

That's very different from an *INTERNET* slowdown. Most render farms are interconnected with gigabit ethernet. The big boxes have multiple gigabit cards to distribute the load. I got to play with one RAID server with 4 cards and a load balancer. Yeah, for in house traffic, gigabit is great.

However, you still have to know if you're making an improvement. For that you need numbers, measurements, calculations, and pretty graphs to impress the boss. I suggest MRTG for traffic monitoring.

Baloney. CAT5e will do gigabit just fine. You don't really need CAT6. Keep the cable lengths down to less than 300ft. Avoid long flexible ethernet CAT5 jumpers. Borrow a cable certifier and test your wiring. New gigabit NIC's are cheap. Netgear GA311 is about $20. I recently upgraded a law office with gigabit everything. It was a barely noticeable improvement. You only notice an improvement if your existing 100baseTX system is saturated. Do the measurements and you'll know for sure. If lazy, use Windoze XP Perfmon to check client network utilization.

Fine. Draw the topology map as I suggested and see how many boxes in between the gigabit NIC's need to be upgraded.

Home runs to what? I smell a big building with cable lengths more than 300ft which will require some intermediate boxes. Home runs aren't always best.

How long? If you don't know, guess.

Well, ok. I think I've given you a good start on the buzzwords. So far, you've made the decision to spend some money, considerable time, and a bit of guesswork, in order to upgrade a network that you don't have a clue where it's running slow, why it's running slow, or whether you have a traffic problem. Also, this has nothing to do with wireless so you're asking in the wrong newsgroup. To insure that you'll get no useful answers, you've supplied not one single name, number, model number, distance, or accurate description.

Well, you're learning. Business LAN's are very similar except that reliability is a much bigger issue than performance or features. Your real task will be to fix whatever problem you can't seem to describe accurately, and do it without breaking anything else or having 50 irate graphic artists screaming at you. That's quite different from home networking.

No. The bandwidth is distributed roughly equally among the workstations.

Yes. In theory, each workstation will get 1/10th the incoming bandwidth. MS Update is a bad example because of the way they do bandwidth limiting, but that's a diversion and not part of this discussion.

Yes.

No. I do that in the office. Screaming audio is from 24Kbits/sec to about 128Kbits/sec. Compared to your 1500Kbit/sec, the screaming audio listener only eats about 8% of your incoming bandwidth. However, if you're saturating the T1 with other traffic (do the sniffing), then that last 8% might be fatal.

Vote

J

Jeff Liebermann 20 years ago

Oops. I just mean't the GA311 as an example of a cheap gigabit NIC. I have to confess that I don't have experience with the GA311 NIC under heavy continuous load. I guess I'll avoid the GA311 as the Intel card is only about $30 each. |

formatting link

only point was that a gigabit conversion is no longer very expensive at the client end.

Looking at Gigabit switches, the prices seem to hover around $10-$20 per port for unmanaged and $25 to $40 per port for managed switches. I would go with the managed switch as I'm a big fan of SNMP monitoring and management. Knowing what's happening and being able to turn things on and off remotely is worth the extra dollars. |

formatting link

|c:201|c:596|94 gigabit switches to chose from, some of which are fairly cheap.

Incidentally, you're largely proving my point, that gigabit is only effective when the network segment is heavily loaded. With light loads, I can do quite well with 100baseTX-FDX.

Vote

D

DanR 20 years ago

Jeff Liebermann wrote:

Yes, I should have provided more information about our network hardware. Problem is I don't really know. We are a production company with 6 Avid sweets, 2 audio sweets, one online editing room and an interactive department. We don't have any IT people per se... but have designated one of our coders to be responsible for the network. He's a sharp guy and seems to know his network jargon. And he is new on the job having taken over the network from someone who left. Because I'm fairly handy with computers in general I'm helping the boss think through our move to giga-bit and the coincidental network / Internet slowdown we have been experiencing. The main reason to go giga-bit is to move very large files around on the network. (video files in the giga-Bytes) And because of the Internet slowdown of late we are talking and wondering if that will improve Internet throughput. Obviously it will be a fairly expensive endeavor to run all new cable throughout the building and get new NICs. So we're also thinking about only doing new giga-drops at some work stations and not the entire network. All new drops will be home runs and if we do the entire building that means all home runs. But there's a but and that is that we are considering fiber to the upper floor because of long runs. So that is a bit of background and I'm just trying to learn what I can so I can ask intelligent questions and better understand what the heck is going on. I'm basically a home network guy and that is the extent of my network hardware knowledge. I appreciate the help so far provided. Thank you all. Jeff... when you say "A T1 (DS1) is 1.544Mbits/sec. You'll get about

1.3Mbits/sec thruput in both directions." Does that mean that just one workstation at a time will see that throughput? If 10 computers / workstations are at the same time doing a Microsoft update for example... are they sharing that 1.3Mbit bandwidth? Are they each then downloading at 130Kb. Does it work that way? Also curious about one of our people who constantly listens to Internet radio streams. Any harm there?

Vote

D

DanR 20 years ago

What are the symptoms of a bad or "garbaging" NIC? Would it be constant traffic even when the user is not doing anything network related? Would "watching the "blinking lights" help find one of these NICs? Would a managed switch make a "garbaging" NIC a non issue?

Vote

P

Pierre 20 years ago

If you are running from the server through one switch and using one output to feed another switch at 100 Mb, then taking the outputs of the second switch to feed a number of workstations, then all those workstations must share the single 100Mb feed from the first switch. Not good practice for maintaining good throughput and response.

Just watching the "blinking lights" on the switches can give you some idea of loading and in what directions the load is coming from.

Either you need to redistribute the workstation load more evenly or better, take the network to gigabit so that the data moves a bit faster. Also be on the lookout for a bad or "garbaging" NIC. Some varieties can soft fail slowly and really start dragging a network down. Using managed switches rather than unmanaged and setting them up properly usually makes a significant difference.

You may also wish to look at adding a second (and third or fourth) ethernet port on your server and feeding a switch directly rather than using a point of an existing earlier switch. Four ethernet ports on the server, each feeding a single 16 port switch and then directly to the clients will share out the load significantly but be absolutely sure you use good NICs such as the genuine Intel Pro series rather than many of the cheap aftermarket types that generally cannot stand very high consistent traffic error free.

Remember also the cascading guidelines for switches, 10Mb - 3 cascaded,

100Mb - 2 cascaded, gigabit - no cascading.

Peter

Vote

P

Pierre 20 years ago

Hi Dan,

A garbaging NIC can often be found by watching the lights. Network software analysis tools very rarely find it as the data it is sending is invariably a load of rubbish and may not even be valid bytes. All it seems to do is use bandwidth. The user may even be otherwise totally inactive but the NIC keeps chattering. A final usual proof is to unplug the ethernet cable at the suspect machine and see if there is an improvement.

Putting in a managed switch is not the way to fix that problem. You have to find the bad NIC and replace it. It is a bit like using a bucket to drain a flooded area when in fact the drain should be unblocked!

As others have said, a good audit and mapping of the complete network is mandatory if you are going to approach the issues in any sort of logical manner. The scatter gun approach generally leads to more confusion.

With a good map of your network, you can isolate sections logically and see if the isolated section was that hogging the network and then break that section into smaller sections until the culprit is found. There could well be other issues which have affected the network loading and performance too such as a new application installed, the server databases not responding quickly enough because of server performance issues and so on. Again. draw up in detail what the network has and step through it first.

Vote

P

Pierre 20 years ago

Jeff has it right again except for one part. Gigabit NICs are cheap and you get what you pay for. having been intimately associated with a similar type of installation, we ended up throwing out 23 Netgear GA311 NICs and a variety of other breeds. The majority of them just cannot reliably stand intense high volume traffic as occasioned by hundred megabyte file transfers running 24/7. They randomly and intermittently buckle resulting in a few more retries which takes precious bandwidth. Commercial installations usually run at sub 5 or 10% network utilisation. Graphics and imaging sites often run at 80%+ utilisation for minutes on end.

After a lot of experimentati>

Vote

E

ES 20 years ago

First of all i suggest updating drivers on all of your network card. The I suggest removing hubs and replacing them with switches Then run a traffic analyzer on the hosts (pc) where you see more traffic.

Vote

J

Jeff Liebermann 20 years ago

Yes, the active workstations share the bandwidth roughly equally. Note that this is NOT true with wireless where the distribution varies with the connection speed.

No, this does NOT mean that one workstation at a time will that thruput as you previously stated.

You get what you pay for. In the past, it was assumed that a T1(DS1) came with a superior level of support from the telcos. I still remember one hour service from Pacific Bell. Now daze, T1 is just another service and may just be a muxed channel off some telco fiber. I actually get better service from my DSL lines than I do from the T1's. The only real benefit of a T1 is the 1.5Mbits/sec outgoing bandwidth, which cannot be easily supplied via DSL.

The conventional rule of thumb for loading is: 100 users doing light web browsing and email. 10 business users doing whatever business users do. 1 file sharing user.

What is unacceptable? Only having 50 computers on a single T1? Again, it depends on what those users are doing. By todays standards of bloated and bandwidth hungry applications, a T1 is a small pipe. If you would kindly dig out the sniffer and see what's moving on your T1, you might have a better idea of whether you're dealing with a capacity problem or an abuse problem.

For example, a customer calls me on Sunday morning (yawn) to ask why their T1 is moving large amounts of traffic when there's nobody in the office. This is a good question. I expected to find a virus, worm, or hacker. Instead, I found that a clever user had found a program that "synchronized" his files between his home computer and his office machine. He had set it up incorrectly and it was "synchronizing" much of the corporate server farm as well as gigabytes of junk on his desktop. Eventually, it would have killed his home computer, but I didn't want to wait. So, I dived into the managed ethernet switch, pulled the virtual plug on his machine, and left a nasty voicemail message. This type of nonsense happens all the time.

Another example. A while back, I noticed that the MRTG traffic graphs showed that someone was downloading about 25Mbytes of something every

5 minutes. It was causing problems with VoIP traffic and streaming content. It turned out to be Symantec Live Update trying to update Norton Antivirus. One problem. Norton AntiVirus had been removed from that machine, but not Live Update. It would merrily try to update NAV, fail, and then try again in 5 minutes by downloading everything over and over and over, etc.

Moral: You need to know what's moving on your network or you can't do anything useful in the way of troubleshooting and capacity planning.

Got it. Your thinking is based on emotion. I have a ladyfriend that sometimes operates that way. The scarey part is that it often works. There are books and classes to optimized intuition, crystal ball gazing, Ouigi Boards, and pseudo science that may help with this way to non-technical troubleshooting. I've often suspected that the government also uses this method in their technical ventures.

You can't afford the ultimate. At this time, an OC-192 at

9.6Gbits/sec symmetrical is about as fast as commonly available. Korea has 10Mbit/sec consumer service. Most cable modems and some DSL vendros will do 6Mbits/sec download and 512Kbits/sec upload. Desktops will soon have 10Gigabit ethernet cards. Some crude numbers:

formatting link

Incidentally, if *ONLY* incoming bandwidth is an issue, you might wanna consider distributing the load. Get several DSL connections and use one of these to manage the load:

formatting link

DSL lines are MUCH cheaper than the T1. However, if your problem is outgoing bandwidth, a load balancing router will do nothing.

No load at all. Some IM clients (i.e. AIM) deliver advertising and stupid videos which grab a small amount of bandwidth, but nothing disgusting and nothing that's running all the time. However, if people are using IM for file transfers, the bandwidth use might be momentarily quite high.

I still think you need someone with network troubleshooting experience to impliment monitoring and traffic analysis. Render farms use LOTS of bandwidth. My guess(tm) is that you're speed problem may be in an unexpected area.

Vote

D

DanR 20 years ago

Jeff, I want to make sure I understand your comments.

Could the above sentence read "No. The bandwidth is distributed roughly equally among the workstations" that are at that moment sending / receiving on the Internet. In other words... the active workstations share the bandwidth. True? I think that is what you said below.

I'm really surprised to learn that a T1 Internet connection has these limitations. Seems then that (except for upload) it's like having 50 or so computers on a home DSL Internet connection. I would have thought that this would have been un-acceptable. My "thought" is not based on technical knowledge but I always assumed that a T1 was the ultimate way to go. One more thing. At any given time during the work day we have about 20 computers using instant messaging. Most of the time there is not traffic but the apps are always listening. Is that much of a load? I am extremely grateful for the time you've spent providing all this good information. If we don't have to run all new cable your tip will save our company a lot of money and labor.

Vote

W

Wizzzer 20 years ago

Good advice, Pierre. Also, Dan, do not overlook the network printers. I've had them start chattering several times (all HP's) and bring the network to it's knees. Drove us crazy trying to find the culprit.

Cheers, Wizzzer

Vote

J

Jeff Liebermann 20 years ago

Been there. HP LaserJet 4 with Jetdirect J2552 card. If I run out of paper, it floods the networks with garbage that was impossible to decode with Ethereal. That took me 6 months to find. It was fixed with a firmware update to the Jetdirect card.

Vote

D

DanR 20 years ago

Thanks to all who responded. The detailed replies were very helpful and enabled some of us non experts to ask the right questions of the person who will do the hands on work. We are on our way to the upgrade and sniffing around for Internet issues. Dan

Vote

M

Moe Trin 20 years ago

Don't know what your network looks like, but HP only has a handful of OUI blocks:

[compton ~]$ zgrep -i Hewlett MACaddresses.gz | grep base | cut -d' '

-f1 | column

0001E6 000883 000E7F 00110A 001321 001560 0060B0 0001E7 000A57 000F20 001185 001438 00306E 0080A0 0004EA 000D9D 001083 001279 0014C2 0030C1 080009 [compton ~]$

That's straight out of the IEEE file. I'm at an R&D facility, and we're super paranoid, so every host is 'registered' meaning we know MAC, IP, user, location, which drop from which switch, serial and decal numbers, and the date of last tetanus shot for everything that connects to our net. If something starts squittering, I can ID the box in seconds. If the box is unknown, I can ID the drop, and it's 50/50 if the security goons get there before me or not.

Old guy

Vote

J

Jeff Liebermann 20 years ago

Well, if the 802.3 Ethernet packets were well formed and contained MAC addresses, tracing the problem back to the source would have been trivial. Instead, what I was seeing was bursts of garbage that I couldn't decode. I tired Ethereal, a Network General Sniffer, NT Netmon, and a bunch of demo sniffers I downloaded just to see if they could make sense of the traffic. I could see the garbage very lightly flashing the lights on the hubs, but could not decode anything. I spend two days with a logic analyzer trying to capture useful data and decode the contents manually, but even that didn't produce anything useful.

Just to make it interesting, I made a rather stupid series of mistakes. This was in the days when hubs were in fashion and switches were expensive and scarce (approx 1997). They had about 50 boxes, in

3 locations, connected with Cisco 340 series wireless bridges, all interconnected with hubs. There were three identical HP LaserJet 4 printers involved. Nobody every deduced that the network running slow was caused by running out of paper because there was always someone around to replace the paper that was not directly involved in using the computahs. Running out of paper was a very uncommon experience, so the time of slow downs were not easy to predict.

I had wrongly decided that the various 16 port Linksys 10baseT hubs were the likely culprits and convinced management to go for an HP Procurve 4000 switch, mostly on the basis of speeding things up to

100baseTX-FDX. The switch arrived before I could finish some necessary re-wiring so one of the four hubs remained. The nice thing about switches is that garbaged and trashed packets do not go through a store and forward switch. Everything that was plugged into the Procurve switch worked without a slowdown. Everything that was still on the hub slowed to a crawl whenever the HP LJ4 ran out of paper.

Again, I wrongly interpreted the problem as being the hub and performed an overnight panic rewiring job to move everything to the switch. The slowing stopped. I thought it was fixed.

The nice thing about managed switches is that you can use SNMP and the internal diagnostics to detect problems. The three HP4 printers in question were on the last hub. When connected to the switch, the stats started showing large numbers of corrupted packets. Of course, I didn't bother labeling the cables so I didn't have an immediate clue as to where the junk was coming from. This time, I incorrectly blamed the wiring. After wasting some time with a borrowed cable certifier, I eventually figured out the corrupted packets were associated with the printers.

Upgrading the flash on an HP J2552 is somewhat of a challenge. The HP software sucks. One mistake and the $300 card is a paperweight. It took a while but I eventually got all three cards upgraded and configured. The problem hasn't surfaced since.

If there had been anything decoded by a sniffer, I would have found the source almost immediately. Instead, it was a painful 6 month ordeal, with lots of bad guesswork, and a substantial amount of luck in finding the problem. What I consider the most important lesson from the aformentioned exercise was that I could not have figured it out without the statistics and diagnostics from the managed switch.

Vote

M

Moe Trin 20 years ago

Oh, _that_ kind of garbage. Yeah, had that with old 10Base5 tranceivers with later model Sun SS5 and SS20. Drove us absolutely bananas till we caught one in the lab. We were using a NetGen sniffer, and I forget what it was that we were finally able to spot - vaguely it was a fraction of the Ethernet header, but that was years ago.

We had a Tektronix 535 scope on a platform, with another guy with the probe in the overhead ceiling. Total waste of time. We did see there was an occasional fractional packet (wasn't long enough to be a collision), and actually had people log into each box on the subnet and look at the ifconfig -a stuff. No joy.

1997 - we had just completed installing Kalpana Etherswitches to break our 750 foot lengths of 10Base5 into smaller segments, and to get the routers and busy servers onto their own ports. I didn't ask how expensive the Etherswitches were, but they made a significant improvement - and they had (some) smarts!!!

Yeah, our users are "trained" to reload paper bins. They'd manage to screw something else up, but paper usually got loaded as soon as someone came to pick up their printouts, and found nada. Some of the "smarter" ones learned how to cancel and re-run print jobs on alternative printers. Why they wouldn't reload the paper? Who knows.

We had two buildings with twisted pair - I swear it was Cat 1/2 - and one section of the main building with Cat 5. Everything else was coax.

Similar with the old tranceivers, except we had them on three of the

16 ports. That narrows it down, but doesn't get the exact answer.

Boy does that sound familiar ;-)

Coax is just as bad if not worse - the blinky lights are on the tranceiver up in the ceiling (and under floor in the server rooms). Until we broke things up with the Etherswitches, our coax runs were up to 750 feet long, and had up to 400 systems on that one wire. Slightly out of spec, but it worked.

Old guy

Vote

Company network slowdown

Join the Discussion

Didn't find your answer?