WDS + WPA + RADIUS problem

In the interest of unified learning it was suggested I post to this newsgroup, rather than direct email communication.

One of our clients is an accommodation complex. They want to provide internet access to all their tenants and unfortunately due to the layout of the buildings it was not cost effective to run wired cables. As a result we went for a wireless solution.

There are 4 groups of buildings in total. The administration building and 3 complexes with units in them. They are spread out over a reasonable amount of distance. Each complex has two physical buildings. The buildings are all brick and are two storey.

We ended up going with 4x Netgear WG302v2s, using 21dB waveguides and 1 watt amplifiers. This was at the recommendation of a colleague of ours. The units were set up in peer to multi-peer configuration with the AP at the administration building being the main AP, where a server running freeRADIUS and an internet connection exists.

The APs were configured with WDS, having three SSIDs and VLANs. A silent SSID for the management vlan, an open security SSID for the guest VLAN which has a single website that provides the users with the information on configuring their computers and a WPA2 Enterprise secured SSID for the internet VLAN.

Users authenticate using WPA2/802.1X authentication against the RADIUS server using a username and password.

Aside from the complications imposed in configuring Windows based computers with dot1x, we got the system up and running. However, we started experiencing randomly timed packet loss. We attempted adjusting the radio power levels in the hope of eradicating this problem (thinking it was radio overlap) but to no avail.

When the packet loss occurs, the APs drop out and have to re-authenticate against the primary access point, which in turn causes connection to the RADIUS server to fail and boots all users off.

We have since gone back and attempted to configure the system in purely repeater mode, but this resulted in more users being unable to stay connected for an extended period of time. We have tried taking out the AP of the building in the middle, so there are only 3 radios in total. This helped a little, except that a section of the complex does not get a strong enough signal to connect now.

We have set up monitoring to measure latency and packet loss, in an attempt to determine any set pattern for the problem. However, all we have found is that it is mostly random - occurring any time throughout the day, but much worse in the evening when most tenants are attempting to use the system.

When this problem occurs, the radios event logs report "tx queue stuck" just before they bomb out.

Unfortunately there is no way to hard wire each AP together without an expensive outlay in fibre optics. It has also been suggested (by the colleague who helped us in the first place) that we may need to set up additional APs in the 5GHz spectrum to handle the link between the admin building and each complex, and run each of the existing radios on a separate channel and SSID, to avoid overlap.

Although this would most likely work - it seems like a massive over complication and defeats the purpose of WDS to begin with.

We've struggled to find any really good information on this type of setup so any help would be greatly appreciated. The client is extremely frustrated as their tenants are not getting a fast, reliable connection and we need to find a solution for them soon.

Reply to
davidr (at) insane (dot) net (

Well, you didn't do everything wrong, but you came close.

I suggest you reconsider. The advantages of a wired backhaul are substantial over WDS or other forms of mesh network. You only need a cable or fiber to each access point, not each apartment. I'll cover some of the other advantages as I blunder along.

Time for a short rant. Please supply numbers instead of vague descriptions. How many buildings total? How high are the buildings? What are the distances involved in meters? Where in or on the buildings are the access points located? On the roof? Any coax cable? If so, what types and how long? How high off the roof? How are you entering each apartment with RF? Through the windows? How much bandwidth do you have to the ISP? How many active users per AP?

Please try to be more specific and supply numbers. Extra credit for hardware URL's so I don't have to Google for them.

While I'm busy ranting, in the future, please describe the problem you're having first, then supply the details. It keeps everything in context and is much easier to assemble an organized reply.

Problem #1. Too big an antenna and too much xmit power. The problem with 11dBi omnidirectional antennas is that they have a very narrow vertical radiation angle. My guess is a -3dB beamwidth of about 5 degrees. If the antennas are perfectly vertical (yet another difficulty), then you will have wonderful coverage of the rooftops and possibly the upper floors. However, the lower floors may not get much RF.

Problem #2. The 1 watt amplifiers create what I call an "alligator". An alligator is an animal with a big mouth and small ears. Your access point has a big mouth (1 watt) but the same size ears as the average user client radio. However, the client radios only have perhaps 40mw of xmit output or about 4% of the power output of your amplified access points. The clients have no problem hearing the access point. However, the access point can't hear the clients due to the clients' low xmit power. Working the numbers, your access points have 5 times the tx range of the clients.
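The "alligator" arithmetic above can be checked with a few lines of Python. This is only a rough sketch: it assumes free-space propagation (range proportional to the square root of transmit power) and identical antennas and receiver sensitivity at both ends, which a real site will not have. The 1 watt and 40mw figures come from the post; the function name is made up for illustration.

```python
import math

def mw_to_dbm(milliwatts):
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(milliwatts)

ap_tx_mw = 1000.0      # amplified access point, 1 watt
client_tx_mw = 40.0    # typical client radio, per the post

power_ratio = ap_tx_mw / client_tx_mw                 # AP power / client power
client_fraction = client_tx_mw / ap_tx_mw             # client power as a fraction of AP power
imbalance_db = mw_to_dbm(ap_tx_mw) - mw_to_dbm(client_tx_mw)

# Free-space assumption: received power falls off with distance squared,
# so range scales with the square root of the transmit power ratio.
range_ratio = math.sqrt(power_ratio)

print(f"client power is {client_fraction:.0%} of the AP power")
print(f"link imbalance is about {imbalance_db:.1f} dB")
print(f"AP tx range is about {range_ratio:.0f}x the client tx range")
```

Indoors, or through brick, the path loss exponent is higher than 2, so the range ratio would be smaller than this, but the link stays lopsided by the same number of dB.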

All you're doing is generating interference with the 1 watt amplifier. Remove it. Your usable client range will be about the same, and your self-interference problems will be somewhat reduced.

Well, at least you have a suitable scapegoat.

Problem #3. RADIUS authentication is UDP, not TCP. There's no guaranteed delivery mechanism. If a single packet is lost, it's gone forever and is not resent. RADIUS *ASSUMES* a reliable connection between the access point and the RADIUS server. By running it over the least reliable backhaul possible, you're just asking for login and authentication failures.
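To get a feel for how quickly packet loss turns into failed logins, here is a minimal sketch. It assumes each packet is lost independently with the same probability, and that a login needs a fixed number of packets to all arrive. Both the packet count and the `attempts` parameter (how many times each packet is tried) are hypothetical knobs for illustration, not measured values for this system.

```python
def exchange_failure_probability(loss_rate, packets, attempts=1):
    """Probability that at least one packet of an exchange never arrives.

    loss_rate: chance that any single transmission is lost (0.0 to 1.0)
    packets:   number of packets that must all get through (assumed value)
    attempts:  how many times each packet is sent before giving up
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    # A packet is gone for good only if every attempt at it is lost.
    per_packet_failure = loss_rate ** attempts
    return 1.0 - (1.0 - per_packet_failure) ** packets

# Example: 5% loss and a 10 packet exchange, with and without retries.
for tries in (1, 3):
    p = exchange_failure_probability(0.05, 10, attempts=tries)
    print(f"attempts={tries}: {p:.1%} of logins fail")
```

With a single attempt per packet, even modest loss makes a multi-packet login fail often, which is the point of putting the exchange over something that guarantees delivery.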

However, it can be made to work reliably. One way is to run the RADIUS authentication through a VPN tunnel. The tunnel will provide the reliable delivery. There are only a few bytes going through the tunnel so speed is not an issue. I'm not sure if it can be done with WG302v2 access points (because I'm too lazy to read the docs). I managed to get a flaky wireless point of sale system working fairly well using the VPN trick. The application was so brain damaged that the loss of only a single packet, would hang the server application. Also, I used PPTP and DD-WRT.

Problem #4. WDS has its place, but not outdoors. It's basically a mesh network with all its issues but none of the reliability offered by custom routing auto protocols, self healing, roaming, and such. WDS is about as crude a mesh network as could be built. WDS and mesh also force your system to put everyone on one channel. Think of it this way.... There are 3 non-overlapping channels (1,6,11). If you had a wired backhaul, you could put each access point on one of these 3 channels, and effectively have 3 times the over the air bandwidth, or 1/3 the mutual interference. I suggest you download and read carefully the channel layout section of the Intel Hotspot Guide archived at: Ummm... don't tell Intel where you found it.
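The channel arithmetic can be sketched as follows. It assumes the commonly quoted figures for 2.4GHz 802.11b/g: channel centres at 2407 + 5 x channel MHz and an occupied width of about 22MHz. The helper names are invented for this example.

```python
CHANNEL_WIDTH_MHZ = 22  # commonly quoted 802.11b channel width (assumption)

def center_mhz(channel):
    """Centre frequency of a 2.4GHz channel in the range 1 to 13."""
    return 2407 + 5 * channel

def overlaps(a, b):
    """True if two channels are closer together than one channel width."""
    return abs(center_mhz(a) - center_mhz(b)) < CHANNEL_WIDTH_MHZ

def non_overlapping_set(first=1, last=11):
    """Greedily pick channels, lowest first, that do not overlap each other."""
    chosen = []
    for channel in range(first, last + 1):
        if all(not overlaps(channel, used) for used in chosen):
            chosen.append(channel)
    return chosen

print(non_overlapping_set())
```

Adjacent picks end up 5 channels (25MHz) apart, which is why only three access points can share the band cleanly when the channel range stops at 11.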

WDS and mesh also create over the air traffic constipation. Since everything comes to one point (main router to ISP), most of the over the air traffic will be through one of the WDS AP's. That means for every packet from a client, another packet will need to be forwarded to the main router over the same "airspace" and on the same RF frequency. That cuts the available maximum bandwidth in half. Add a few retransmissions and the "airspace" will be full of extra junk instead of useful traffic.
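A back of the envelope model of that halving, as a sketch only: it assumes every hop shares one channel, that only one radio can transmit at a time, and that a retry rate r means each packet takes 1/(1-r) transmissions on average. The 20Mbit/sec figure is an arbitrary example, not a measurement.

```python
def usable_throughput(link_rate_mbps, wireless_hops, retry_rate=0.0):
    """Rough usable throughput when all hops share a single channel.

    link_rate_mbps: what one radio could deliver with the air to itself
    wireless_hops:  over the air transmissions per packet
                    (1 = client to wired AP, 2 = client to WDS AP to main AP)
    retry_rate:     fraction of transmissions that have to be repeated
    """
    if wireless_hops < 1:
        raise ValueError("need at least one wireless hop")
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be at least 0 and below 1")
    # Airtime used per delivered packet, in units of one clean transmission.
    transmissions_per_packet = wireless_hops / (1.0 - retry_rate)
    return link_rate_mbps / transmissions_per_packet

for hops, retries in ((1, 0.0), (2, 0.0), (2, 0.5)):
    rate = usable_throughput(20.0, hops, retries)
    print(f"hops={hops} retry_rate={retries}: {rate:.1f} Mbit/sec")
```

The model ignores contention between access points and clients, so real numbers on a busy evening would be lower still.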

What you should have done before deploying this abomination is put everything in one large room, turn everything on, and make a few thruput, latency, reliability, packet loss, and retrans rate measurements. If it can't work inside a single room, with no outside interference, it isn't going to work when installed on a rooftop. It only gets worse when you move from the office to the roof, mostly due to interference from other systems.

That brings up the problem of interference. Have you done a site survey of the area looking for tenants that have their own systems? I'm sure there is a huge number of existing systems in the apartment complex. All of them could create interference. There are also other sources of 2.4GHz junk. See:

I suggest you take known and potential interference problems rather seriously. It only takes one leaky microwave oven to shut down the entire neighborhood. How many microwave ovens are in the complex?

Problem #5. Too complexicated. The system management application probably doesn't need a VLAN and also probably assumes that it's going to be run over a reliable wired network, not a packet loss infested wireless link. The guest VLAN is nice, but the same thing can be done with a simple splash page prior to authentication. See NoCatSplash:

Ignore the rubbish about it not working in the final version. That was fixed long ago.

By management, I assume you're using SNMP. What monitoring applications are you using? I'm just curious.

Good plan, but because RADIUS is UDP, you're going to have problems due to packet loss.

Perhaps an example from an existing system might be useful. See:

These are some "observations" from the MIT Roofnet mesh network project (which commercially morphed into Meraki). Note the extremely high packet loss (delivery probability). Can your system survive a consistent 50% packet loss at 1Mbits/sec?

Actually, you can easily test it with just a PC, two ethernet cards, and a floppy disk. See:

I use this tool to simulate and generate line impairments and to simulate and throttle a 100baseT ethernet link, so that it looks like a T1 or other telco line.

Up and running where? In a controlled environment where you can monitor performance and isolate problems, or installed immediately in the apartment complex?

Rip the amplifiers out of the system. They're causing more trouble than they're worth. They also probably cost more than the access points. However, they won't eliminate interference problems, especially mutual and self interference.

Hmmm... that doesn't sound like a wireless problem. I'm guessing, but methinks there may be some timeouts that are adjustable on the RADIUS server. You also seem to have a configuration problem. I'm not 100.0% sure, but as I recall, once you login and authenticate with the RADIUS server, there's no further traffic between the access point and the server until it's time to renew the encryption key. You have to be disconnected about an hour for that to become a problem. I haven't tried it recently, but I do recall rebooting the RADIUS server and being rather surprised that none of the clients went comatose. I'll double check if I have time (in a few daze).

Think of repeater mode as WDS without the static routes. Every access point that can hear a client will repeat the traffic. If all 4 of your AP's can hear a client, you now have 4 extra packets flying through the air.

I suggest you forget about omni antennas for illuminating the outside of a building. I use sector antennas which have a small vertical radiation angle, but a wide (90-160 degree) horizontal angle. This is perfect for illuminating a long but not very high building from a distance.

(12.5dBi) (14dBi) The catch is that as long as you remain committed to using WDS or mesh, you cannot use a directional antenna.

How? What tools? What numerical results? How bad?

No, in the evening is when they use their microwave ovens and home cordless phones.

Well, that might be a problem with the WG302v2 radios. That means that the access point is trying to flush its TX buffer, but there's so much interference that it can't hear the ACK's from the client radios. It's supposed to give up, flush the queue, and complain to the source. Instead, it's dropping the connection. I'm not sure exactly what's going on but it doesn't seem normal. Sniff with Wireshark and maybe it's something easy.

Every apartment complex I've seen has CATV and telco services between buildings to/from a utility room. I've run 10baseT over telco twisted pair. There are also several systems for running data over CATV coax. If desperate, you could consider power line networking, but I wouldn't do that in a shared environment.

Yep. That would solve exactly two of the aforementioned problems. However, it would not solve the potential reliability problems caused by the use of UDP by RADIUS. Think of it in terms of reliability and fade margin. See SOM table at:

If your rooftop links have a fade margin of about 30dB, you'll have 99.9% reliability and 8.8 hours of downtime per year. That's not very good and you'll see random failures. You won't get that level of reliability if you run 4 wireless 5.7GHz bridges, all on the same frequency, to a central access point. It would need to be individual point to point links, with 4 antennas and radios at the central AP. That's ugly. Otherwise, you're only improving a few things, solving a few problems, and ignoring the cause (packet loss). Try to find out why you're seeing packet loss. It might be something easy like relocating the antennas, more AP's, or a mix of 5.7GHz, coax, fiber, and CAT5 backhauls.
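The reliability to downtime conversion is simple arithmetic, sketched below. It only turns an availability percentage into hours per year; the mapping from fade margin to availability comes from the SOM table and is not modelled here. The function name is made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability_percent):
    """Hours per year a link is expected to be down at a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage from 0 to 100")
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR

for availability in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(availability)
    print(f"{availability}% reliable: {hours:.2f} hours down per year")
```

Each extra "nine" cuts the downtime by a factor of ten, which is why a link that sounds nearly perfect on paper can still produce visible outages for users.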

"Make everything as simple as possible, but not simpler" Albert Einstein.

You might want to ponder if you've made things a bit too simple.

Check out the mailing lists and forums for wireless ISP (WISP).

In effect, you're acting as an ISP. Everything is exactly the same as a wired line ISP except you have the added enjoyment of dealing with a totally unreliable delivery mechanism.

90 minutes to write this mess. I need a lower overhead "hobby".
Reply to
Jeff Liebermann

I think sadly we have been led a little astray in the initial configuration - with the radio+antenna+amplifier design. The antennas are mounted on the roof of the buildings (I will attempt to put together a site layout to show) and are about 3 feet above the roof.

formatting link
- this is a crude layout of the complex, i dont have a copy of the site plans on me at the moment to scan.

formatting link
and
formatting link
- are photos of the administration building and mounting of the antenna on this building

formatting link
- photos of the most complicated structure, complex 1 and mounting of the antenna

formatting link
- photo of complex 2 and antenna mounting. Complex 3 is identical to this building layout.

I'll have to go through their wiring a bit more closely. From our initial inspections of the complex each group of buildings had its own IDF and power grid. I will have to see if they wire back to a central location - hopefully they do.

There are 6 apartment buildings in total. I could not tell you their height (see photos). Access points are located in the roof of each of the buildings (see layout) with 3 metre cable running outside the building out on to the roof where the antennas are mounted, on top of the roof - approximately 3 feet above the roof. We have two ADSL2 connections coming into the network with approximately 24Mbps between them (due to distance from the exchange). The complex has 144 rooms, which means 144 possible users (if each person has one computer). At present we are seeing 10 - 20 users per AP, although few if any users are signing on to the AP in Complex 2.

The waveguide omni's we are using are from here:

formatting link
The Netgear WG302v2's hardware URL is here:
formatting link
The units are running in b/g mode with the waveguide connected to the primary antenna. Both antennas are enabled.

I was concerned about that - there were a couple of rooms in Complex 1 which are on the ground floor of the northern most building which get poor reception unless the radio on Complex 1 is set at half or full power. From what you're saying (and what I've seen you and others post), would it be better to aim directionals across the wall of the buildings and aim to get the signal through the windows, rather than the current "amplify the hell out of everything and hope to pick up a signal" scenario we've ended up in?

I'll be doing that first thing next week. At the moment we have the radios down on 1/4 power for Complex 1, 2 and 3. I assume once we remove the amplifier we would be able to boost these back up - in conjunction with a site survey - to help ensure a strong signal?

sadly :/ though that doesn't really help the client out. I am thinking maybe our colleague has a poor understanding of this type of installation. He had nothing to do with RADIUS/802.1x, just the selection and placing of the antennas.

This explains exactly why people can access the 'open' network without issue but struggle at times to get onto the secure network. We have since removed RADIUS authentication to take this out of the loop. We're working on an alternative for the security.

Why do you say WDS is not suitable for outdoors? I understand the non-overlapping channels and would love to put each radio on its own channel, however without a backhaul method this seems impossible. As I've said, I will re-investigate our options for an alternative backhaul, but we can't put in fibre (the client refuses to spend that money) and I'm not sure the wiring of the complex will be very rewarding either.

Yeah we failed here. We did have the system running in our office for test, but this was without the waveguides and amplifiers. We were just using the standard antennas.

Every unit has a microwave oven. There are 8 existing wireless routers belonging to tenants who have their own ADSL connections. There is also a radio broadcasting from the local university which is quite strong. We chose a channel with the least number of devices broadcasting on it - 11.

There is a microwave oven in each apartment - 16 in complex 1; 8 in complex 2; and 8 in complex 3 = total of 32

[nods] this was probably force of habit. We built/manage a VMPS based VLAN administration system and are used to putting all the devices into a management vlan (for security and the authentication data) and having a logon network for authentication and if necessary documentation.

NAGIOS/NRPE to identify periods of extreme latency and/or packet loss. SNMP graphing via RRDtool the interface statistics.

Yeah based on your initial emails, we're pulling the RADIUS method out in favour of something a little closer to Chillispot/NoCatSplash. To appease the students we've temporarily just built a MAC address list and added them to the firewall, removed the WPA protected network and are running it that way whilst we work out the rest.

I'll start looking into our options here. It would require more radios, which is obviously not the end of the world - it's just the backhaul which is the problem.

would it be feasible to use a radio like this one

formatting link
to use the 5GHz spectrum to create our backhaul if we are still unable to get a hard wired backhaul in place? Then possibly look at using sector antennas as you suggest to better illuminate the apartments?

I know it may not sound like much, but your time and effort is really appreciated. My engineer and I have been pulling our hair out over this one and really wish we hadn't taken on the project in the first place. We were asked to do it as a favour to this client and were given advice by a colleague for the planning of it (and testing, site surveying, etc).

Another company quoted on the system using 3 x 15dbi omni antenna's, (like

formatting link
and 3 socris board style radios. Based on the problems we've faced with this complex, I'm wondering how they intended on doing this since they were only going to have a radio at admin, Complex1 and Complex3.

Cheers

David Rudduck

Reply to
davidr (at) insane (dot) net (

Actually, it's worse than that. The waveguide antennas are great as horizontally polarized omnidirectional antennas. The problem is that you have 6 buildings to cover, each of which has 12 sides. You're NOT going to cover a building by radiating RF *DOWN* through the roof. It's in the wrong direction for the antenna pattern. Looking at the layout:

Several sides of the buildings are not being covered at all. The far side of buildings opposite the antennas, on the other building in each complex, has no coverage at all (except the slop over from the admin building). Pretend it's an optical system instead of a RF system. If you were to optically illuminate such a building arrangement, where would you place the flood lights?

Good enough. However a KML file of the location for Google Earth, or just the LAT LONG would have been nice. I'm especially interested in the exact location of the trees. They block RF badly and will ruin any attempts to illuminate a building from a distance.

Another problem is the building construction. This is obviously a skool or university. The brick buildings are traditional (from the days when brick ship ballast was the cheapest building material). However, I can't tell if it's real brick or fake brick. It looks fake. What's under the brick and in the walls? Aluminum foil backed insulation will block all RF. Same with chicken wire under stucco.

The windows are the probable entry point for any RF. However, if they're coated with low-e (low emissivity) ecologically correct coating, it will severely attenuate the RF. You can test for blockage by brick, whatever is in the wall, and window attenuation with Netstumbler or any program that displays signal strength in -dBm.

Also, I don't like to mount antennas on the roof. Lots of reasons but the big one is that I rarely use omni antennas, so they don't need to be on the roof. I use panel (or patch) antennas which can be wall mounted. I suggest you consider wall mounting an AMOS/Franklin sector antenna on the wall of the building opposite the one being illuminated. For those areas without any handy building in opposition, find a pole and plant the antenna on top of it. Light poles work fine.

Another reason to use wall mounted panel antennas is that the building acts as a shield to prevent or block interference in that direction. That means you can possibly re-use the same channel on the opposite side of the building.

In the picture above, I see a large external electrical closet on the right, what I presume to be a (white) telephone and CATV closet to its left, and a collection of underground utility access manholes on the left. I think if you look around, you'll find that the buildings have interconnecting conduit for phone and cable. You haven't supplied dimensions but it looks like some of the inter building distances are less than 300ft, which means you can use CAT5.

If your wiring is anything like the USA universities, the wiring is like a tree. Things tend to come together at one place (node) to form a star topology, but there are also many nodes scattered almost randomly. That makes telco and network wiring easy, but makes point to point wiring pure hell.

That's 24Mbits/sec coming in. How much going out? Is it guaranteed bandwidth or is it "up to 24Mbits/sec"?

Ummm... It's not just desktops and laptops. It's PDA's, game machines, Wi-Fi phones, cameras, internet appliances, etc. My rule of thumb for users per AP is: 100 home users doing light email and web surfing. 10 business users doing whatever business users consider useful. 1 file sharing (BitTorrent) user. I've been watching my coffee shop customers rather carefully. 1.5/0.256Mbit/sec DSL backhaul. Connected users tend to average around 25 and peak at about 50. Active users that are actually doing something run around 1 to 5 users. However, if one idiot fires up his file sharing software and clogs the outgoing bandwidth, the active user count stays at 1. For your entertainment, here's the daily down/upload for February for one small coffee shop:

It was more in Jan because I wasn't around to kick off the abusers. Ummm... who's gonna administer the system and handle the abuse management?

Nice design, but I can't decode the specs. Horizontal polarization offers a marginal advantage for laptops, which tend to belch horizontally polarized signals. Most interference is vertical.

I just realized that I may have one sitting on the office shelf. However, I've had little experience in using one. All I've done is reset it to defaults, updated the firmware(?), and made sure it functioned.

What type and how much coax cable in between the AP and the antenna? At 2.4GHz coax is very lossy. Where is the amplifier? At the antenna or at the AP?

Exactly. I can't engineer this thing from a distance without knowing the method of construction and the RF attenuation of the various walls and windows. In general, a window is much easier to penetrate than a wall.

More tx power does NOT buy you more range. It only buys you more range in one direction. In the other direction, the range is the same and is limited by the tx power of the client radio. The only way an amplifier is going to work is if both ends have amps.

There is one advantage to using an amplifier. If you have a long coax cable run, the amplifier will essentially negate the coax losses. However, even with such an arrangement, the tx power should be more like 200mw, not 1 watt.

Yep. Crank up the power to whatever the WG302v2 will produce.

Sure it does. The first step to solving a problem is to blame someone. However, make sure you never blame the person that will eventually be responsible for fixing the problem.

Ok, I'll be blunt. The antenna selection, location, pattern, and type is all wrong. The number of AP's required is wrong. The selection of the WG302v2 is marginal (might work, might not, dunno). Testing was apparently minimal. I haven't seen mention of a site survey, where someone walks around with a laptop measuring signal strength from a specific antenna.

I haven't seen the actual installation, but I'll guess(tm) that the antenna to coax cable area is not waterproofed. This is not rocket science, but if you're going to cobble together a home made system from parts and pieces, get some local help from someone that's done it before.

Yep. However, you're still left with a packet loss problem. Dump the WDS and mesh and it will improve dramatically.

WDS is a mesh network. If you research mesh technology, you'll find a large number of proprietary and/or patented routing algorithms and performance enhancements. These are necessary for outdoor mesh systems because simple mesh networks (i.e. WDS) simply don't work well. WDS was designed to run a handful of connected users. It doesn't scale well and rapidly falls apart when the user count climbs. Incidentally, mesh doesn't do much better but at least dies with a larger number of users.

Exactly.

I think you've demonstrated adequately that WDS isn't going to work. I've supplied a sufficient number of reasons why it's a problem and why it's beneficial to supply an alternative. I also indicated that I don't think your selection of hardware and antennas was particularly optimum. Bottom line, if you want it to work right, there will need to be some major changes.

It should have failed anyway, even without the amps and big antennas. The problem is that it will work just fine with perhaps up to 5 laptops. Have everyone bring in their laptops, connect, and start downloading. The number of packets flying through the air is going to cause collisions, especially from multiple access points on the same frequency. What you should see is some of the most erratic performance possible. Try watching a YouTube video or for the ultimate thrill, a Netflix movie. You'll also see an oddity in that the system will act constipated and not reach the full 24Mbits/sec download thruput. That's because the multiple download connections will either overload the buffering in the AP or create so many airtime collisions that thruput will be severely limited. If you watch the fun on a spectrum analyzer, you'll notice that there's something on the air at all times, with no gaps in between. In other words, there's no more airtime left to fill. You can partially compensate by enabling CTS/RTS flow control in the router, but that will also slow down your maximum thruput. There is also software that will simulate multiple connections and multiple streams (Iperf, Ixia Chariot). Anyone can make almost anything work with a small number of connections. The question is "duz it scale"? Most don't. See chart at:

The number of simultaneous user connections before the router craps out is limited on a substantial number of wireless routers and AP's. Unfortunately, the WG302v2 is not reviewed.

Perfect. Looks like you have the monitoring part of the puzzle set. Incidentally, I'm still using MRTG.

They go together. You can't add more radios because of the mutual interference problem of everything on one channel, unless you move the backhaul to something out of band, such as wire, fiber, HomePNA, HomePlug, MoCA, or 5.7GHz.

First, forget about MIMO (802.11n). It's all about speed, not range. It won't buy you much in this situation. I'm not familiar with the WNDAP330. The usual problem is that the 2.4GHz and 5.7GHz sections trash each other thus requiring separate antennas and alternate operation. The only way to find out is to try it.

That reminds me. Make sure you enable what Linksys called "AP isolation" or what others call "client isolation". It's the feature in the AP that prevents bridging between clients.

formatting link
>and 3 socris board style radios. Based on the problems we've faced

They are apparently counting on the very high gain of the antennas to "penetrate" the buildings. Depending on construction, that might actually work. It might go through one or two walls, but if they expect to go through an entire building to get to the building behind it, forget it. Besides the minor detail that I don't like omnidirectional antennas, I have had difficulties with very high gain omni's such as the allegedly 15dBi gain omni in the photo. Details on request.

Incidentally, it's:

which is located fairly close to my palatial office.

As you indicated, fixing the backhaul is the key here. The options are limited and can be independently investigated. I don't want to itemize right now. However, once the backhaul has been "fixed", there's still the issue of the number of AP's and the location of the antennas. If price is the deciding issue, I suggest you look at:

I'm still suspicious as to how the RADIUS server is operating. It should NOT cause users to disconnect if rebooted or disconnected from the network. It only handles login, authentication, and assignment of the WPA key.

Good luck.

Another hour. Grumble...

Reply to
Jeff Liebermann

Looking at the roof in

there does not seem to be any overlaps as if tile has been used. I wonder if it is metal or have they used a polycarbonate material? I believe they still use metal in Australia.

Reply to
LR

i'll respond to this more once we've assessed the cabling and done a proper site survey.. i've just ordered in a wi-spy 2.4x. figure i'll need something like that again in the future, possibly a bit over the top but i can't really afford this sort of stuff up again.

just with regards to location, here is the google maps aerial view

formatting link
thanks for all your help so far.. we're going to tackle everything one step at a time..

1st. we took out radius authentication
2nd. disabled the AP in the middle complex (building 2) to drop interference
3rd. we'll take out the amplifiers, do a site survey and assess whether to put the middle ap back in or not
4th. we're investigating our options for the backhaul - although i am really unconfident about the viability of using existing wiring, but here's hoping.
5th. we'll look at better antenna's and more radios
Reply to
davidr (at) insane (dot) net (

i thought it would be appropriate that i came back here and posted an update .. both to thank those who provided some needed assistance and also to help anyone else out there who has a similar problem and is struggling (like i was) to find answers.

after the last post we went back and pulled the WG302v2's out and put in WRT54GL's running dd-wrt. we tested this in our office environment first and with some basic QoS rules we were able to put about a dozen computers (that's all we had here) on the mesh without any issues.

we put these into the field and they produced a much better result than the WG302v2's, however for reasons beyond our understanding DD-WRT would reboot for no apparent reason. After a bit of research we read that this was a common problem with DD-WRT and WDS, and that apparently 'Tomato' firmware did a much better job of this.

We upgraded all the WRT54GL's to run Tomato and this did in fact stabilise the environment greatly. WDS / meshing on the WRT's is much better than the WG's, however during peak periods (as expected) we still experienced a lot of packet loss, causing the performance to degrade considerably.

We were unable to find any way to "hard wire" the radios and so last week we put in 2x 5.8GHz radios to provide a backhaul for one of the 2.4GHz radios which was previously meshed to the primary radio. Upon doing this the performance for this sector has improved by leaps and bounds. Users connected to that radio and the primary radio are now ecstatic with the performance, and users connected to the remaining two radios (running in WDS / mesh mode) are now asking the client to "do the same thing" to their building.

We are now in the process of purchasing another 4x 5.8GHz radios to create the remaining two backhauls and will be implementing this as soon as possible.

We are faced with one remaining problem however - for reasons beyond our understanding the WRT54GL's still randomly reboot. This is usually only a couple of times a day, however we have been unable to work out WHY.

Research has indicated that this is a common problem with the wireless lan (wl) module in the firmware and that *apparently* OpenWRT's Kamikaze firmware has an updated wl module which is not susceptible to this problem. The only down side is that of course OpenWRT has no web interface and the web interface of dd-wrt and tomato has been very useful in helping to identify signal problems, bandwidth usage, etc (and thus will be beneficial in the ongoing management of this environment).

Alternatively we could put the WG302v2's back in - as these units did not "randomly reboot" and had a much stronger radio, however their own web interface was not overly intuitive either, nor was the QoS as effective as that in dd-wrt.

I am going to run up a WRT54GL with dd-wrt again and swap the primary radio over to test and see if we can get around the reboot problem. Failing that I will do the same with open-wrt, although as I said, I'm a bit hesitant because I like having the web interface for the information it provides.

The biggest problem with this "reboot problem" is that every so often when the unit(s) reboot, they lock up - requiring a power cycle. As the management of the complex are not always on site, this could create a long outage period which is not acceptable. I have considered putting in power timers to do a power cycle once every day at say 5am when all the students would be asleep, but would ultimately prefer the solution to be solid.

Any thoughts / suggestions on this would be appreciated. Again thanx for all the help - we are _nearly_ at a reliable solution.

Reply to
davidr (at) insane (dot) net (

I don't run WDS on my DD-WRT access points, but I have no spontaneous reboots. However, I do have some services and daemons going comatose, which has resulted in my setting up cron to reboot the router just after midnight every day.

However, in the past, I did have problems at coffee shops with table overflows that eventually caused hangs and reboots. Specifically, the DHCP lease table was filling up. That's because in v24, DD-WRT began saving the DHCP leases in NVRAM. That's handy for surviving a reboot where you would want all the current clients to get the same IP address after the reboot. However, it was a disaster at a coffee shop, where transient users and connections rapidly filled up the DHCP lease tables. There's a setting in there somewhere under Administration to disable this feature. Life was much better afterwards.

Yep. Putting the backhaul on 5.8GHz and getting rid of the WDS repeaters should yield a big performance boost. Users are no longer competing for air time with the backhaul. In addition, there are enough non-overlapping channels (12) on 5.8GHz to allow each backhaul its own channel.

Syslog is your friend. Fire up Syslog on each router, turn up the reporting to maximum, and send the reports to a central Syslog server. Ideally, it should be a Linux box, but Windoze can do the job:

Brainslayer (principal author of DD-WRT) did quite a bit of butchery on the WL command and associated drivers, mostly to remove "un-necessary" crud. It was polluted with far too many useless incantations and is now a mere shadow of its former bloated glory (and well under half the size). Unfortunately, many of my commands no longer work. Sniff.

I have a different theory. I've crashed, hung, and occasionally rebooted many different types of routers dealing with P2P (peer to peer) applications, such as BitTorrent, that open HUGE numbers of simultaneous streams. Each stream requires an allocated buffer in the router. The router eventually runs out of buffer space. DD-WRT and others have kernel hacks and buffer tweaks to reduce the effects, but that only raises the bar on how much abuse the router can handle. Instead of crashing, the router now just slows down:

See the diagnosis section.
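One way to test the "too many simultaneous streams" theory is to count tracked connections per client on whichever box is doing the routing. The following is only a sketch: it assumes the connection table has been copied off the device as text in the style of /proc/net/ip_conntrack from a 2.4 kernel, and the sample lines, addresses, and the threshold are invented for illustration. The function names are hypothetical, not part of any firmware.

```python
import re
from collections import Counter

# Illustrative lines in the style of /proc/net/ip_conntrack (2.4 kernel).
# All addresses and ports below are made up for this sketch.
SAMPLE_CONNTRACK = """\
tcp      6 431999 ESTABLISHED src=192.168.1.50 dst=203.0.113.7 sport=51000 dport=6881 src=203.0.113.7 dst=192.168.1.50 sport=6881 dport=51000 [ASSURED] use=1
tcp      6 431990 ESTABLISHED src=192.168.1.50 dst=203.0.113.8 sport=51001 dport=6881 src=203.0.113.8 dst=192.168.1.50 sport=6881 dport=51001 [ASSURED] use=1
udp      17 25 src=192.168.1.50 dst=203.0.113.9 sport=51002 dport=6881 src=203.0.113.9 dst=192.168.1.50 sport=6881 dport=51002 use=1
tcp      6 100 ESTABLISHED src=192.168.1.77 dst=198.51.100.2 sport=40000 dport=80 src=198.51.100.2 dst=192.168.1.77 sport=80 dport=40000 [ASSURED] use=1
"""

SRC_RE = re.compile(r"\bsrc=(\S+)")


def connections_per_source(text):
    """Count tracked connections by originating address."""
    counts = Counter()
    for line in text.splitlines():
        # The first src= field on each line is the side that opened the flow.
        match = SRC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def heavy_hitters(text, threshold):
    """Return (address, count) pairs at or above threshold, busiest first."""
    ranked = connections_per_source(text).most_common()
    return [(ip, n) for ip, n in ranked if n >= threshold]


if __name__ == "__main__":
    for ip, n in heavy_hitters(SAMPLE_CONNTRACK, threshold=2):
        print(ip, n)
```

A client holding hundreds of entries while everyone else holds a handful would point at P2P. Note that an AP acting purely as a bridge may not track these flows at all, so the gateway is probably the better place to look.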

I've also had problems in the past with a large number of UDP downloads (such as multiple screaming video downloads). However, the symptoms were not reboots, they were hangs, which required a manual reboot to recover. I haven't seen this problem recently so I presume (famous last assumptions) that it's fixed.

There are also assorted router exploits that sometimes cause problems.

At one point during the v24 development, DD-WRT was failing one of these tests, but that was (allegedly) fixed.

Wrong. See:

Not for me. The web interface is just too slow to be useful. You can dig out just about any numbers found on the web pages using commands on the CLI (SSH) interface by digging for it in the /proc filesystem. Try: cat /proc/net/dev for network traffic stats. You can also get most of these via SNMP.
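As a rough illustration of pulling numbers from /proc instead of the web pages, the sketch below parses the text of /proc/net/dev into per-interface counters. It assumes the output has been fetched to a management box (for example over SSH) since the routers themselves are unlikely to run Python. The sample text and its numbers are invented, and the function name is hypothetical.

```python
# Illustrative /proc/net/dev content; the counters are made up.
# Note that older kernels can print the byte count fused to the
# interface name ("eth1:98765432"), so we split on ':' first.
SAMPLE_PROC_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1200      10    0    0    0     0          0         0     1200      10    0    0    0     0       0          0
  eth1:98765432  654321   12    3    0     0          0         0 12345678   54321    0    7    0     0       0          0
   br0: 5000000   40000    0    0    0     0          0         0  6000000   45000    0    0    0     0       0          0
"""


def parse_proc_net_dev(text):
    """Return {interface: {counter_name: int}} from /proc/net/dev text."""
    stats = {}
    # The first two lines are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a full receive + transmit row
        values = [int(f) for f in fields[:16]]
        stats[name.strip()] = {
            # Columns 0-7 are receive counters, 8-15 are transmit counters.
            "rx_bytes": values[0],
            "rx_packets": values[1],
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_bytes": values[8],
            "tx_packets": values[9],
            "tx_errs": values[10],
            "tx_drop": values[11],
        }
    return stats


if __name__ == "__main__":
    for iface, counters in sorted(parse_proc_net_dev(SAMPLE_PROC_NET_DEV).items()):
        print(iface, counters["rx_bytes"], counters["tx_bytes"])
```

Taking two snapshots a known interval apart and subtracting the byte counters gives a throughput figure per interface, which covers most of what the bandwidth graphs in the web interface show.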

I think it would be more entertaining to find the cause of the random reboots. That's not "normal".

Try blocking all P2P traffic (there's a setting in there somewhere) and see what happens. I know that you said you have P2P under control, but I don't believe it.

Try the web interface for OpenWRT and see if it helps.

That's not a "spontaneous reboot". It's a hang. Unset the check box that saves the DHCP leases (and arp table) in NVRAM. Kill off the P2P traffic or expand the buffers to handle the overload. Also, see if you can find a WRT54GS, which has twice the RAM of the WRT54GL. I've had better luck with these.

Remote reboot is fairly easy. The sloppy way is with X10 type power line modules. I have several remote sites with pager operated power controls. I used to use garage door openers for "drive by" reboots. There's also a setting in DD-WRT to have cron reboot the router at regular times or intervals.

There's one problem I haven't been able to solve that requires the nightly reboots. Some modules (I forgot which) go comatose after a few weeks. It's not a common problem, but it's one I don't want to deal with.

You're close. Nicely done and good luck.

Reply to
Jeff Liebermann

Also, try: cat /proc/net/wireless for wireless statistics.

Incidentally, this is my home Buffalo WHR-HP-G54 router running DD-WRT v24 SP1 which is NOT set to reboot every night:

root@DD-WRT:/proc/net# uptime
05:49:42 up 62 days, 18:25, load average: 1.09, 1.07, 1.01

There's no UPS on the router, but I do have a big fat electrolytic capacitor hung across the 5V power supply. Oops, looks like I'm an hour off and messed up daylight savings time. Might as well leave it alone as it will fix itself next weekend.

Reply to
Jeff Liebermann

at the moment, 3 of the APs are still running in WDS mode (however we have now ordered the remaining 5.8GHz radios so all of them will be running individually soon). the main radio in the WDS configuration is the primary problem, however every radio, including the one that is not in WDS, reboots randomly throughout the day.

see that's the thing, the radios aren't doing any DHCP. that's all being handled by the server. the radios were storing bandwidth data to nvram, so I disabled that, but it still reboots. in fact after i disabled logging bandwidth utilisation to the main radio (ap1, which is still in the WDS configuration), the unit locked up.

i know the lock up is a result of a reboot, as i've previously telnet'ed into the device and rebooted it, or clicked the reboot option inside the web interface and it has never come back up - requiring a hard reset.

i'll have to go look up the non-overlapping channels, but yes, definite improvement. apparently the next building in the complex will start construction at the end of the year. i've now convinced the client (after all of these problems) to run fibre to the building for the backhaul.

we're running syslog on the primary server with all AP's logging back to it already.

this entry is me disabling bandwidth recording to nvram last night. at which point the router locked up again.

Mar 28 22:41:58 unilink-ap01 kernel: nvram_commit(): init
Mar 28 22:41:59 unilink-ap01 kernel: nvram_commit(): end

after that there are no more entries till 1.30am when the unit was either power cycled or miraculously came back online all by itself (doubtful, but haven't spoken to client this morning to ascertain).

here are entries from when ap02 (which is not in WDS) has randomly rebooted.

Mar 28 22:41:42 unilink-ap02 dnsmasq[609]: read /etc/hosts.dnsmasq - 1 addresses
Mar 28 22:41:42 unilink-ap02 init[1]: Linksys WRT54G/GS/GL
Mar 28 22:41:42 unilink-ap02 crond[615]: crond (busybox 1.12.3) started, log level 8
Mar 28 23:00:01 unilink-ap02 crond[615]: USER root pid 985 cmd logger -p syslog.info -- -- MARK --
Mar 28 23:00:01 unilink-ap02 root: -- MARK --
Mar 28 23:06:01 unilink-ap02 crond[615]: USER root pid 1097 cmd ntpsync --cron
Mar 29 00:00:01 unilink-ap02 crond[615]: USER root pid 2107 cmd logger -p syslog.info -- -- MARK --
Mar 29 00:00:01 unilink-ap02 root: -- MARK --
Mar 29 01:00:01 unilink-ap02 crond[615]: USER root pid 3205 cmd logger -p syslog.info -- -- MARK --
Mar 29 01:00:01 unilink-ap02 root: -- MARK --
Mar 29 02:00:01 unilink-ap02 crond[615]: USER root pid 4287 cmd logger -p syslog.info -- -- MARK --
Mar 29 02:00:01 unilink-ap02 root: -- MARK --
Mar 29 03:00:01 unilink-ap02 crond[615]: USER root pid 5380 cmd logger -p syslog.info -- -- MARK --
Mar 29 03:00:01 unilink-ap02 root: -- MARK --
Mar 29 03:05:05 unilink-ap02 kernel: klogd started: BusyBox v1.12.3 (2008-12-14 02:54:58 PST)
Mar 29 03:05:05 unilink-ap02 kernel: CPU revision is: 00029008
Mar 29 03:05:05 unilink-ap02 kernel: Primary instruction cache 16kb, linesize 16 bytes (2 ways)
Mar 29 03:05:05 unilink-ap02 kernel: Primary data cache 8kb, linesize 16 bytes (2 ways)
Mar 29 03:05:05 unilink-ap02 kernel: Linux version 2.4.20 (root@etch) (gcc version 3.2.3 with Broadcom modifications) #1 Sun Dec 14 03:03:26 PST 2008
Mar 29 03:05:05 unilink-ap02 kernel: Setting the PFC value as 0x15
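With several APs logging to one server, the reboots can be picked out mechanically instead of by eye. The sketch below scans syslog lines for the "klogd started" message that appears in the excerpt above when the kernel comes up, and reports the last timestamp seen from that host before the boot. Treating that string as the boot marker is an assumption based only on this excerpt; other firmware may log something different. The function name is hypothetical.

```python
import re

# "Mon DD HH:MM:SS host tag: message" as seen in the excerpt above.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s(\S+)\s([^:]+):\s(.*)$"
)

# A few lines taken from the ap02 log quoted in the thread.
SAMPLE_LOG = [
    "Mar 29 02:00:01 unilink-ap02 root: -- MARK --",
    "Mar 29 03:00:01 unilink-ap02 crond[615]: USER root pid 5380 cmd logger -p syslog.info -- -- MARK --",
    "Mar 29 03:00:01 unilink-ap02 root: -- MARK --",
    "Mar 29 03:05:05 unilink-ap02 kernel: klogd started: BusyBox v1.12.3 (2008-12-14 02:54:58 PST)",
    "Mar 29 03:05:05 unilink-ap02 kernel: CPU revision is: 00029008",
]


def find_reboots(lines, boot_marker="klogd started"):
    """Return (host, last_seen_before_boot, boot_time) for each boot found."""
    last_seen = {}
    reboots = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, host, _tag, message = match.groups()
        if boot_marker in message:
            # last_seen.get() is None if this is the first line from the host.
            reboots.append((host, last_seen.get(host), stamp))
        last_seen[host] = stamp
    return reboots


if __name__ == "__main__":
    for host, before, boot in find_reboots(SAMPLE_LOG):
        print(host, "last seen", before, "booted", boot)
```

Run across all APs, a list like this makes it easy to see whether reboots cluster at the same moment on every unit (which would suggest power or a network-wide trigger) or are scattered per device.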

Tomato's features are a bit limited. I think that's the extent of logging I can give it.

With the primary ap (ap01) i'm willing to accept it might be faulty and / or Tomato, so I've got another unit here that I've put dd-wrt back on to and will put in to replace and see if it makes a difference. I would _much_ prefer to leave the wrt54's out there instead of the wg302's.

hrmm.. we do run snort+snortsam plus ipp2p on the linux gateway to block all p2p type traffic. Obviously it's not going to be 100%, but it generally does a decent job. That's not to say that it isn't a result of buffer overload either. i'll dig into the settings and see what i can find to block more thoroughly.

One of the posts I read regarding this random reboot problem suggested that it could be in relation to certain wireless clients causing a problem with WL and forcing a reboot. I've lost the post now, but quite a few people were experiencing this problem.

i had read about these but tbh hadn't started looking into them. the router at our office has been running openwrt kamikaze for nearly 2 years now without a problem - only time it's needed to be rebooted was when we've had power outages and it's locked up. and i do much prefer writing the firewall / qos rules in shell rather than a web interface - i found dd-wrt implemented QoS nicely, but Tomato I think only implements QoS over the WAN interface, or at least that's how it appears as I'm not getting any noticeable control over flow no matter what settings i put into the gui.

ahhh.. thanx for that, I hadn't even thought to go through the proc fs. i'll go have a hunt on our office router here and see what i can find. the other thing i didn't find (and i'm sure a quick search of the openwrt forums would present it) is how to set the radio power.. but yer, i'll go hunting.. can't be too hard.. i assume an nvram setting.

i'll keep digging into the settings.. tomato as i've mentioned seems fairly 'simple', but i'm going to switch back as we only switched to tomato as apparently (reading forums) it was meant to be more stable with WDS. it is/was, but as we've ascertained, WDS is not suitable for this situation.

hrmm.. that's trick. i'll have to have a look into this.. for me, it needs to be as automatic as possible, the site in question is 1 1/2 hour drive by highway from where i am located..

cheers

dave

Reply to
davidr (at) insane (dot) net (

Oops. Just for fun, dive into the router from a telnet or SSH session and query the ARP table with:
arp -a
vlan0 is the WAN ethernet interface.
br0 is the LAN ethernet interface.
wl0 is the wireless.

Something is really wrong here. I'm not sure what version of Tomato you're using but the version I favor for DD-WRT is v24 SP1 (nokaid). I tried a pre-SP2 and had some really strange problems, so I went back to SP1.

That's all wrong. I've only been playing with alternative firmware for perhaps 5 years and have never had a router fail to recover from command line or menu driven reboot. I've had failures after firmware uploads, but not ordinary reboots. What's interesting is that all your devices are doing it to varying degrees. Are you using the exact same firmware images on all the routers? Try a different build? If it still hangs on reboot, it's possible you have a hardware issue.

That reminds me. The WRT54GL (WRT54G v2-v4) will tolerate a very wide range of power supply voltages. It's possible that you may be experiencing power glitches. Try running it off a 6V or 12V gel cell battery or better yet, a gel cell battery and charger.

Fiber and multiple CAT5 cables. Don't worry, they'll find a use for the cables (alarm, phone, internet, surveillance, HVAC, etc).

I was hoping to see a kernel panic. Very strange.

Power glitch? They work both ways. They can hang a router, but if the outage is long enough, they can also reboot the router. Also, see if crontab has anything in it that smells like a reboot at 0130. Hmmm... crontab -l doesn't seem to work on dd-wrt. Try: # cat /tmp/crontab or # cd /etc/cron.d and see what files you find.

No evidence of a failure. Just the usual hour markers and syslog updates.

Nope. You're blocking all P2P traffic that goes through the gateway. You can still have P2P between clients that are all inside the LAN. Do you have "AP isolation" (also known as client isolation) enabled in the DD-WRT Wireless -> Advanced Settings page? You need this to keep your LAN side from becoming a private gaming network.

Don't bother. Just do some sniffing on the gateway and see what's moving through it. Also what's moving through the LAN side.

Oh, I know how to crash a wireless system intentionally. However, I don't think your clients are doing much of that. Sniff the over the air wireless traffic and see if you find an unusually large number of ARP packets. Also, an unusually large number of source MAC addresses. Did you try the router exploits tests I suggested?

Almost everything to the radio is done with the "wl" command.

There are multiple methods, some of which have been disabled in DD-WRT.
wl curpower (return current power level setting)
wl powerindex (set power for 802.11a radio)
wl pwr_percent XX (get/set percent of tx max power)
wl txpwr1 (set tx power in assorted units)
  -d dbm units
  -q quarter dbm units
  -m milliwatt units
  -o turn on override to disable regulatory limits
wl txpwr (get/set power level in mw)
I guess(tm) that an:
nvram commit
will save the current settings.

The last one is ummm... interesting:
root@DD-WRT:~# wl txpwr
pwr in mw -2147483574
pwr in mw after override adj 74 -2147483577
Swell... Looks like I have some garbage in NVRAM.

Setting the power to +20dBm and checking the results:
root@DD-WRT:~# wl txpwr1 -d 20
root@DD-WRT:~# wl txpwr
pwr in mw 80
pwr in mw after override adj 80 100
or better yet:
root@DD-WRT:~# wl txpwr1
TxPower is 80 qdbm, 20.00 dbm, 100 mW Override is Off
Nice.
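The three figures in that last output (80 qdbm, 20.00 dbm, 100 mW) are the same power in different units, and the arithmetic is easy to check. The sketch below shows the standard conversions. The final helper is only an observation about the odd negative numbers printed earlier: masking off the top bit of -2147483574 leaves 74. Whether the driver really uses that bit as a flag is a guess, not something the thread confirms. All function names are hypothetical.

```python
import math


def dbm_to_mw(dbm):
    """Convert dBm to milliwatts: mW = 10^(dBm / 10)."""
    return 10 ** (dbm / 10.0)


def mw_to_dbm(mw):
    """Convert milliwatts to dBm: dBm = 10 * log10(mW)."""
    return 10.0 * math.log10(mw)


def qdbm_to_dbm(qdbm):
    """Quarter-dBm units, as reported by 'wl txpwr1': 4 qdBm = 1 dBm."""
    return qdbm / 4.0


def low_31_bits(value):
    """Drop the top (sign) bit of a 32-bit value.

    Assumption: the large negative readings are a small number with the
    high bit set. This only demonstrates the arithmetic.
    """
    return value & 0x7FFFFFFF


if __name__ == "__main__":
    print(qdbm_to_dbm(80))            # quarter-dBm to dBm
    print(dbm_to_mw(20))              # dBm to mW
    print(low_31_bits(-2147483574))   # the "garbage" reading, masked
```

The same conversion is useful when deciding how far to turn a radio down: every 3 dB step roughly halves the transmitted power in milliwatts.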

I maintain a few mountain top weather stations. The drive is anywhere from 1 to 4 hours depending on the weather and road conditions. The systems usually fail when it's too hot, wet, windy, or cold. So, I have to implement various remote reboot systems. My favorite is butchering a Motorola Bravo page display section to act as a secondary decoder and relay driver. All it takes is an EPROM (or mess of diodes) hung off the display section. This allows me to use only one pager phone number, and program all the pagers for the same capcode. They all go off at the same time and display the same digits, but only the one with the matching decoder code, will close the relay. You can also get commercial systems with built in pagers: (formerly PageTap).

Reply to
Jeff Liebermann

Would putting the AP's in Client Bridge mode be an option? I've only used WDS a few times but found for me client bridge was the way to go. Maybe you have a special reason this won't work but might be worth a try..

Adair

Reply to
Adair Winter
