Frustrated that I don't UNDERSTAND why my network times out


billy 12 years ago

Why can't I connect (via port 80 or any port) to a certain web site?

For more than a year I've had the same problem, and, it's NOT the way I'm running traceroute! (e.g., it's not ICMP vs TCP, etc.). It's also not that the server I'm pinging is down, or slow.

There's something wrong with "my" home networking setup. But what?

I just want to UNDERSTAND the problem. That's it. It makes NO sense what I've been seeing over the past year.

Basically, for months at a time, I can't connect to centos.org and, for months at a time, I can connect to the web site.

When I can't connect, traceroute (ICMP or TCP) fails to connect; when I can connect, traceroute also connects.

So, it isn't HOW I'm running traceroute, as traceroute is telling me exactly what Firefox is telling me.

This happens for months at a time, and has happened about five times in the past two years.

I change NOTHING (not my router firewall, not my computer firewall, not my networking setup, etc.) in the interim.

When this happens, I switch to TOR, and I can EASILY connect to centos.org via Firefox running through the proxy - so there's nothing wrong with my firewall or with my home broadband router (as far as I can tell).

When I can't connect, I ask my NEIGHBORS who "can" get to centos.org to show me their traceroute, and it looks the same as mine except for the fact that their times are slightly faster and they get past that last hop - whereas mine dies at the penultimate hop.

So, THAT would implicate something on "my" side (but what?).

I switch to Knoppix 7, and I get the same result. I go to a Windows PC, and I get the same result. So, it's NOT the PC!

If I knew how to get around my router, I would, but it has all the setup for the ISP (it's a WISP, not cable or DSL).

My question?

How can I debug WHY (for months at a time), I can't get to a web site?

Here's a traceroute run just now:

knoppix@Microknoppix:~$ traceroute formatting link
traceroute to formatting link (72.232.194.162), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 2.835 ms 2.809 ms 20.293 ms
2 REDACTED_WISP.net (xxx.xxx.xxx.xxx) 20.280 ms 20.265 ms 20.248 ms
3 10.50.0.1 (10.50.0.1) 29.973 ms 29.959 ms 29.943 ms
4 10.25.0.1 (10.25.0.1) 39.067 ms 42.759 ms 42.745 ms
5 10.20.0.1 (10.20.0.1) 82.295 ms 82.280 ms 82.265 ms
6 10.0.0.1 (10.0.0.1) 122.956 ms 159.675 ms 159.654 ms
7 69.36.226.193 (69.36.226.193) 198.537 ms 201.445 ms 201.433 ms
8 vl2.core1.scl.layer42.net (69.36.225.129) 201.423 ms 201.412 ms 201.388 ms
9 216.156.84.141.ptr.us.xo.net (216.156.84.141) 201.377 ms 201.361 ms 201.346 ms
10 207.88.14.233.ptr.us.xo.net (207.88.14.233) 239.215 ms 239.185 ms 239.171 ms
11 vb15.rar3.dallas-tx.us.xo.net (207.88.12.45) 239.137 ms 239.122 ms 239.061 ms
12 207.88.14.34.ptr.us.xo.net (207.88.14.34) 239.030 ms 123.544 ms 178.276 ms
13 207.88.185.74.ptr.us.xo.net (207.88.185.74) 178.261 ms 178.264 ms 178.231 ms
14 border1.pc2-bbnet2.dal004.pnap.net (216.52.191.81) 178.234 ms border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19) 178.187 ms border1.pc2-bbnet2.dal004.pnap.net (216.52.191.81) 178.199 ms
15 layered-11.border1.dal004.pnap.net (63.251.44.74) 178.171 ms 178.139 ms 178.123 ms
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *

I know, from two years of experiencing this, that the hop after the last hop showing results "is" Centos.org! So, when it works, it gets to the last hop; but when it dies, it always dies just before the last hop. But why?
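To compare a failing trace with a neighbor's working one without eyeballing every line, the last hop that answered can be pulled out of saved traceroute output. This is only a rough sketch: the function name `last_responding_hop` is made up for illustration, and it assumes the output has one hop per line in the usual Linux traceroute layout.

```python
import re

def last_responding_hop(traceroute_text):
    """Return (hop_number, host) for the last hop that answered, or None."""
    last = None
    for line in traceroute_text.splitlines():
        # A hop line starts with the hop number followed by a host or "*".
        match = re.match(r"\s*(\d+)\s+(\S+)", line)
        if not match:
            continue  # header line or anything that is not a hop
        hop, host = int(match.group(1)), match.group(2)
        if host != "*":
            last = (hop, host)
    return last

# Tail of the trace posted above, one hop per line.
sample = """\
14 border1.pc2-bbnet2.dal004.pnap.net (216.52.191.81) 178.234 ms
15 layered-11.border1.dal004.pnap.net (63.251.44.74) 178.171 ms 178.139 ms 178.123 ms
16 * * *
17 * * *
"""

print(last_responding_hop(sample))
# prints (15, 'layered-11.border1.dal004.pnap.net')
```

Running the same function over a neighbor's saved trace shows at a glance whether their probes get one hop further.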

Can you help me UNDERSTAND why/how this situation can be happening? Note: All other web sites work just fine.

NOTE: I already know that YOU will be able to access this same site with much lower ping times (you're not on a WISP either) - but that doesn't help ME figure out what the problem is.

Is there freeware extant to help me UNDERSTAND why this happens to me?


Gary R. Schmidt 12 years ago

There is no freeware, or any sort of software available to you, that can help with your problem.

There is a "black hole" between you and centos.org.

Packets go in but do not come out; that's what the traceroute is telling you.

Contact your ISP and provide them with the traceroute; they then need to pass that to their (various) upstream connections to get the problem solved.

I would assume that the problem lies with "pnap.net", who- or what-ever they are, but they probably won't talk to you.

Your WISP appears to be connecting to "layer42" (69.36.226.193) as their gateway to the internet. Again, they won't talk to you, but your ISP should be able to get them off their arses.

Cheers, Gary B-)


David Hough 12 years ago

Is it an MTU/fragmentation issue? (Check out ping -M)

Dave


unruh 12 years ago


Chris Davies 12 years ago

This is exactly what I would have suggested. Data packets have a maximum size (the MTU) that depends on the link carrying them. The default is typically 1500 bytes for Ethernet, and a little less for connections running over PPP and/or VPN. Some long-distance WAN links can have even lower maximum packet sizes. If a packet cannot be transmitted in its entirety, it can be split (fragmented) unless the sender has specified that it must not be split. If it can't be split, then the sender is responsible for transmitting the data in smaller packets, but obviously the sender needs to be informed that the packet size must be reduced. If there's a dubious firewall somewhere between you and the target - one that (incorrectly) eats the ICMP "fragmentation needed" messages - then your sender never learns that it needs to reduce the packet size, and such packets inevitably get dropped.

You can test this with ping -M, as David Hough has suggested. You can also reduce your own MTU and see whether this "fixes" the problem. Try "ifconfig eth0 mtu 1400", and experiment with different values.
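For the ping test, the payload size has to be chosen so that the whole packet matches the MTU being probed. Here is a small sketch that works out the numbers and prints the matching commands. It assumes IPv4 and the Linux iputils ping (where "-M do" sets the don't-fragment bit and "-s" sets the payload size); the helper names are just for illustration.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def ping_payload_for_mtu(mtu):
    """Payload size that makes the whole ICMP echo packet exactly `mtu` bytes."""
    return mtu - IP_HEADER - ICMP_HEADER

def ping_command(host, mtu):
    """Build a don't-fragment ping command line for the given total packet size."""
    return f"ping -M do -c 3 -s {ping_payload_for_mtu(mtu)} {host}"

for mtu in (1500, 1400, 1000, 500):
    print(ping_command("centos.org", mtu))
```

If the larger sizes time out while the smaller ones get replies, that points at an MTU problem somewhere on the path; if every size fails the same way, the MTU is probably not the culprit.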

Chris


Tauno Voipio 12 years ago

TCP should be able to find a suitable segment size, but it needs an ICMP message for the functionality. There are sysadmins killing all ICMP, in an attempt to hide from ICMP echo (ping). This could be the cause here.
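The dependence on ICMP can be illustrated with a toy model. This is not real networking code, just a simulation under stated assumptions: the link MTUs are invented numbers, the sender always sets the don't-fragment bit, and `icmp_delivered` stands for whether the "fragmentation needed" message makes it back to the sender.

```python
def first_rejecting_mtu(packet_size, link_mtus):
    """Return the MTU of the first link too small for the packet, or None if it fits all."""
    for mtu in link_mtus:
        if packet_size > mtu:
            return mtu
    return None

def path_mtu_discovery(link_mtus, icmp_delivered, start_size=1500, max_tries=10):
    """Return the packet size that finally gets through, or None for a black hole."""
    size = start_size
    for _ in range(max_tries):
        bottleneck = first_rejecting_mtu(size, link_mtus)
        if bottleneck is None:
            return size  # packet fits every link on the path
        if not icmp_delivered:
            # The router dropped the packet, but its ICMP report was filtered,
            # so the sender keeps retrying the same size and never succeeds.
            return None
        size = bottleneck  # the ICMP message tells the sender the size to use
    return None

hypothetical_path = [1500, 1400, 1492]  # made-up link MTUs
print(path_mtu_discovery(hypothetical_path, icmp_delivered=True))
print(path_mtu_discovery(hypothetical_path, icmp_delivered=False))
```

With the ICMP message delivered, the sender settles on the smallest link MTU; with it filtered, large packets vanish silently. Small packets, such as traceroute's 60-byte probes, fit every link in this model, so it only describes what happens to full-sized data segments.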


billy 12 years ago

While I definitely value the help, part of why I am frustrated is that I don't UNDERSTAND the problem, and, as such, I consider ping a diagnostic tool.

The point being, the ping isn't the problem (it's just one way of showing the problem).

So, even if I get the ping to work, it still does nothing to solve the problem (although it may explain a bit).

Since the problem manifests itself in the inability to connect via port 80 (i.e., the web), I have previously doubted the way I'm running ping has anything to do with it.

To me, in my simple mind, trying to get the ping to work is sort of like having an engine misfire, and then I try all sorts of options on my voltmeter to get it to give me a good reading.

Whether or not I get a good reading on the voltmeter, I still have the misfire.

Back to the specifics, whether or not I get a good reading on the ping, I still have the web failing to connect.

I'm not saying ping isn't a great DIAGNOSTIC tool.

I'm just asking how a ping -M is going to help me UNDERSTAND why I can't connect via the web to centos.org?

Nonetheless, in the next post, I'll post my ping results (in the hope that it helps to UNDERSTAND what's going on!).


billy 12 years ago

I'm not sure what an MTU fragmentation issue is, but, if it is related to helping to EXPLAIN why I can't connect via the web to centos.org, I'll be glad to run *any* diagnostic procedure!

Here are the traceroute -M results:

# traceroute -M icmp formatting link
traceroute to formatting link (72.232.194.162), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 2.836 ms 2.828 ms 2.827 ms
2 MY_WISP_IP_REDACTED (xxx.xxx.xxx.xxx) 2.835 ms 5.766 ms 5.768 ms
3 10.50.0.1 (10.50.0.1) 12.510 ms 12.509 ms 16.000 ms
4 10.25.0.1 (10.25.0.1) 18.880 ms 18.878 ms 18.875 ms
5 10.20.0.1 (10.20.0.1) 31.081 ms 31.335 ms 34.052 ms
6 10.0.0.1 (10.0.0.1) 34.050 ms 18.724 ms 28.518 ms
7 69.36.226.193 (69.36.226.193) 28.488 ms 28.064 ms 28.043 ms
8 vl2.core1.scl.layer42.net (69.36.225.129) 28.025 ms 47.434 ms 47.413 ms
9 216.156.84.141.ptr.us.xo.net (216.156.84.141) 47.394 ms 30.086 ms 30.064 ms
10 207.88.14.233.ptr.us.xo.net (207.88.14.233) 80.031 ms 60.416 ms 70.204 ms
11 vb15.rar3.dallas-tx.us.xo.net (207.88.12.45) 70.189 ms 120.949 ms 120.925 ms
12 207.88.14.34.ptr.us.xo.net (207.88.14.34) 120.893 ms 75.048 ms 75.028 ms
13 207.88.185.74.ptr.us.xo.net (207.88.185.74) 75.009 ms 68.640 ms 101.778 ms
14 border1.pc2-bbnet2.dal004.pnap.net (216.52.191.81) 101.762 ms border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19) 77.155 ms 87.903 ms
15 layered-11.border1.dal004.pnap.net (63.251.44.74) 87.889 ms 120.374 ms 123.342 ms
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
#

Does any of that help diagnose WHY this one IP address times out on port 80 for months on end (and then works just fine for months)?


billy 12 years ago

Hi Chris, I posted my "traceroute -M centos.org" results because I truly want to UNDERSTAND what is going on.

If the problem is that my packets are too large, how does one control that in a web browser?

The ping is merely my diagnostic tool.

The *real* problem is that, for months at a time, I can't connect (via any web browser not on TOR) to:

formatting link

With TOR, I can connect easily - so it's not the Centos site itself.

QUESTION: How do I control packet size in Firefox?


billy 12 years ago

Hi Chris,

I appreciate your help as I'm trying to UNDERSTAND the problem, which is that, for months on end, I can't connect via the web to

formatting link

where a traceroute shows that the penultimate connection is dropping me. (So I am forced to use TOR and all works fine - albeit slowly.)

Then, for months at a time, I *can* connect to centos.org, where the traceroute shows the connection going through.

Googling for what an MTU is, I see it's the max size of a packet:

formatting link

Ethernet has an MTU limit, it appears, of 1500 bytes, so I can see why you're suggesting 1400 bytes.

I don't have much ethernet in the picture though, as I'm on a laptop connected by WiFi to my home broadband router which itself is wired by POE to a rooftop antenna which goes over WiFi about five miles to the WISP antenna where I lose control.

So, I assume you would want me to modify that command:
ifconfig eth0 mtu 1400
to:
ifconfig wlan0 mtu 1400

Here's what ifconfig just reported:

# ifconfig
eth0 Link encap:Ethernet HWaddr 00:a0:00:3a:4f:23
     UP BROADCAST MULTICAST MTU:1500 Metric:1
     RX packets:0 errors:0 dropped:0 overruns:0 frame:0
     TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
     Interrupt:11 Memory:f2600000-f2620000

wlan0 Link encap:Ethernet HWaddr 00:a0:00:6a:9b:3d
      inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
      RX packets:267437 errors:0 dropped:0 overruns:0 frame:0
      TX packets:167343 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:311152648 (296.7 MiB) TX bytes:29413063 (28.0 MiB)

So I ran the following:

# ifconfig wlan0 mtu 1400
# ifconfig wlan0
wlan0 Link encap:Ethernet HWaddr 00:a0:00:6a:9b:3d
      inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICAST MTU:1400 Metric:1
      RX packets:267437 errors:0 dropped:0 overruns:0 frame:0
      TX packets:167343 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:311152648 (296.7 MiB) TX bytes:29413063 (28.0 MiB)

And, then I tried to connect via Firefox to centos.org, but it still timed out.

Should I change the mtu even further down, say, to 1000 so that Firefox can connect to

formatting link


billy 12 years ago

Bearing in mind that the problem is that I'm trying to understand *why* my web traffic is not connecting to centos.org, I'll try any suggested diagnostic procedure using whatever tools I have at hand.

I set my packet size on my laptop to a low value:

# ifconfig wlan0 mtu 500

And then ran the traceroute:

# traceroute -M icmp centos.org
traceroute to centos.org (72.232.194.162), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 5.042 ms 5.029 ms 5.017 ms
2 WISP_IP_REDACTED (xxx.xxx.xxx.xxx) 5.022 ms 8.227 ms 8.227 ms
3 10.50.0.1 (10.50.0.1) 13.820 ms 23.623 ms 25.771 ms
4 10.25.0.1 (10.25.0.1) 25.767 ms 30.879 ms 30.877 ms
5 10.20.0.1 (10.20.0.1) 44.616 ms 46.995 ms 46.992 ms
6 10.0.0.1 (10.0.0.1) 52.204 ms 27.862 ms 31.134 ms
7 69.36.226.193 (69.36.226.193) 35.862 ms 50.007 ms 49.971 ms
8 vl2.core1.scl.layer42.net (69.36.225.129) 49.951 ms 74.962 ms 77.875 ms
9 216.156.84.141.ptr.us.xo.net (216.156.84.141) 77.857 ms 25.678 ms 25.643 ms
10 207.88.14.233.ptr.us.xo.net (207.88.14.233) 71.468 ms 91.228 ms 95.624 ms
11 vb15.rar3.dallas-tx.us.xo.net (207.88.12.45) 155.916 ms 85.719 ms 101.926 ms
12 207.88.14.34.ptr.us.xo.net (207.88.14.34) 95.461 ms 97.164 ms 103.047 ms
13 207.88.185.74.ptr.us.xo.net (207.88.185.74) 103.028 ms 63.041 ms 107.573 ms
14 border1.pc2-bbnet2.dal004.pnap.net (216.52.191.81) 107.556 ms 70.772 ms border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19) 70.744 ms
15 layered-11.border1.dal004.pnap.net (63.251.44.74) 70.725 ms 89.757 ms 89.726 ms
16 * * *


billy 12 years ago

I'm all for a DIAGNOSTIC approach, since what I'm trying to figure out is WHY the Internet fails me on just one web site.

I just tried it from Knoppix and it is the same result (see below for the details).

I have also tried it from other PCs on the network, running Windows XP and Windows 7, and the same thing occurs.

I also run it via TOR on both Windows & Linux, and it works fine.

So, it's clearly not the PC itself. It's in the network - but WHERE?

Anyway, here are the Knoppix results:

root@Microknoppix:/# uname -a
Linux Microknoppix 3.6.11-64 #10 SMP PREEMPT Wed Dec 19 23:51:48 CET 2012 x86_64 GNU/Linux

root@Microknoppix:/# ifconfig wlan0 mtu 300

root@Microknoppix:/# traceroute -M icmp centos.org
traceroute to centos.org (72.232.194.162), 30 hops max, 60 byte packets
1 192.168.1.1 (192.168.1.1) 2.932 ms 2.929 ms 2.928 ms
2 WISP_IP_REDACTED (xxx.xxx.xxx.xxx) 5.683 ms 5.683 ms 5.680 ms
3 10.50.0.1 (10.50.0.1) 20.479 ms 20.477 ms 20.474 ms
4 10.25.0.1 (10.25.0.1) 20.469 ms 30.272 ms 33.014 ms
5 10.20.0.1 (10.20.0.1) 33.009 ms 36.190 ms 36.187 ms
6 10.0.0.1 (10.0.0.1) 36.182 ms 16.332 ms 39.405 ms
7 69.36.226.193 (69.36.226.193) 39.377 ms 20.721 ms 23.093 ms
8 vl2.core1.scl.layer42.net (69.36.225.129) 23.066 ms 16.977 ms 16.947 ms
9 216.156.84.141.ptr.us.xo.net (216.156.84.141) 16.917 ms 12.782 ms 18.611 ms
10 207.88.14.233.ptr.us.xo.net (207.88.14.233) 69.341 ms 119.252 ms 119.222 ms
11 vb15.rar3.dallas-tx.us.xo.net (207.88.12.45) 116.010 ms 84.054 ms 84.022 ms
12 207.88.14.34.ptr.us.xo.net (207.88.14.34) 83.978 ms 53.016 ms 52.987 ms
13 207.88.185.74.ptr.us.xo.net (207.88.185.74) 56.341 ms 53.502 ms 59.458 ms
14 border1.pc2-bbnet2.dal004.pnap.net (216.52.191.81) 59.430 ms 58.240 ms border1.pc1-bbnet1.dal004.pnap.net (216.52.191.19) 61.422 ms
15 layered-11.border1.dal004.pnap.net (63.251.44.74) 61.383 ms 54.510 ms 54.494 ms
16 * * *


Tauno Voipio 12 years ago

Get a good text on TCP/IP protocols and learn how TCP does it. There is an ICMP message 'fragmentation needed but DF bit is on'. The segment auto-sizing is an essential part of TCP.

If a sysadmin has killed the whole ICMP somewhere in the path, there is little you can do, except whine to him.

Traceroute is not of much help here.


Bit Twister 12 years ago

You might try traceroute -I centos.org


billy 12 years ago

I'm all for any diagnostic procedure, so, here's the result from Windows 7 trace route:

C:\Users\billy>traceroute -M icmp formatting link
'traceroute' is not recognized as an internal or external command, operable program or batch file.

C:\Users\billy>tracert formatting link

Tracing route to formatting link [72.232.194.162] over a maximum of 30 hops:

1


billy 12 years ago

All this makes perfect sense, except ...

Except my neighbors, on the same WISP, can get to centos.org.

So, it must be something in 'my' setup; but where?

More specifically, how do I diagnose to pinpoint where?


billy 12 years ago

What I don't understand is whether the web browser, which is the problem observed, is using ICMP or TCP.

Q: What is the web browser using? (ICMP? TCP?)


billy 12 years ago

Just so I understand, are you saying that the web (i.e., formatting link) is using ICMP?


Chris Davies 12 years ago

Yes and no.

The data for a website is carried over TCP. The control commands (e.g. "slow down you're filling up the pipe", or "that packet's waayyy too big; make it smaller") are sent over ICMP. Pings and some traceroute programs also use ICMP, but they use a different ICMP message type.

A competently configured firewall might be set up to block ICMP ping requests. But there's no way that same firewall should be completely and naively blocking all ICMP packets.
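Since the browser's traffic is TCP, the closest thing to "what Firefox does" that can be tested by hand is a plain TCP connection to port 80, with no ICMP probing involved. Here is a minimal sketch using Python's standard socket module; the function name `tcp_port_open` and the 5-second timeout are arbitrary choices for illustration.

```python
import socket

def tcp_port_open(host, port=80, timeout=5.0):
    """Attempt a TCP handshake; return True if it completes within the timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

Calling tcp_port_open("centos.org") during a bad spell and again during a good one would show whether the handshake itself is failing, which is a different symptom from a page that connects and then stalls.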


Chris Davies 12 years ago

That's a good starting point, yes. Now you need to see whether your web browser can access centos.org. If it does, increase the MTU until it breaks and then back it off a little.

If it still can't access centos.org, even though you've got your MTU down at 500 then the chances are that this suggestion is inappropriate for your situation.
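The "increase it until it breaks, then back off" procedure amounts to a search for the largest MTU that still works, and a binary search needs far fewer tries than stepping one value at a time. A sketch, assuming a hypothetical `works(mtu)` check (for example: set the interface MTU, then try to load the page) that succeeds for every value up to some limit and fails above it:

```python
def largest_working_mtu(works, low=68, high=1500):
    """Binary-search the largest MTU in [low, high] for which works(mtu) is True.

    Assumes that if an MTU works, every smaller MTU works too.
    68 is the smallest MTU an IPv4 link is required to support.
    """
    if not works(low):
        return None  # even the smallest size fails, so MTU is not the issue
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            low = mid        # mid is fine, look higher
        else:
            high = mid - 1   # mid fails, look lower
    return low

# Stand-in for a real test: pretend anything above 1372 is dropped.
fake_probe = lambda mtu: mtu <= 1372
print(largest_working_mtu(fake_probe))
# prints 1372
```

If the function returns None, that matches the second case above: the problem persists at any packet size and the MTU suggestion does not apply.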

Chris
