host availability

- zee

Tue, Jan 10, 2006 11:27 PM

Hi

I would like to get a Perl or Python script that would do the following:

Tell me when a host from a list of hosts went up or down, such as google.com, my default gateway, comcast's mail server etc.

Log the change in state for any host in the list to /var/log/messages.

Basically I want to see when my services are down.

Does anyone know of some scripts that do this?

- Walter Roberson

Wed, Jan 11, 2006 12:21 AM

Why does it matter to you whether comcast's mail server was up or not, if you can't -reach- the mail server?

The answer to your question as stated is "No, that cannot be done." You may wish to rephrase the question.

- zee

Wed, Jan 11, 2006 1:02 AM

Oh, come on. I can't do a local log into /var/log/messages if I can't ping the mail server at comcast?

- Walter Roberson

Wed, Jan 11, 2006 10:24 AM

Review what you asked for. You asked for the script to tell you whether a host was "up or down". ping can't tell you that.

If you don't get an answer to a ping request, then the meaning is... that you didn't get an answer to the ping request.

Firewalls, network congestion, committed access rates, automated protections against Denial of Service attacks, routing problems, dead router, dead switch, faulty power supplies, device in the middle happened to be rebooting, packet got corrupted in transmission and was discarded, dns servers got poisoned, dns servers got corrupted data, load balancing mispredicted availability, rogue system on the net sent you an icmp redirect and your equipment -believed- it, target host packet buffers got full and host discarded packet, remote host admin happened to reset the interface right then, remote system got trojaned and the replacement program ignores pings, remote system has an operating system bug in its network stack and can't send packets to -your- IP, remote system has a SCSI bug glitch and goes into a strange state, NIC goes bad... and more.

I've had nearly all of those happen to me at various times. And as a systems administrator, there have been times when I've sat down at a system console, and it has taken me an hour or more to figure out whether the system is down or whether it's just not talking to me. If I, right *at* the equipment, cannot tell whether the system is up, then how the $#@ is YOUR system going to remotely figure out whether the system is up or not?

A successful ping tells you that the network is able to pass icmp echo and echo reply packets, at least as far along as the device that answered the ping (which could be anywhere from your -own- network on outward, since the response could have been forged or could have been generated by a different device.)

Unfortunately, an unsuccessful ping doesn't tell you that the remote system is down. And the success or lack of success of the ping doesn't tell you whether you will be able to send your email message or do your google search. comcast has -several- mail servers, and google has -many- systems answering queries, so failure of -one- IP to answer doesn't tell you that the service is kaput. Besides, they might just be ignoring *your* machine.

Here's something for you to think about: I have tested the Windows program Look@Lan, which will periodically do ping sweeps over pre-assigned IP ranges, and will do some nice reporting of what it finds (or doesn't.) Complete with a popup to indicate which systems became accessible or inaccessible on the last sweep.

What the results showed was that there were some systems that pretty much never were detected as going down, but that at least a third of the systems donen't a answer at in

- Jeff B

Wed, Jan 11, 2006 5:50 PM

This is the discipline known as Availability Management. It is non-trivial and you're not going to get your answers with scripting only from your system.

Using ping will say, "I can't get there from here." There can be many reasons, but effectively it just doesn't matter much. From time to time, however, I've seen 'unable to connect to FOO.COM' or 'page not available' as a direct result of the ISP's DNS problems. I typically power off the modem/router and let it find another when it powers up after 30 seconds.

You should also be aware that many ISPs will stealth a ping request to things like the SMTP and HTTP servers. Your ping will always just time out in such cases, but a TCP connection will still succeed.

Taking the simplistic approach, running PING FOO.COM and monitoring for REQUEST TIMED OUT in the middle of normal timings will tell you that connectivity has been lost. This may be all you need.
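To illustrate the point above about servers that ignore pings but still accept TCP connections, here is a minimal Python sketch of a TCP-level check. The function name `tcp_reachable` and the 3-second default timeout are my own choices for illustration, not something from this thread. A True result only means the TCP handshake completed from where you ran it; it says nothing about whether the service behind the port is healthy.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) completes within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks
        # and name-resolution failures.
        return False
```

For example, `tcp_reachable("mail.example.net", 25)` (a hypothetical host name) would try the SMTP port of that server, which is closer to "can I send mail" than an ICMP echo is.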

- Walter Roberson

Wed, Jan 11, 2006 11:15 PM

Sorry for the truncated message; it was 4 am and I fell asleep at the keyboard. To finish off properly:

What the results showed was that there were some systems that pretty much never were detected as going down, but that at least a third of the systems did not answer at some point during the day, even though the systems and network were up for the entire time. Some systems were more prone to disappearing pings than others were. This was on a local LAN that I had complete control of -- and if pings periodically fail "for no good reason" on systems on a local LAN, then using ping to monitor remote systems is much *much* more likely to generate spurious down-reports.

If you are using Windows, then you might want to experiment with Look@Lan and see how it goes for you.
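One common way to damp the spurious down-reports described above is to log a change only after several consecutive probes disagree with the current state. The sketch below is an assumption-laden illustration, not anything proposed in this thread: the class name `HostState`, the threshold of 3, and the helper `log_change` are all hypothetical. It uses Python's standard `syslog` module (Unix only); whether the messages end up in /var/log/messages depends on how the local syslog daemon is configured.

```python
import syslog


class HostState:
    """Track one host and report a state change only after repeated agreement."""

    def __init__(self, name, threshold=3):
        self.name = name
        self.threshold = threshold  # consecutive contrary results needed to flip
        self.up = None              # None until the first probe result arrives
        self._streak = 0            # current run of results contradicting self.up

    def update(self, reachable):
        """Feed one probe result; return a message on a state change, else None."""
        if self.up is None:
            # The first probe establishes the baseline.
            self.up = reachable
            return f"{self.name} initially {'reachable' if reachable else 'unreachable'}"
        if reachable == self.up:
            # Result agrees with the known state, so any contrary run is over.
            self._streak = 0
            return None
        self._streak += 1
        if self._streak < self.threshold:
            return None
        self.up = reachable
        self._streak = 0
        return f"{self.name} became {'reachable' if reachable else 'unreachable'}"


def log_change(message, log=syslog.syslog):
    """Send a state-change message to the system log; ignore None."""
    if message is not None:
        log(syslog.LOG_NOTICE, message)
```

A monitoring loop would call `log_change(state.update(result))` once per probe per host. Note the wording of the messages: "unreachable", not "down", since a missed probe cannot tell the two apart.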

- Scott R. Haven

Wed, Jan 11, 2006 11:17 PM

Sounds like you want "Nagios" or something similar.

It runs a number of tests every xx minutes. It will actually open a connection to a server and look for a valid reply.

It isn't foolproof but it works well.

It's free, but can be hard to set up and use.

Scott R. Haven Sr. Systems Engineer Paisley Systems Inc. managed services, consulting, and support

- Volker Birk

Thu, Jan 12, 2006 5:34 AM

formatting link

Yours, VB.

- Jeff B

Thu, Jan 12, 2006 8:01 PM

a simple shell script for ONE site might be:

    site1=xx
    ping -t $site1 | awk '$1 == "Request" {print SITE " access in question:" $1, $2, $3 }' SITE=$site1

replace xx with an ip-address or domain-name and run the ping pipeline all as ONE LINE (news is wrapping it here)

the '-t' says it will run forever and to kill it, c

- Walter Roberson

Fri, Jan 13, 2006 9:01 AM

Hmmm,

ping -t sets the TTL, in SVR4 and Linux.

I gather "Request" is a substring of "Request timed out". However, when the host is unavailable, what I see is one of the following:

- header and nothing else (for unreachable remote hosts and sometimes for local hosts)

- header and repeats of "sendto: host is down" (sometimes for local hosts in the same subnet, if the original ARP fails)

- header, and responses during the time the remote system is up; when the remote system goes down, no response line

What I don't ever see is anything like "Request timed out" -- I only see that for some -tcp- applications. When the remote system becomes unreachable, the result is emptiness.
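Since the text ping prints varies this much between platforms, a script can look at ping's exit status instead of matching strings such as "Request timed out". The following Python sketch assumes the Linux iputils `ping`, where `-c` is the packet count, `-W` is the reply timeout in seconds, and the exit status is 0 only when at least one reply arrived; other ping implementations use different flags. The function name `ping_once` and the injectable `run` parameter are hypothetical, added so the logic can be exercised without a network.

```python
import subprocess


def ping_once(host, timeout_s=2, run=subprocess.run):
    """Send one echo request; return True only if ping reports a reply."""
    # Assumption: Linux iputils ping (-c count, -W reply timeout in seconds).
    cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                     timeout=timeout_s + 5)
    except (subprocess.TimeoutExpired, OSError):
        # ping ran past our own deadline, or the binary could not be started.
        return False
    # iputils ping exits with 0 when at least one reply was received.
    return result.returncode == 0
```

A False here still means only "no answer came back", for any of the reasons listed earlier in the thread; it does not show that the remote host is down.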

- Jeff B

Tue, Jan 17, 2006 1:14 AM

the -t is apparently platform dependent. Win/XP says:

    $ ping /?

    Usage: ping [-t] [-a] [-n count] [-l size] [-f] [-i TTL] [ [-r count] [-s count] [[-j host-list] | [-k ho [-w timeout] target_name

    Options:
        -t    Ping the specified host until stopped.

goofy I agree, but remember, MS does whatever it pleases and hang the rest of the world-compatibility.

your analysis is correct, but I started from the point of connectivity being available and NOT unreachable or DNS entry not found.

like I said .. a *simple* connectivity monitor with all kinds of things that are not addressed; and we get back to square one; this is a non-trivial problem.

- Triffid

Tue, Jan 17, 2006 1:57 AM

One place I worked ran an ICMP-echo based reachability monitor and called it a "Host Availability Monitor".

The application name only made sense after you read the SLA with the hosting company - it defined "Host Availability" as "response to ICMP echo requests from customer's site".

Triffid