host availability

Hi

I would like to get a Perl or Python script that would do the following:

Tell me when a host from a list of hosts went up or down, such as google.com, my default gateway, comcast's mail server etc.

Log the change in state for any host in the list to /var/log/messages.

Basically I want to see when my services are down.

Does anyone know of some scripts that do this?

zee

Why does it matter to you whether comcast's mail server was up or not, if you can't -reach- the mail server?

The answer to your question as stated is "No, that cannot be done." You may wish to rephrase the question.

Walter Roberson

Oh, come on. I can't do a local log into /var/log/messages if I can't ping the mail server at comcast?

zee

Review what you asked for. You asked for the script to tell you whether a host was "up or down". ping can't tell you that.

If you don't get an answer to a ping request, then the meaning is... that you didn't get an answer to the ping request.

Firewalls, network congestion, committed access rates, automated protections against Denial of Service attacks, routing problems, dead router, dead switch, faulty power supplies, device in the middle happened to be rebooting, packet got corrupted in transmission and was discarded, dns servers got poisoned, dns servers got corrupted data, load balancing mispredicted availability, rogue system on the net sent you an icmp redirect and your equipment -believed- it, target host packet buffers got full and host discarded packet, remote host admin happened to reset the interface right then, remote system got trojaned and the replacement program ignores pings, remote system has an operating system bug in its network stack and can't send packets to -your- IP, remote system has a SCSI bug glitch and goes into a strange state, NIC goes bad... and more.

I've had nearly all of those happen to me at various times. And as a systems administrator, there have been times when I've sat down at a system console, and it has taken me an hour or more to figure out whether the system is down or whether it's just not talking to me. If I, right *at* the equipment, cannot tell whether the system is up, then how the $#@ is YOUR system going to remotely figure out whether the system is up or not?

A successful ping tells you that the network is able to pass icmp echo and echo reply packets, at least as far along as the device that answered the ping (which could be anywhere from your -own- network on outward, since the response could have been forged or could have been generated by a different device.)

Unfortunately, an unsuccessful ping doesn't tell you that the remote system is down. And the success or lack of success of the ping doesn't tell you whether you will be able to send your email message or do your google search. comcast has -several- mail servers, and google has -many- systems answering queries, so failure of -one- IP to answer doesn't tell you that the service is kaput. Besides, they might just be ignoring *your* machine.

Here's something for you to think about: I have tested the Windows program Look@Lan, which will periodically do ping sweeps over pre-assigned IP ranges, and will do some nice reporting of what it finds (or doesn't.) Complete with a popup to indicate which systems became accessible or inaccessible on the last sweep.

What the results showed was that there were some systems that pretty much never were detected as going down, but that at least a third of the systems donen't a answer at in

Walter Roberson

This is the discipline known as Availability Management. It is non-trivial and you're not going to get your answers with scripting only from your system.

Using ping will say, I can't get there from here. The reason can be many, but effectively, it just doesn't matter much. From time to time, however, I've seen 'unable to connect to FOO.COM', or 'page not available' as a direct result of the ISP DNS problems. I typically power off the modem/router and let it find another when it powers up after 30 seconds.

You should also be aware, that many ISPs will stealth a ping request to things like the SMTP and HTTP servers. Your ping will always just timeout in such cases, but a TCP connection will still succeed.
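The difference between a ping and a TCP connection can be sketched in Python roughly as follows. This is only an illustration: the function name `tcp_reachable` and its default timeout are made up here, and a completed connection still only shows that something accepted the connection on that port, not that the service behind it is healthy.

```python
import socket


def tcp_reachable(host, port, timeout_s=3.0):
    """Return True if a TCP connection to (host, port) completes in time."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For a mail server one might call `tcp_reachable(host, 25)`, and for a web server `tcp_reachable(host, 80)`.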

Taking the simplistic approach, running PING FOO.COM and monitoring for REQUEST TIMED OUT in the middle of normal timings will tell you that connectivity has been lost. This may be all you need.
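A rough Python sketch of that simplistic approach, for a list of hosts, might look like the following. It relies on ping's exit status rather than its output text. The `ping` flags shown are an assumption (Linux iputils: `-c` for count, `-W` for timeout in seconds) and differ on other platforms; the function names and the `hostwatch` log prefix are invented for illustration; and whether syslog entries end up in /var/log/messages depends on the local syslog configuration.

```python
import subprocess
import syslog
import time


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo request to `host` gets a reply.

    Assumes Linux iputils flags; other platforms spell them differently.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def state_changes(previous, current):
    """Return (host, old, new) for each host whose answer state differs."""
    return [
        (host, previous[host], new)
        for host, new in current.items()
        if host in previous and previous[host] != new
    ]


def monitor(hosts, interval_s=60, probe=ping_once, log=syslog.syslog):
    """Poll `hosts` forever and log every transition through syslog."""
    previous = {}
    while True:
        current = {host: probe(host) for host in hosts}
        for host, old, new in state_changes(previous, current):
            word = "answering" if new else "not answering"
            log(f"hostwatch: {host} is now {word}")
        previous = current
        time.sleep(interval_s)
```

Calling `monitor(["google.com", "192.168.1.1"])` would run until interrupted. The log wording says "answering" rather than "up" on purpose, since a missed ping does not show that the host is down.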

Jeff B

Sorry for the truncated message; it was 4 am and I fell asleep at the keyboard. To finish off properly:

What the results showed was that there were some systems that pretty much never were detected as going down, but that at least a third of the systems did not answer at some point during the day, even though the systems and network were up for the entire time. Some systems were more prone to disappearing pings than others were. This was on a local LAN that I had complete control of -- and if pings periodically fail "for no good reason" on systems on a local LAN, then using ping to monitor remote systems is much *much* more likely to generate spurious down-reports.
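One common way to dampen spurious down-reports of this kind is to require several missed probes in a row before reporting a change. The sketch below is only an illustration of that idea, not something Look@Lan is known to do; the class name `Debouncer` and the threshold of 3 are hypothetical, and the right threshold depends on the network.

```python
class Debouncer:
    """Report a host as not answering only after `threshold` misses in a row."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.misses = 0
        self.answering = True  # assume reachable until shown otherwise

    def update(self, probe_ok):
        """Feed one probe result; return the new state if it changed, else None."""
        if probe_ok:
            self.misses = 0
            if not self.answering:
                self.answering = True
                return True
            return None
        self.misses += 1
        if self.answering and self.misses >= self.threshold:
            self.answering = False
            return False
        return None
```

A single lost ping then changes nothing; only a run of losses is reported, at the cost of noticing real outages a few probe intervals later.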

If you are using Windows, then you might want to experiment with Look@Lan and see how it goes for you.

Walter Roberson

Sounds like you want "Nagios" or something similar.

It runs a number of tests every xx minutes. It will actually open a connection to a server and look for a valid reply.

It isn't foolproof but it works well.

It's free, but can be hard to set up and use.
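To give a feel for what "open a connection and look for a valid reply" can mean, here is a small Python sketch that connects to a mail server and checks for the SMTP greeting code 220. It is not how Nagios itself is implemented; the function name `smtp_banner_ok` and the default values are assumptions made for this example.

```python
import socket


def smtp_banner_ok(host, port=25, timeout_s=5.0):
    """Return True if the server greets us with an SMTP 220 banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            # The timeout also applies to this read of the greeting line.
            banner = conn.recv(512)
    except OSError:
        return False
    return banner.startswith(b"220")
```

A check like this says more than a ping does, because it exercises the service itself, but it still only covers the one address it connected to.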

Scott R. Haven Sr. Systems Engineer Paisley Systems Inc. managed services, consulting, and support


Scott R. Haven

formatting link

Yours, VB.

Volker Birk

a simple shell script for ONE site might be:

site1=xx
ping -t $site1 | awk '$1 == "Request" {print SITE " access in question:" $1, $2, $3 }' SITE=$site1

replace xx with an IP address or domain name and run the ping pipeline as ONE LINE (news is wrapping it here)

the '-t' says it will run forever and to kill it, c

Jeff B

Hmmm,

ping -t sets the TTL, in SVR4 and Linux.

I gather "Request" is a substring of "Request timed out". However, when the host is unavailable, what I see is one of the following:

- header and nothing else (for unreachable remote hosts and sometimes for local hosts)

- header and repeats of "sendto: host is down" (sometimes for local hosts in the same subnet, if the original ARP fails)

- header, and responses during the time the remote system is up; when the remote system goes down, no response line

What I don't ever see is anything like "Request timed out" -- I only see that for some -tcp- applications. When the remote system becomes unreachable, the result is emptiness.

Walter Roberson

the -t is apparently platform dependent. Win/XP says:

$ ping /?

Usage: ping [-t] [-a] [-n count] [-l size] [-f] [-i TTL] [-r count] [-s count] [[-j host-list] | [-k host-list]] [-w timeout] target_name

Options: -t Ping the specified host until stopped.

goofy I agree, but remember, MS does whatever it pleases and hang the rest of the world-compatibility.

your analysis is correct, but I started from the point of connectivity being available and NOT unreachable or DNS entry not found.

like I said .. a *simple* connectivity monitor with all kinds of things that are not addressed; and we get back to square one; this is a non-trivial problem.

Jeff B

One place I worked ran an ICMP-echo based reachability monitor and called it a "Host Availability Monitor".

The application name only made sense after you read the SLA with the hosting company - it defined "Host Availability" as "response to ICMP echo requests from customer's site".

Triffid

Triffid
