SNMP critical values monitoring.
Notification by cellular phone, email or syslog.
version 1.0.1
Sveinar Rasmussen, 14th of August 1997
Copyright (c) 1997 - University of Tromsoe, Norway.
Abstract. This document describes a Tcl extension for monitoring
static variables in routers or other agents, using SNMP to access their MIBs.
SNMP (Simple Network Management Protocol) uses polling or traps for
communication between the agent and the monitoring station. MIBs are management
information bases containing a wide range of interesting variables to be
monitored.
1.0 Introduction
Certain states inside a router are not expected to change. These states
are reflected as variables in the MIB. For example, voltage, current
and temperature readings should remain static. If they change
for some reason, something may be going wrong - like a router
crashing and rendering the local network unusable for a period
of time.
In order to shorten, and possibly avoid, network outages due to
a router crash, we need monitoring. The "SNMP Monitor Ex" is a Scotty Tcl
extension to the Tkined module. It adds the possibility to do strict monitoring of static variables.
Whenever a change occurs, notification messages are sent off
to the network administrator, who can judge and possibly solve the problem.
2.0 Usage
The SNMP Monitor Ex (named snmp_monitor_ex.tcl) script is extended
from the original script snmp_monitor.tcl shipped with the Scotty v2.1.5
distribution. You can either add this script to your manager.tcl file for
the internal menu system in Tkined, or simply load the script directly
using the "Start Script" in the Tools menu found in Tkined. As yet another
alternative, you can overwrite your snmp_monitor.tcl file with the snmp_monitor_ex.tcl
file (not recommended, because a later upgrade of the Scotty environment would replace it).
Once started, you will notice the similarity to the original script.
In fact, there are only two new entries for the monitor at this point.
Click on one or more routers discovered and activate "Monitor Strict".
Step 1)
You are now supposed to enter all the variables you want to monitor.
Separate the variables by space. Click on the "Start monitoring!" button
when done. Use the "clear" button to clear current variable string settings.
Step 2)
If the variable exists and it's the first time you launch the Strict
Monitor, you will be presented with a window. This is where you specify
certain settings concerning the monitor job to be initiated. These values
will be saved as default values for later use. You can change the values
later by clicking on the "Monitor Strict Jobs"->"Modify" menu selection.
Description of the fields in the settings window:
Send warnings to syslog. If this is true, the monitor will put the
warning messages in the system's log file.
Use SMS to send warnings. If this is true, the monitor will notify
you of changes on your cellular phone. The SMS message will contain the name
of the variable, the old value, the new value and the IP address of the machine
where the change occurred.
SMS Cellular phone number. Specify the phone number to which the warning
messages are sent.
Use EMAIL to send warnings. If this is true, the monitor will notify
you of changes by email.
Email address. Specify the email address to which the messages are sent.
Delay between each SMS/Email (minutes). The monitor does not send an
SMS or email for every change to a monitored variable. You
can specify the minimum number of minutes between warnings sent out using any
of the external messaging systems (cellular or email). The default value is
one hour (60 minutes).
The settings you have specified in this window are kept in memory, so
you do not have to type them again for later jobs.
If you would like to change the delay between each variable readout,
please use the standard "Set Monitor Parameter" in the menu. The strict
monitoring facility uses this default value as polling interval.
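As an illustration of the SMS warning described in the field list above, the following is a minimal Python sketch (not taken from snmp_monitor_ex.tcl) of how a warning string with the variable name, old value, new value and IP address could be built and cut to the 160 bytes of a single SMS. The function name `format_warning`, the date format and the sample values are assumptions made for this example only.

```python
from datetime import datetime

SMS_LIMIT = 160  # bytes available in one SMS message


def format_warning(variable, old, new, ip, when, limit=SMS_LIMIT):
    """Build a short warning string and cut it to fit a single SMS."""
    stamp = when.strftime("%d.%m %H:%M")  # hypothetical short date format
    text = f"{stamp} {ip} {variable}: {old} -> {new}"
    # Work on bytes, since the SMS limit is a byte count, not a character count.
    data = text.encode("ascii", errors="replace")
    return data[:limit].decode("ascii")


# Sample values only; 192.0.2.1 is a documentation address.
msg = format_warning("ifMtu.1", 1500, 576, "192.0.2.1",
                     datetime(1997, 8, 14, 9, 30))
print(msg)
```

Putting the time stamp and address first means that the most important fields survive if a long variable name or value forces the message to be cut.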
Step 3)
Hit "Ok!" to fire up the monitor job with the specified properties.
If changes occur, the script will print these events in an "SNMP - Monitor
Report" window. If you are annoyed with it and just want stuff to be sent
to your cellular phone or mail box, please feel free to close this window.
Step 4)
During the monitoring of your variable(s), you can modify the properties
of each monitor job. Clicking your way in the main menu "SNMP Monitor Ex",
you'll find the "Monitor Strict jobs" menu item. As you are given a list
of current strict jobs, select the job you want to bring to attention and
gently press "Modify".
Now, you're presented with a window similar to the one initially seen
during step 2). One difference is that this new window also includes a "Kill
job" button. The reader might reasonably believe that one can kill the
monitor jobs by using the standard "Modify Monitor Job"->"kill job" provided.
Technically, it's true - the strict job would disappear, but the internal
variables related to each strict monitoring job would remain allocated in
memory. Thus, to avoid wasting memory resources, it is preferred that you
use the "Monitor Strict jobs" for deleting jobs instead of the standard
job modification service.
Once you have finished monitoring the values you have selected, the monitor
job can be deleted from the system. Click on "Modify monitor job" in the
SNMP Monitor Ex menu. In the list of current jobs you are given, the jobs
marked with StrictEvent are the ones you have created using the service
provided by the Monitor Strict functionality. Select "Modify" and "Kill
job" to end the monitoring of that specific variable on that specific machine.
2.0 Features.
This section explains some of the features
included in the "SNMP Monitor Ex" as opposed to the original Tkined extension.
- supports all types of variables found in the MIB.
  E.g. octet streams aren't likely to change as often as counters - if they
  do, you certainly want to know about it.
- uses a configurable default value set instead of
  popping up a requester every time you decide to initiate a monitoring job.
- supports multiselected nodes and multiple variables
  in requesters. A few variables contain multiple values (e.g. interfaces.ifTable.ifEntry.ifMtu).
  Support for this is added.
- since this monitoring tool is meant for critical
  static supervision, there are no annoying chart diagrams for each monitoring
  job as found in the original. The system will do the monitoring and warn
  the user if anything happens.
- warnings are sent off using the GSM SMS cellular
  phone service, email, syslog and of course the screen.
- if you save a tkined map, the monitor jobs you've
  selected are saved. Jobs are restarted the next time you load the map.
3.0 Implementation issues.
Every new procedure I've introduced in the snmp_monitor_ex.tcl
script to provide this strict monitoring service is described in this
section.
Monitor Strict
As the user releases the left mouse button over
the "Monitor Strict" menu item, this procedure is the first one to be called.
It asks the user to enter all the variables to be monitored in the upcoming
jobs. For every variable entered in the requester prompted to the user,
the procedure calls "MonitorStrict" to handle each monitoring job.
MonitorStrict
In order to fire up a monitor job, we have to
ensure that the variable exists and has an appropriate syntax. This procedure
opens an SNMP connection to the specified node, checks the syntax and creates
a unique identification value to be used later in the global array jArray
for indexing of internal variables.
Once finished, the SNMP connection is closed
and "startStrict" is run.
startStrict
This procedure opens an SNMP connection to the
specified node for each sub variable dug up in MonitorStrict. Upon a successful
connection, the value, time, name and description are read from the node.
These values are placed in the previously mentioned, global array jArray.
It's a two dimensional array.
StrictPrefs is then called in order to get the
default values for the SMS and email notification service. The arguments
to StrictPrefs are the ID and a TRUE boolean flag. The latter tells the
StrictPrefs not to open a window to prompt the user unless it really has
to.
Finally, the job is started. At the end of each
default interval, StrictEvent is run. The ID for the job is stored in the
array as well. However, since one job can handle many variables in multivariable
situations, only the first array entry gets the job ID.
The job properties are also stored in the tkined
system. Issuing a "save" function call saves everything so that it can be
restored when a saved tkined map is loaded. Old jobs are then fired up
again.
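The layout of jArray can be pictured with a small sketch. Tcl arrays are associative, so a "two dimensional" array is normally written with a composite key such as jArray($id,value). The Python model below mimics that with tuple keys; it is not the script's code, and the field names ("node", "name", "value", "time", "job") and the helper functions are hypothetical.

```python
import itertools

j_array = {}                    # stands in for the global Tcl array jArray
_next_id = itertools.count(1)   # source of unique identification values


def register_variable(node, name, value, now, job=None):
    """Store the first readout of one variable and return its ID."""
    ident = next(_next_id)
    j_array[(ident, "node")] = node
    j_array[(ident, "name")] = name
    j_array[(ident, "value")] = value
    j_array[(ident, "time")] = now
    if job is not None:
        # Only the first entry of a multi-variable job carries the job ID.
        j_array[(ident, "job")] = job
    return ident


def forget(ident):
    """Delete every entry that belongs to one ID, e.g. when a job is killed."""
    for key in [k for k in j_array if k[0] == ident]:
        del j_array[key]


first = register_variable("192.0.2.1", "ifMtu.1", 1500, 0, job="job0")
second = register_variable("192.0.2.1", "ifMtu.2", 1500, 0)
```

Keeping every field under the same ID makes it possible to remove all state for a job in one pass, which is the clean-up that the "Kill job" button in the strict job window relies on.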
StrictEvent
The interrupt routine for each checking interval
launches StrictShow to handle the action for every variable.
A procedure like this also has to reconfigure the current job in order
to receive further job interrupts. Configuration of the job is done by
the configure command found in the job function.
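The idea of a job handler that re-arms itself can be sketched outside Scotty. The Python example below uses the standard `sched` module with a fake clock so that it runs instantly; it is only an analogy to the Scotty job mechanism, and `start_job` and its `repeats` parameter are assumptions of this sketch.

```python
import sched


def start_job(scheduler, interval, action, repeats):
    """Run `action` every `interval` time units, `repeats` times in total."""
    def event(remaining):
        action()
        if remaining > 1:
            # Re-arm the job; without this the handler would fire only once.
            scheduler.enter(interval, 1, event, argument=(remaining - 1,))
    scheduler.enter(interval, 1, event, argument=(repeats,))


# A fake clock: "sleeping" simply advances the time counter.
clock = [0]


def fake_time():
    return clock[0]


def fake_sleep(delay):
    clock[0] += delay


s = sched.scheduler(fake_time, fake_sleep)
ticks = []
start_job(s, 60, lambda: ticks.append(clock[0]), 3)
s.run()
print(ticks)
```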
StrictShow
All the action is found in this function. It
reads a new value from the SNMP connection previously opened, and compares
the newest value with the old one stored in the global array jArray. Network
nodes might be temporarily down, and thus reading from that particular
node would be impossible. A warning message is written to the screen if
this should be the case.
If the value can be read and has changed, a descriptive
message is printed to the screen and / or written as an entry in the syslog. If
more than the specified number of minutes has passed since the
last message was sent to either the SMS cellular phone or email, the warning
is sent to the respective receivers.
The newly read value from the node is stored
in jArray, and if an email or SMS message has been sent off, the time stamp
of that message is recorded for later use as well.
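The behaviour described for StrictShow can be summarised in a short Python sketch. It is a model of the logic, not the Tcl code: `check_variable`, the `state` dictionary, the use of `OSError` for an unreachable node, and sending immediately when no message has been sent before are all assumptions of this example.

```python
def check_variable(state, read_value, now, delay_minutes, notify, log=print):
    """One polling step for a single variable; returns True on a change.

    state         dict with "name", "value" and "last_sent" (seconds or None)
    read_value    callable returning the current value; raises OSError when
                  the node cannot be reached (an assumption of this sketch)
    notify        callable that sends the SMS / email warning
    """
    try:
        new = read_value()
    except OSError as err:
        # A node that is temporarily down only gets a warning on the screen.
        log(f"warning: cannot read {state['name']}: {err}")
        return False

    old = state["value"]
    if new == old:
        return False

    message = f"{state['name']} changed from {old} to {new}"
    log(message)  # screen and / or syslog

    last = state["last_sent"]
    if last is None or now - last > delay_minutes * 60:
        notify(message)            # external messaging, rate limited
        state["last_sent"] = now   # remember when the last message went out

    state["value"] = new           # the new value becomes the reference
    return True
```

With a delay of 60 minutes, a variable that flips back and forth every few minutes still appears in the report window every time, but produces at most one SMS or email per hour.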
StrictPrefs
The preferences procedure has two parameters:
ID and a default flag. The ID points to where in the global jArray
updates are to occur. If the default flag is "true", the function
will not open a window asking the user for properties unless it has to.
It has to ask the user for properties for the very first job, but that's
it - later jobs use the default values once specified.
However, if the default flag is "false", the
user is forced to enter new default values in a window.
Everything is stored in the global array jArray,
as usual.
If a job is killed, the entries for that particular
job are deleted from jArray and the job is removed from the scheduling system.
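The effect of the default flag can be shown with a small Python model. It is a sketch of the decision only; `strict_prefs`, `ask_user` and the setting names are hypothetical, and the real procedure opens a Tkined window instead of calling a function.

```python
defaults = {}  # remembered answers; empty until the user has been asked once


def strict_prefs(j_array, ident, use_defaults, ask_user):
    """Attach notification settings to job `ident`, prompting only if needed."""
    if not (use_defaults and defaults):
        # Either the caller insists on a prompt, or there are no defaults yet.
        answers = ask_user(dict(defaults))
        defaults.clear()
        defaults.update(answers)
    for field, value in defaults.items():
        j_array[(ident, field)] = value


prompts = []


def ask_user(current):
    """Stand-in for the settings window; records what it was shown."""
    prompts.append(current)
    return {"use_sms": True, "use_email": False, "delay_minutes": 60}


j_array = {}
strict_prefs(j_array, 1, True, ask_user)    # first job: has to prompt
strict_prefs(j_array, 2, True, ask_user)    # later job: defaults are reused
strict_prefs(j_array, 2, False, ask_user)   # modify: the prompt is forced
```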
Monitor Strict jobs
The user is able to change properties and delete
jobs started by the strict monitor. This procedure creates a list of
the strict jobs running. When the user chooses to modify any of these jobs,
the StrictPrefs procedure is launched with the default flag parameter set
to "false". The preferences window is forced to open, and the new values are
stored and will act as new default values for any new jobs initiated.
4.0 Conclusion.
Using the Scotty environment with Tcl extensions,
I have extended the SNMP monitor to handle monitoring of any variable type
in the MIBs. Notifications are sent to the screen, cellular phone, email
or syslog. The Tcl script is operational. Although the project is in its
early phases of development, the program can be put into operation and
serve as a decent quality of service network monitoring tool.
The notification service will give the network
administrator an early warning system. Hopefully, this will shorten the
duration of unreachable network situations.
5.0 ChangeLog
v0.0.0 - v0.0.2:
- Initial internal versions.
v0.0.3:
- Improved monitoring engine (supports DisplayString
  etc.).
- Added clear button.
- Forced popup of warning window if it was closed by
  user.
v0.5.1:
- Added support for syslog entry.
- It's now possible to remove the SMS / email delaying
  message system.
- Added support for saving and restoring running jobs.
v0.5.2:
- Added support to handle unconfigured sendmail, i.e.
  the program catches sendmail warnings.
- Now handles nodes that are temporarily down. Prints
  out a warning to the screen - doesn't report this as an error on the SMS,
  syslog or email. This actually happens quite often (transient failures).
  Avoids unnecessary warnings.
- Added date and time in the warning strings. The exception
  is the syslog entry, which already has a time stamp. The format is a short
  string to get below the 160 bytes available for the SMS message.
- Removed the persistent popup warning window method
  imposed by v0.0.3. Now uses a standard window. Remember to close the window
  using the menus. If you close it by the "close window" button, it will
  disappear forever.
- Bugfix. The configuration window had problems with
  updating the timestamp for emails, syslog and SMS.
v1.0.1:
- Official release.
- General stability increased. Program crashed if one set two identical jobs on the same machine and tried to delete the jobs.
6.0 Source code.
Sveinar Rasmussen (web)