From device discovery to visibility into systems, networks, and traffic flows, these free open source monitoring tools have you covered
In the real estate world, the mantra is location, location, location. In
the network and server administration world, the mantra is visibility,
visibility, visibility. If you don't know what your network and servers
are doing at every second of the day, you're flying blind. Sooner or
later, you're going to meet with disaster.
Fortunately, many good tools, both commercial and open source, are
available to shine much-needed light into your environment. Because good
and free always beat good and costly, I've compiled a list of my
favorite open source tools that prove their worth day in and day out in
networks of any size. From network and server monitoring to trending,
graphing, and even switch and router configuration backups, these
utilities will see you through.
1) Cacti
First, there was MRTG. Back in the heady 1990s, Tobi Oetiker saw fit to
write a simple graphing tool built on a round-robin database scheme that
was perfectly suited to displaying router throughput. MRTG begat
RRDtool, the self-contained round-robin database and graphing
solution used by a staggering number of open source tools today. Cacti is the current standard-bearer of open source network graphing, and it takes the original goals of MRTG to whole new levels.
Cacti is a LAMP application that provides a complete graphing framework
for data of nearly every sort. In some of my more advanced installations
of Cacti, I collect data on everything from fluid return temperatures
in data center cooling units to free space on filer volumes to FLEXlm
license utilization. If a device or service returns numeric data, it can
probably be integrated into Cacti. There are templates to monitor a
wide variety of devices, from Linux and Windows servers to Cisco routers
and switches -- basically anything that speaks SNMP. There are also
collections of contributed templates for an even greater array of
hardware and software.
While Cacti's default collection method is SNMP, local Perl or PHP
scripts can be used as well. The framework deftly separates data
collection and graphing into discrete instances, so it's easy to rework
and reorganize existing data into different displays. In addition, you
can easily select specific timeframes and sections of graphs simply by
clicking and dragging. In some of my installations, I have data going
back several years, which proves invaluable when determining if current
behavior of a network device or server is truly anomalous or, in fact,
occurs regularly.
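As a rough illustration of the script route, here is a minimal data-collection script in the spirit of the filer free-space example above. Cacti's script-based data input methods read the script's standard output, and when a script returns several values the convention is a space-separated list of `name:value` pairs, where each name matches an output field defined on the data input method. Python stands in for Perl or PHP purely for illustration, and the field names (`total_bytes`, `used_bytes`, `free_bytes`) are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Cacti data input script: report space on one filesystem."""
import shutil
import sys


def cacti_output(fields):
    """Render {name: number} in Cacti's multi-value format: 'name:value name:value'."""
    return " ".join(f"{name}:{value}" for name, value in fields.items())


def volume_fields(path):
    """Collect total, used, and free bytes for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
    }


if __name__ == "__main__":
    # The volume path would be passed in by the data input method (hypothetical setup).
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(cacti_output(volume_fields(target)))
```

Each field then maps to its own data source, so the same script can feed a stacked "used vs. free" graph.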
Using the PHP Network Weathermap
plug-in for Cacti, you can easily create live network maps showing link
utilization between network devices, complete with graphs that appear
when you hover over a depiction of a network link. In many places where
I've implemented Cacti, these maps wind up running 24/7 on 42-inch LCD
monitors mounted high on the wall, providing the IT staff with
at-a-glance updates on network utilization and link status.
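Maps like these are typically driven by interface octet counters (such as the SNMP `ifInOctets` and `ifOutOctets` objects) sampled at a fixed polling interval. The arithmetic behind a utilization percentage is simple: the counter delta times 8, divided by the interval and the link speed. The sketch below assumes 32-bit counters that wrap at most once between polls, and its color thresholds are hypothetical rather than Weathermap's own defaults.

```python
"""Link utilization from two SNMP octet-counter samples (illustrative sketch)."""

COUNTER32_MODULUS = 2 ** 32  # Counter32 values wrap back to 0 after 2^32 - 1


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Octets counted between two polls, tolerating a single counter wrap."""
    return (current - previous) % modulus


def utilization_percent(previous, current, interval_s, link_bps, modulus=COUNTER32_MODULUS):
    """Average utilization over the polling interval, as a percentage of link speed."""
    bits = counter_delta(previous, current, modulus) * 8
    return 100.0 * bits / (interval_s * link_bps)


def link_color(percent, thresholds=((85.0, "red"), (50.0, "yellow"))):
    """Map a utilization percentage to a display color (hypothetical thresholds)."""
    for limit, color in thresholds:
        if percent >= limit:
            return color
    return "green"


if __name__ == "__main__":
    # 5-minute poll on a 100 Mbps link: 3,000,000,000 octets -> 80% utilization
    pct = utilization_percent(1_000, 3_000_001_000, 300, 100_000_000)
    print(f"{pct:.1f}% {link_color(pct)}")
```

On fast links a 32-bit counter can wrap more than once per polling interval, which is why 64-bit counters (`ifHCInOctets` and friends) are preferable where devices support them.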
Cacti is an extensive performance graphing and trending tool that can be
used to track nearly any monitored metric that can be plotted on a
graph. It's also infinitely customizable, which means it can get complex
in places.
2) Nagios
Nagios is a mature
network monitoring framework that's been in active development for many
years. Written in C, it's almost everything that system and network
administrators could ask for in a monitoring package. The Web GUI is
fast and intuitive, and the back end is extremely robust.
As with Cacti, a very active community supports Nagios, and plug-ins
exist for a massive array of hardware and software. From basic ping
tests to integration with plug-ins like WebInject, you can constantly
monitor the status of servers, services, network links, and basically
anything that speaks IP. I use Nagios to monitor server disk space, RAM
and CPU utilization, FLEXlm license utilization, server exhaust
temperatures, and WAN and Internet link latency. It can be used to
ensure that Web servers are not only answering HTTP queries, but that
they're returning the expected pages and haven't been hijacked, for
example.
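Much of that breadth comes from the plug-in interface, which is deliberately simple: Nagios runs a plug-in, uses the text it prints as the status message, and interprets its exit code as 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Anything after a `|` in the output is treated as performance data. The following is a minimal sketch of a disk-space check along those lines; the 80/90 percent thresholds and the `used_pct` label are hypothetical choices, and Python stands in for whatever language you prefer.

```python
#!/usr/bin/env python3
"""Minimal Nagios-style disk-space plug-in (illustrative sketch)."""
import shutil
import sys

# Standard Nagios plug-in return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct, warn_pct, crit_pct):
    """Pick a state by comparing usage against the warning/critical thresholds."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    used_pct = 100.0 * usage.used / usage.total
    state = classify(used_pct, warn_pct, crit_pct)
    # Performance data: 'label'=value[UOM];warn;crit;min;max
    perfdata = f"used_pct={used_pct:.1f}%;{warn_pct:g};{crit_pct:g};0;100"
    return state, f"DISK {STATE_NAMES[state]} - {path} {used_pct:.1f}% used | {perfdata}"


def main(argv):
    """Print the status line and return the exit code Nagios should see."""
    code, message = check_disk(argv[1] if len(argv) > 1 else "/")
    print(message)
    return code
```

Installed as an executable, the file would end with `sys.exit(main(sys.argv))` so that Nagios receives the return code; it is omitted here so the functions can be exercised on their own.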
Network and server monitoring is obviously incomplete without
notifications. Nagios has a full email/SMS notification engine and an
escalation layout that can be used to make intelligent decisions on who
and when to notify, which can save plenty of sleep if used correctly. In
addition, I’ve integrated Nagios notifications with Jabber, so the
instant an exception is thrown, I get an IM from Nagios detailing the
problem in addition to an SMS or email, depending on the escalation
settings for that object. The Web GUI can be used to quickly suspend
notifications or acknowledge problems when they occur, and it can even
record notes entered by admins.
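The escalation idea can be modeled in a few lines. This is a simplified sketch of how Nagios-style escalation definitions are evaluated: an escalation applies while the notification number falls between its `first_notification` and `last_notification` values (0 meaning "indefinitely"), overlapping escalations notify the union of their contact groups, and with no escalation in effect the object's normal contacts are used. It ignores escalation periods and state options, and the contact group names are hypothetical.

```python
"""Simplified model of Nagios-style notification escalation (illustrative sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    first_notification: int
    last_notification: int  # 0 means "keep escalating indefinitely"
    contact_groups: tuple


def contacts_for(notification_number, default_groups, escalations):
    """Return the contact groups to notify for a given notification number."""
    active = [
        esc for esc in escalations
        if notification_number >= esc.first_notification
        and (esc.last_notification == 0 or notification_number <= esc.last_notification)
    ]
    if not active:
        # No escalation applies: fall back to the object's normal contacts.
        return set(default_groups)
    groups = set()
    for esc in active:
        groups.update(esc.contact_groups)  # overlapping escalations are combined
    return groups


if __name__ == "__main__":
    # Hypothetical policy: on-call first, team lead from the 3rd notice, manager from the 5th.
    policy = [
        Escalation(3, 4, ("oncall-admins", "team-lead")),
        Escalation(5, 0, ("oncall-admins", "team-lead", "it-manager")),
    ]
    for n in range(1, 7):
        print(n, sorted(contacts_for(n, ("oncall-admins",), policy)))
```

Note that an escalation replaces rather than adds to the default contacts, which is why the on-call group is repeated in each escalation above.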
As if this weren't enough, a mapping function displays all the monitored
devices in a logical representation of their placement on the network,
with color-coding to show problems as they occur.
The downside to Nagios is the configuration. The config is best done via
command line and can present a significant learning curve for newbies,
though folks who are comfortable with standard Linux/Unix config files
will feel right at home. As with many tools, the capabilities of Nagios
are immense, but the effort to take advantage of some of those
capabilities is equally significant.
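One common way to tame that learning curve is to generate the repetitive parts of the configuration from an inventory instead of typing them by hand. The sketch below renders Nagios `define host` blocks from (name, address) pairs. It assumes a host template named `linux-server` exists (the Nagios sample configuration ships one, but your template names may differ), and the inventory entries are hypothetical.

```python
"""Render Nagios host definitions from a simple inventory (illustrative sketch)."""


def host_definition(name, address, template="linux-server"):
    """Return one Nagios object definition for a host."""
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    alias      {name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def render_hosts(inventory, template="linux-server"):
    """Concatenate definitions for every (name, address) pair in the inventory."""
    return "\n".join(host_definition(n, a, template) for n, a in inventory)


if __name__ == "__main__":
    # Hypothetical inventory; in practice this might come from a CMDB export or CSV file.
    hosts = [("web01", "192.0.2.10"), ("db01", "192.0.2.20")]
    print(render_hosts(hosts), end="")
```

Writing the output into a directory listed under `cfg_dir` and running `nagios -v` against the main configuration file before reloading catches syntax errors early.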
Don't let the complexity discourage you -- Nagios has saved my bacon
more times than I can possibly recall. The benefits of the early-warning
systems provided by this tool for so many different aspects of the
network cannot be overstated. It's easily worth your time and effort.
3) Icinga
Icinga started out
as a fork of Nagios but has recently been rewritten as Icinga 2. Both
versions are under active development and available today, and Icinga
1.x is backward-compatible with Nagios plug-ins and configurations.
Icinga 2 has been developed to be smaller and sleeker, and it offers
distributed monitoring and multithreading frameworks that aren’t present
in Nagios or Icinga 1. You can migrate from Nagios to Icinga 1 and from
Icinga 1 to Icinga 2.
Like Nagios, Icinga can monitor anything that speaks IP, to whatever
depth SNMP and custom plug-ins and add-ons allow.
There are several Web UIs for Icinga, and one major differentiator from
Nagios is the configuration, which can be done via the Web UI rather
than through configuration files. For those who'd rather manage their
configurations outside of the command line, this is a significant
benefit.
Icinga integrates with a variety of graphing and monitoring packages
such as PNP4Nagios, inGraph, and Graphite, providing solid performance
visualizations. Icinga also has extended reporting capabilities.
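Graphing add-ons of this kind largely work from the performance data that plug-ins append after the `|` in their output, in the form `label=value[UOM];warn;crit;min;max`. The parser below is a simplified sketch of reading that format: it handles unquoted labels only, keeps threshold ranges such as `10:20` as text, and skips the finer points of the plug-in guidelines, such as quoted labels containing spaces.

```python
"""Parse plug-in performance data (simplified sketch)."""
import re

# A value plus optional unit of measure, e.g. "42.0%", "0.5ms", "1024KB"
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([a-zA-Z%]*)$")
_FIELDS = ("warn", "crit", "min", "max")


def _number_or_text(text):
    """Convert a threshold/min/max field to float; keep ranges like '10:20' as text."""
    if text == "":
        return None
    try:
        return float(text)
    except ValueError:
        return text


def parse_perfdata(output):
    """Return {label: {...}} for each label=value;warn;crit;min;max item after '|'."""
    if "|" not in output:
        return {}
    metrics = {}
    for item in output.split("|", 1)[1].split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        match = _VALUE_RE.match(fields[0])
        if not label or not match:
            continue  # skip malformed items instead of failing the whole parse
        extras = (fields[1:] + [""] * 4)[:4]
        metric = {"value": float(match.group(1)), "uom": match.group(2)}
        metric.update({key: _number_or_text(raw) for key, raw in zip(_FIELDS, extras)})
        metrics[label] = metric
    return metrics
```

Each parsed value is what a graphing back end would store as one data point per check run.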
4) NeDi
If you've ever had to search for a device on your network by telnetting
into switches and doing MAC address lookups, or you simply wish you
could tell where a certain device is physically located (or, perhaps
more important, where it was located), then you should take a good look at NeDi.
NeDi is a LAMP application that regularly walks the MAC address and ARP
tables on your network switches, cataloging every device it discovers in
a local database. It’s not as well-known as some other projects, but it
can be a very handy tool in corporate networks where devices are moving
around constantly.
You can log into the NeDi Web GUI and conduct searches to determine the
switch, switch port, or wireless AP of any device by MAC address, IP
address, or DNS name. NeDi collects as much information as possible from
every network device it encounters, pulling serial numbers, firmware
and software versions, current temps, module configurations, and so
forth. You can even use NeDi to flag MAC addresses of devices that are
missing or stolen. If they appear on the network again, NeDi will let
you know.
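At its core, this kind of lookup is a join between two tables: ARP tables map IP addresses to MAC addresses, and switch forwarding tables map MAC addresses to ports. The sketch below illustrates that join with in-memory data. It is not NeDi's schema or code; the uplink heuristic (ignoring ports where many MACs are learned) and the watchlist check are simplified, hypothetical stand-ins.

```python
"""Locate devices by joining ARP and MAC-forwarding tables (illustrative sketch)."""
from collections import Counter


def normalize_mac(mac):
    """Canonicalize formats such as 00:1A:2B:3C:4D:5E or 001a.2b3c.4d5e to 001a2b3c4d5e."""
    return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")


def locate(arp_table, mac_table, uplink_threshold=5):
    """Map each MAC to (ip, switch, port), skipping ports that look like uplinks.

    arp_table: {ip: mac}; mac_table: list of (switch, port, mac) rows.
    A port with more than `uplink_threshold` learned MACs is treated as an uplink.
    """
    ip_by_mac = {normalize_mac(mac): ip for ip, mac in arp_table.items()}
    macs_per_port = Counter((sw, port) for sw, port, _ in mac_table)
    locations = {}
    for sw, port, mac in mac_table:
        if macs_per_port[(sw, port)] > uplink_threshold:
            continue  # many MACs behind this port: probably a trunk, not an edge port
        key = normalize_mac(mac)
        locations[key] = (ip_by_mac.get(key), sw, port)
    return locations


def watchlist_hits(locations, watchlist):
    """Return current locations of any MACs flagged as missing or stolen."""
    return {mac: locations[mac] for mac in map(normalize_mac, watchlist) if mac in locations}
```

Running `locate` on each discovery cycle and storing the results with timestamps is what turns a one-off lookup into the "where was it located" history described above.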
Discovery runs from cron at set intervals. Configuration is
straightforward, with a single config file that allows for a significant
amount of customization, including the ability to skip devices based on
regular expressions or network-border definitions. You can even include
seed lists of devices to query if the network is separated by
undiscoverable boundaries, as in the case of an MPLS network. NeDi
usually uses Cisco Discovery Protocol or Link Layer Discovery Protocol,
discovering new switches and routers as it rolls through the network,
then connecting to them to collect their information. Once the initial
configuration has been set, running a discovery is fairly quick.
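The discovery process described above amounts to a breadth-first walk of the neighbor graph. The sketch below shows the idea with a hypothetical `get_neighbors` callback standing in for the CDP or LLDP queries a real tool would make, a seed list for islands the walk cannot reach on its own, and a regular expression for devices to skip. None of it reflects NeDi's actual configuration keywords.

```python
"""Breadth-first network discovery from seed devices (illustrative sketch)."""
import re
from collections import deque


def discover(seeds, get_neighbors, skip_pattern=None):
    """Return discovered device names in visit order.

    seeds: starting devices (e.g. one per site that CDP/LLDP can't reach across MPLS).
    get_neighbors: callable(device) -> iterable of neighbor names; a hypothetical
    stand-in for reading a device's CDP/LLDP neighbor table.
    skip_pattern: optional regex; matching devices are neither visited nor expanded.
    """
    skip = re.compile(skip_pattern) if skip_pattern else None
    seen, order = set(), []
    queue = deque(seeds)
    while queue:
        device = queue.popleft()
        if device in seen or (skip and skip.search(device)):
            continue
        seen.add(device)
        order.append(device)
        queue.extend(n for n in get_neighbors(device) if n not in seen)
    return order
```

Because every device is visited once, the cost of a run grows with the number of devices rather than the number of links, which helps explain why repeat discoveries are quick once the seeds and skip rules are in place.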
NeDi integrates with Cacti to some degree, and if provided with the
credentials to a functional Cacti installation, device discoveries will
link to the associated Cacti graphs for that device.
5) Ntop
The Ntop project -- now known as Ntopng,
for "next generation" -- has come a long way over the past decade. Whether
you call it Ntop or Ntopng, what you get is a top-notch network traffic monitor
married to a fast and simple Web GUI. It's written in C and completely
self-contained. You run a single process configured to watch a specific
network interface, and that's about all there is to it.
Ntop provides easily digestible graphs and tables showing current and
past network traffic, including protocol, source, destination, and
history of specific transactions, as well as the hosts on either end.
You'll also find an impressive array of network utilization graphs, live
maps, and trends, along with a plug-in framework for add-ons such as
NetFlow and sFlow monitors. There's even the Nbox, a
hardware monitor that embeds Ntop.
Ntop also incorporates a lightweight Lua API framework that can be used
to support extensions via scripting languages. Ntop can also store host
data in RRD files for persistent data collection.
One of the handiest uses of Ntopng is on-the-spot traffic checkups. When
one of my Cacti-driven PHP Weathermaps suddenly shows a collection of
network links running in the red, I know that those links exceed 85
percent utilization, but I don't know why. By switching to an Ntopng
process watching that network segment, I can pull a minute-by-minute
report of the top talkers and immediately know which hosts are
responsible and what traffic they're pushing.
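Conceptually, a top-talkers report is an aggregation over flow records: sum the bytes per host over a time window and sort. The sketch below does that for a list of hypothetical (source, destination, protocol, bytes) records. Ntopng itself works from captured packets or NetFlow/sFlow and does far more, so treat this as an illustration of the idea, not of its internals.

```python
"""Rank 'top talkers' from flow records (illustrative sketch)."""
from collections import Counter
from typing import List, NamedTuple


class Flow(NamedTuple):
    src: str
    dst: str
    protocol: str
    octets: int


def top_talkers(flows: List[Flow], n: int = 5):
    """Return the n busiest hosts as (host, octets), counting traffic in both directions."""
    totals = Counter()
    for flow in flows:
        totals[flow.src] += flow.octets
        totals[flow.dst] += flow.octets
    return totals.most_common(n)


def protocol_mix(flows: List[Flow], host: str):
    """Break one host's traffic down by protocol, busiest first."""
    mix = Counter()
    for flow in flows:
        if host in (flow.src, flow.dst):
            mix[flow.protocol] += flow.octets
    return mix.most_common()
```

The two functions mirror the two questions in the scenario above: which hosts are responsible, and what traffic they are pushing.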
That kind of visibility is invaluable, and it's very easy to come by.
Essentially, you can run Ntopng on any interface connected to a switch
port that has been configured to mirror another port or VLAN. That's it.
6) Zabbix
Zabbix is a
full-scale network- and system-monitoring tool that combines several
functions into a single Web-based console. It can be configured to
monitor and collect data from a wide variety of servers and network
gear, offering service and performance monitoring of each object.
Zabbix works with agents running on monitored systems, though it can
also run agentless using SNMP or other monitoring methods such as remote
checks on open services like SMTP and HTTP. It explicitly supports
VMware and other virtualization hypervisors, producing in-depth data on
hypervisor performance and activity. Special attention is also paid to
monitoring Java application servers, Web services, and databases.
Hosts can be added manually or through an autodiscovery process. An
extensive set of default templates applies to the most common use cases,
such as Linux, FreeBSD, and Windows servers; well-known services such as
SMTP and HTTP; and ICMP and IPMI devices for in-depth hardware
monitoring. In addition, custom checks written in Perl, Python, or nearly any language can be integrated into Zabbix.
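One common integration path is the agent's `UserParameter` directive, which maps a custom item key to a command whose standard output becomes the item value (for example, a hypothetical `UserParameter=app.queue.depth,/usr/local/bin/queue_depth.py` line in the agent configuration). The script only has to print one value. The sketch below counts files waiting in a spool directory; the key, script path, and directory are all hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Zabbix UserParameter script: print the number of queued files."""
import os
import sys


def queue_depth(spool_dir):
    """Count regular files waiting in `spool_dir`; return -1 if it can't be read."""
    try:
        with os.scandir(spool_dir) as entries:
            return sum(1 for entry in entries if entry.is_file())
    except OSError:
        return -1  # a sentinel value that a trigger can alert on


if __name__ == "__main__":
    # The agent captures stdout, so print exactly one value and nothing else.
    print(queue_depth(sys.argv[1] if len(sys.argv) > 1 else "/var/spool/myapp"))
```

Returning a sentinel instead of crashing keeps the item supported in Zabbix while still making the failure visible to a trigger.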
Zabbix also offers customizable dashboards and Web UI displays to focus
attention on your most critical components. Notifications and
escalations can draw on customizable actions that can be applied to
hosts or groups of hosts. Actions can even be configured to trigger
remote commands, so a script can be run on a monitored host if certain
event criteria are observed.
Zabbix graphs performance data such as network throughput and CPU
utilization and collects those graphs in customizable displays. Further,
Zabbix supports customizable maps, screens, and even slideshows that
display the current status of monitored devices.
Zabbix can be daunting to implement initially, but prudent use of
templates and autodiscovery can ease the integration hassles. In
addition to an installable package, Zabbix is available as a virtual
appliance for several popular hypervisors.
7) Observium
Observium is a
network and host monitor that can scan ranges of addresses for systems
to monitor using common SNMP credentials. Packaged as a LAMP
application, Observium is relatively easy to set up and configure,
requiring the usual installations of Apache, PHP, and MySQL, database
creation, Apache configuration, and the like. It is designed to be
installed as its own server with a dedicated URL, rather than under a
larger Web tree.
From there, you can log into the GUI and start adding hosts and
networks, as well as autodiscovery ranges and SNMP data to have
Observium crawl around the network and gather data on each system
discovered. Observium can also discover network devices via CDP, LLDP,
or FDP, and host agents can be deployed to Linux systems to aid in data
collection.
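Autodiscovery ranges are usually expressed as CIDR networks. Here is a sketch of how such ranges might be expanded into a list of candidate addresses, with exclusions, using Python's standard `ipaddress` module; the ranges are hypothetical, and actually polling each candidate with the configured SNMP credentials is left out.

```python
"""Expand autodiscovery ranges into candidate hosts (illustrative sketch)."""
import ipaddress


def candidate_hosts(ranges, exclude=()):
    """Yield usable host addresses from CIDR `ranges`, minus any `exclude` networks."""
    excluded = [ipaddress.ip_network(net) for net in exclude]
    seen = set()
    for cidr in ranges:
        for host in ipaddress.ip_network(cidr).hosts():
            # Skip duplicates from overlapping ranges and anything explicitly excluded.
            if host in seen or any(host in net for net in excluded):
                continue
            seen.add(host)
            yield str(host)


if __name__ == "__main__":
    # Hypothetical ranges from documentation address space.
    for address in candidate_hosts(["192.0.2.0/29"], exclude=["192.0.2.4/31"]):
        print(address)
```

Excluding ranges up front (printers, VoIP phones, DHCP pools) keeps a discovery crawl from wasting time on devices you do not intend to monitor.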
All of this data is presented in an easily navigated user interface that
provides a multitude of statistics, charts, and graphs. This includes
everything from ping and SNMP response times to graphs of IP throughput,
fragmentation, packet counts, and so forth. Depending on the device,
this data will be available for every port discovered and include an
inventory of modular devices.
For servers, Observium will display CPU, RAM, storage, swap,
temperature, and event log status. You can incorporate data collection
and performance graphing on services as well, including Apache, MySQL,
BIND, Memcached, Postfix, and others.
Observium plays nice as a VM, so it can quickly become a go-to tool for
server and network status information. It's a great way to bring
autodiscovery and charting to a network of any size.
Do-it-yourself
Too often, IT administrators think they can't color outside the lines.
Whether we're dealing with a custom application or an "unsupported"
piece of hardware, many of us believe that if a monitoring tool can't
handle it immediately, it can't be handled. That's simply not the case,
and with a little bit of elbow grease, almost anything can be monitored,
cataloged, and made more visible.
An example might be a custom application with a database back end, like a
Web store or an internal finance application. Management wants to see
pretty graphs and charts depicting usage data in some form or another.
If you're using, say, Cacti already, you have several ways to bring this
data into the fold, such as constructing a simple Perl or PHP script to
run queries on the database and pass counts back to Cacti or even an
SNMP call to the database server using private MIBs (management
information bases). It can be done, and it can generally be done easily.
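A sketch of the "simple script" route, written in Python rather than Perl or PHP: it runs a count query and prints the result in the space-separated `name:value` form that Cacti's script data input methods read. SQLite stands in for the real back end so the example is self-contained; the `orders` table, its columns, and the field names are hypothetical, and a production script would use the driver for your actual database.

```python
"""Hypothetical Cacti data input script: count recent orders in an app database."""
import sqlite3


def order_counts(conn, since):
    """Return {field: count} for orders created at or after `since` (ISO timestamp)."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(status = 'failed'), 0) "
        "FROM orders WHERE created_at >= ?",
        (since,),
    ).fetchone()
    return {"orders": row[0], "failed_orders": row[1]}


def cacti_line(fields):
    """Format counts as space-separated name:value pairs for Cacti."""
    return " ".join(f"{name}:{value}" for name, value in fields.items())


if __name__ == "__main__":
    # Demo against an in-memory database standing in for the real back end.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (created_at, status) VALUES (?, ?)",
        [("2024-01-01T10:00:00", "ok"), ("2024-01-01T10:03:00", "failed"), ("2023-12-31T23:59:00", "ok")],
    )
    print(cacti_line(order_counts(conn, "2024-01-01T00:00:00")))
```

Pointing the script at a read-only database account is a sensible precaution, since Cacti will run it on every polling cycle.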
If it's unsupported hardware, as long as it speaks SNMP, you can most
likely get at the data you need, though it may take a little research.
Once you have the right MIBs to query, you can then use that information
to write or adapt plug-ins to collect that data. In many cases, you can
even integrate your cloud services into this monitoring by using
standard SNMP on those instances, or by using an API provided by your
cloud vendor. Just because you have cloud services doesn’t mean you
should trust all your monitoring to your cloud provider. The provider
doesn’t know your application and service stack as well as you do.
Getting most of these tools running usually isn't much of a challenge.
They typically have packages available to download for most popular
Linux distributions, if they aren't already in the package list. In some
cases, they may come preconfigured as a virtual server. Configuring and
tweaking the tools can take quite a while depending on the size of the
infrastructure, but getting them going initially is usually a cinch. At
the very least, they’re worth a test-drive.
No matter which of these tools you use to keep tabs on your
infrastructure, it will essentially provide the equivalent of at least
one more IT admin -- one that can't necessarily fix anything, but one
that watches everything, 24/7/365. The up-front time investment is well
worth the effort, no matter which way you cut it. Be sure to run a small
set of autonomous monitoring tools on another server, watching the main
monitoring server. This is a case where it's always best to ensure the
watcher is being watched.
Source: http://www.infoworld.com