IT organizations face increasing pressure to deliver more resources with less cash. IT staff must be able to justify the existence of each physical server and manage more servers on a dwindling budget. Excellent open source network management tools are available that assure the availability of services, collect and visualize server performance data, receive and correlate events, notify sysadmins of problems, and report on each of these aspects, but maintaining a separate tool for each task is time-consuming.
Getting started with the open source server monitoring tool OpenNMS
First, a few words on server sizing are in order. With a solid hardware foundation, OpenNMS can manage thousands of nodes and interfaces. Small or virtualized servers are fine for a limited testbed deployment, but for large-scale production work, invest in an adequate physical server. Plenty of memory (at least 4 GB) and a 64-bit CPU are critical, as are fast disks on a smart controller with a large, battery-backed write cache. OpenNMS supports many operating systems, but for simplicity, this tip uses Red Hat Enterprise Linux 5 or CentOS 5. All of the commands below must be run as root.
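As a quick way to compare a candidate server with the sizing guidance above, the following sketch prints the installed memory and the word size. It assumes a Linux system with /proc/meminfo and the getconf utility; the function names mem_total_mb and word_size are made up for this illustration. Note that getconf reports the word size of the installed operating system, so a 64-bit CPU running a 32-bit OS will show 32.

```shell
#!/bin/sh
# Report installed memory and word size for comparison with the
# guidance above (at least 4 GB of memory and a 64-bit CPU).

# Total memory in MB, read from the Linux-specific /proc/meminfo
# (the MemTotal value is given in kB).
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Word size of the running userland: 32 or 64.
word_size() {
    getconf LONG_BIT
}

echo "Memory: $(mem_total_mb) MB"
echo "Word size: $(word_size)-bit"
```

The kernel reserves some memory for itself, so a server with 4 GB installed will report slightly less than 4096 MB.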
OpenNMS uses the PostgreSQL database to store events and data on nodes, interfaces and services. To install PostgreSQL on your system, use the following command:
yum install postgresql-server
The Yellowdog Updater, Modified (yum) package manager will download and install the required packages. Next, initialize the PostgreSQL database cluster by running:
/sbin/service postgresql initdb
The next step is to edit PostgreSQL's host-based access configuration file: /var/lib/pgsql/data/pg_hba.conf. At the bottom of this file, there are three default lines that read:
local all all ident sameuser
host all all 127.0.0.1/32 ident sameuser
host all all ::1/128 ident sameuser
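To simplify configuration, change the last part of each of these lines from “ident sameuser” to “trust” (without the quotation marks). Local connections will then be allowed without a password. This is convenient on a testbed, but it lets any local user connect as any database user, so tighten it before production use.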
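If you would rather script this edit than make it by hand, a minimal sketch follows. It assumes GNU sed (for the -i option), and the helper name set_trust_auth is hypothetical. The demonstration runs against a scratch file, not the live pg_hba.conf.

```shell
#!/bin/sh
# Hypothetical helper: replace a trailing "ident sameuser" with "trust"
# in a pg_hba.conf-style file, keeping a backup copy alongside it.
set_trust_auth() {
    cp "$1" "$1.bak" &&
    sed -i 's/ident sameuser[[:space:]]*$/trust/' "$1"
}

# Demonstration on a scratch file containing the three default lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
local   all   all                  ident sameuser
host    all   all   127.0.0.1/32   ident sameuser
host    all   all   ::1/128        ident sameuser
EOF

set_trust_auth "$sample"
cat "$sample"
```

To apply it to the real file, you would call set_trust_auth /var/lib/pgsql/data/pg_hba.conf and review the result against the .bak copy before starting PostgreSQL.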
To start PostgreSQL, run the following command:
/sbin/service postgresql start
Next, install the yum repository RPM package from the OpenNMS project. To do this, run the following command on your server:
After the repository RPM package is installed, you can install OpenNMS and supporting packages by running:
yum install opennms
Yum will again download and install all the required dependencies, and you'll find OpenNMS installed in the /opt/opennms directory. Next, run the OpenNMS database installer command:
Finally, start the OpenNMS daemons using this command:
/sbin/service opennms start
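The daemons may need a little time before the Web interface answers. A hedged sketch of a generic wait loop is shown below; the retry function is a hypothetical helper, not part of OpenNMS, and it simply reruns a command until it succeeds or the attempts run out.

```shell
#!/bin/sh
# Hypothetical helper: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between
# tries. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumes curl is installed): poll the Web UI port for up to
# five minutes.
# retry 30 10 curl -s -o /dev/null http://localhost:8980/
```

The attempt count and delay in the commented example are arbitrary; adjust them to suit your hardware.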
You're now up and running! Most day-to-day configuration is done through the OpenNMS Web interface. Point your browser to http://<your-server-ip-address>:8980/ and log in with “admin” as both the username and the password. Click the “Admin” link near the top right of the browser window, then “Configure Discovery” under “Operations.” Skip to “Include Ranges” on the resulting page and click the “Add New” button.
In the pop-up window, enter the first and last addresses of an address range that contains several servers. Leave the timeout and retry settings at their defaults unless your network is very slow. Click the “Add” button, then in the main window click “Save and Restart Discovery” at the bottom of the page.
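Discovery pings every address between the first and last address you enter, so it can help to know how large a range is before saving it. The sketch below counts the addresses in an IPv4 range; the function names and the example addresses are illustrative only and are not part of OpenNMS.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Number of addresses from FIRST to LAST, inclusive.
range_size() {
    echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

range_size 192.168.1.10 192.168.1.50   # prints 41
```

Octets must be written without leading zeros, because the shell would otherwise treat them as octal numbers.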
OpenNMS now scans the network for the nodes, interfaces and services you’ve asked it to find. When the scan completes, go to the Web user interface (UI) and click on the “Node List” link. You'll find a node for each address that responded to the Internet Control Message Protocol (ICMP) pings sent by discovery. If you drill into any given node, you'll find at least one interface inside with ICMP and other services (e.g., Secure Shell). If one service (or a whole interface or node) stops responding to the service assurance tests performed every five minutes by the OpenNMS poller, the system will create an outage record and an event.
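To spot-check a service by hand, you can run a test command yourself and look at its exit status. The following sketch is only a rough analogy to a poller test, not how OpenNMS implements polling; service_state is a hypothetical helper, and the example assumes a Linux ping that accepts the -c and -w options.

```shell
#!/bin/sh
# Run any test command and map its exit status to "up" or "down".
service_state() {
    if "$@" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Example: one ICMP echo request to the local host, two-second deadline.
echo "ICMP on 127.0.0.1: $(service_state ping -c 1 -w 2 127.0.0.1)"
```

Any command that exits with zero on success can be used as the test. For SSH, a port check such as nc -z -w 3 <host> 22 would work if your version of netcat supports the -z option.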
OpenNMS sends notices via email or other methods when outages (or other events) occur. To get email notification, edit /opt/opennms/etc/javamail-configuration.properties. If your domain name is “example.com” and your Simple Mail Transfer Protocol (SMTP) mail relay is “smtp.example.com,” a minimal setup might involve uncommenting (removing the leading hash symbols) and setting these two properties as follows:
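The uncomment-and-set step can also be scripted. The sketch below rests on several assumptions: that the two properties are named org.opennms.core.utils.fromAddress and org.opennms.core.utils.mailHost (check the comments in your own copy of the file), that GNU sed is available, and that opennms@example.com is an acceptable sender address. The set_property helper is hypothetical, and the demonstration edits a scratch file, not the live configuration.

```shell
#!/bin/sh
# Hypothetical helper: set_property FILE KEY VALUE
# Uncomments KEY if it is commented out and sets it to VALUE.
# VALUE must not contain "|" or "&", which are special to sed here.
set_property() {
    sed -i "s|^#*[[:space:]]*$2=.*|$2=$3|" "$1"
}

# Scratch file with two commented-out properties. The property names
# and default values are assumptions made for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#org.opennms.core.utils.fromAddress=root@[127.0.0.1]
#org.opennms.core.utils.mailHost=127.0.0.1
EOF

set_property "$sample" org.opennms.core.utils.fromAddress opennms@example.com
set_property "$sample" org.opennms.core.utils.mailHost smtp.example.com
cat "$sample"
```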
Next, go back to the Web UI, click on “Admin,” select “Configure Users, Groups, and Roles” under “OpenNMS System,” and click on “Configure Users.” In the list of users, click the “Modify” icon for the “admin” user. Enter your email address in the “Email” field and then click the “Finish” button. Click again on “Admin” in the “breadcrumb” links at the top left, and on the right-hand side of the page under “Notification Status,” click the “On” radio button and then “Update.” Notifications are now globally enabled.
Now your new system can do something useful. Log in to one of the servers that OpenNMS has discovered and stop the Secure Shell (SSH) service. You will see the outage in the Web UI within five minutes, and if email is configured correctly, a notice will be sent to your inbox. If you restart SSH, you'll see the outage (and the notification) self-resolve. The 24-hour availability of that service (shown on the front page of the OpenNMS Web UI) will continue to reflect the recent downtime. The outage will also show up in the “Early Morning Report,” which you can run on demand from the Web UI in “Reports” → “Database Reports” → “Online Reports.” If you encounter problems, check out the “Get Help” section of the OpenNMS.org wiki.
OpenNMS is a powerful open source server monitoring platform with many capabilities. This tip focused primarily on service assurance, but as you explore the Web UI and add more nodes, you'll find that we've barely scratched the surface. In the next installment, we'll cover performance data collection via Simple Network Management Protocol (SNMP), which is useful for servers and network devices.
What did you think of this feature? Write to SearchDataCenter.com's Matt Stansberry about your data center concerns at email@example.com.
This was first published in December 2010