Need to monitor a UNIX-like system, but can’t install Netdata on it? With our SNMP collector and Net-SNMP, you can get basic system information with just a bit of relatively quick and easy configuration.
What is SNMP?
The Simple Network Management Protocol, commonly known as SNMP, is a relatively lightweight protocol designed for monitoring and configuration management for network appliances like switches, routers or gateways. However, it can also be used for those purposes on almost any UNIX-like system thanks to the Net-SNMP project.
There are a couple of basic SNMP concepts to cover first:
- An OID, or ‘Object IDentifier’, is a unique identifier for a specific resource exposed by a node over SNMP. OIDs are structured like filesystem paths (read from left to right), but use . to separate individual components and use integers instead of names. For example, the OID .1.3.6.1.2.1.25.1.6.0 refers specifically to a value that tracks the current number of processes on the node. Most OIDs start with either .1.3.6.1.2.1 (which are usually OIDs defined by international standards organizations) or .1.3.6.1.4.1 (which are vendor-defined OIDs). We will be using raw OIDs in most places below, as those are what our SNMP collector actually uses for configuration.
- An MIB, or ‘Management Information Base’, is a special document that dictates how OIDs under a specific prefix are structured and what they mean. An MIB maps between human-readable names and OIDs, but it also acts as a schema defining exactly what OIDs may exist under a given OID, and what type of data those OIDs contain. MIBs are generally provided by the vendor of the SNMP implementation. The two MIBs we will be using here are the HOST-RESOURCES-MIB, which is defined by the IETF in RFC 2790, and the UCD-SNMP-MIB, originally created by UC Davis for monitoring their own UNIX-like systems and currently maintained by the Net-SNMP project.
- SNMP v3 is the current version of the SNMP protocol. It includes support for user-based access controls, as well as support for on-the-wire encryption, making it the preferred version to use in secure environments. However, it’s a bit more complicated to set up correctly.
- SNMP v2c is the commonly used variant of the previous version of the SNMP protocol. It doesn’t support encryption, and uses a much more primitive access control system involving what are known as ‘communities’ (essentially, short plain-text passwords).
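To make the path-like structure of OIDs concrete, here is a minimal Python sketch that splits an OID into its integer components and checks which of the two common prefixes it falls under. The function names (parse_oid, is_under, classify) are made up for this illustration and are not part of Net-SNMP or Netdata.

```python
# Illustrative only: these helpers are hypothetical, not a Net-SNMP or Netdata API.
STANDARD_PREFIX = (1, 3, 6, 1, 2, 1)  # .1.3.6.1.2.1, standards-defined objects
VENDOR_PREFIX = (1, 3, 6, 1, 4, 1)    # .1.3.6.1.4.1, vendor-defined objects


def parse_oid(text):
    """Turn '.1.3.6.1.2.1.25.1.6.0' (leading dot optional) into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip(".").split("."))


def is_under(oid, prefix):
    """Return True if `oid` sits at or below `prefix` in the OID tree."""
    return oid[:len(prefix)] == prefix


def classify(text):
    """Label an OID by which well-known prefix it starts with."""
    oid = parse_oid(text)
    if is_under(oid, STANDARD_PREFIX):
        return "standard"
    if is_under(oid, VENDOR_PREFIX):
        return "vendor"
    return "other"


print(classify(".1.3.6.1.2.1.25.1.6.0"))   # process count, HOST-RESOURCES-MIB
print(classify("1.3.6.1.4.1.2021.4.5.0"))  # total memory, UCD-SNMP-MIB
```

The first call prints standard and the second prints vendor, matching the two prefixes described above.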
SNMP has a bit of a bad reputation among seasoned network administrators as being far more complicated than its name suggests. For the purposes of this document, though, that complexity should not matter, as we are using well-defined interfaces that are not particularly vendor-specific.
Setting up Net-SNMP
If you already know what you’re doing…
If you already have experience working with Net-SNMP and just want to quickly get things set up so you can use the Netdata configuration below, simply set up a user or community for Netdata which has access to read objects under .1.3.6.1.2.1.25 (the HOST-RESOURCES-MIB prefix) and .1.3.6.1.4.1.2021 (the UCD-SNMP-MIB prefix), double check that the UCD-SNMP-MIB::dskTable is populated (you may need to add some extra config to enable disk monitoring if it isn’t), and then jump straight to the Netdata configuration section.
Common configuration
First, if you haven’t already, install Net-SNMP on the system you want to collect data from. On most Linux and BSD systems, it’s in a package named net-snmp. Some package repositories provide separate packages for the individual components, in which case you just need the one that provides the snmpd command and service.
You should also install the Net-SNMP command-line tools on the system where you will be running Netdata so that you have a way to check that things are working correctly. Depending on the system, these may either be in the same net-snmp package, or they might be in a separate package called something like net-snmp-tools.
If you have a firewall configured on the system you will be collecting data from, you will also need to add a rule to allow UDP packets on port 161.
The SNMP daemon configuration is usually stored in /etc/snmp/snmpd.conf. A basic configuration for our purposes looks like this:
# Explicitly enable disk space collection
# Without this we can’t get disk usage info in the UCD-SNMP-MIB.
includeAllDisks 1%
# Define a named view of the OID tree.
# This is used to restrict access to just those subsets of the tree that we are collecting data from.
# It’s not strictly required, but is considered best practice from a security perspective.
view netdata included .1.3.6.1.2.1.25.1.5.0 # user counts
view netdata included .1.3.6.1.2.1.25.1.6.0 # process counts
view netdata included .1.3.6.1.4.1.2021.4 # memory usage
view netdata included .1.3.6.1.4.1.2021.9 # disk usage
view netdata included .1.3.6.1.4.1.2021.10.1.5 # system load averages
view netdata included .1.3.6.1.4.1.2021.11 # CPU usage, context switches, and interrupts
This still needs access controls to be configured, though how you do so depends on whether you want to use SNMP v3 or SNMP v2c.
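Because the view limits what a client can read, it is worth confirming that every OID you plan to request falls inside one of the included subtrees. The following Python sketch does that check offline. It assumes plain prefix matching only (it ignores view masks and excluded entries, which snmpd also supports), and CHART_OIDS is just a sample of the OIDs the Netdata charts in this article request.

```python
# Hypothetical offline check; snmpd's real view matching also supports masks
# and 'excluded' entries, which this sketch deliberately ignores.
VIEW_SUBTREES = [
    ".1.3.6.1.2.1.25.1.5.0",     # user counts
    ".1.3.6.1.2.1.25.1.6.0",     # process counts
    ".1.3.6.1.4.1.2021.4",       # memory usage
    ".1.3.6.1.4.1.2021.9",       # disk usage
    ".1.3.6.1.4.1.2021.10.1.5",  # system load averages
    ".1.3.6.1.4.1.2021.11",      # CPU usage, context switches, and interrupts
]

# A sample of OIDs requested by the charts in this article.
CHART_OIDS = [
    "1.3.6.1.4.1.2021.11.50.0",  # CPU user time
    "1.3.6.1.4.1.2021.10.1.5.1", # 1-minute load average
    "1.3.6.1.4.1.2021.4.5.0",    # total memory
    "1.3.6.1.4.1.2021.9.1.7.1",  # available space on disk index 1
    "1.3.6.1.2.1.25.1.5.0",      # logged-in users
    "1.3.6.1.2.1.25.1.6.0",      # processes
]


def parse_oid(text):
    """Convert a dotted OID string (leading dot optional) to a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip(".").split("."))


def covered(oid_text, subtrees):
    """Return True if the OID is at or below any of the given subtrees."""
    oid = parse_oid(oid_text)
    return any(oid[:len(parse_oid(s))] == parse_oid(s) for s in subtrees)


uncovered = [oid for oid in CHART_OIDS if not covered(oid, VIEW_SUBTREES)]
print(uncovered)
```

An empty list means every sampled OID is readable through the view. If you add charts later, add their OIDs to the list and rerun the check before touching the daemon configuration.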
SNMP v2c
If you trust all the users on your network, SNMP v2c is usually sufficient. If not, you should use SNMP v3 instead.
SNMP v2c setup is relatively simple, just needing a read-only community matched to the view we defined above.
# Define a community named ‘netdata’ restricted to the view named ‘netdata’.
# A source must be given before -V; ‘default’ allows access from any address.
rocommunity netdata default -V netdata
# Same as above, but only allow access from 192.0.2.1:
rocommunity netdata 192.0.2.1 -V netdata
# Same, but allow all of 192.0.2.0/24:
rocommunity netdata 192.0.2.0/24 -V netdata
# Same as the above examples, but for IPv6
rocommunity6 netdata default -V netdata
rocommunity6 netdata 2001:db8::1 -V netdata
rocommunity6 netdata 2001:db8::/32 -V netdata
Once you have the configuration set up, start the snmpd service, and then run snmpget -v 2c -c netdata HOSTNAME .1.3.6.1.2.1.25.1.5.0 on the system where you will be running Netdata to collect the data, replacing HOSTNAME with the host name or IP address of the system you will be collecting the data from. If everything is set up correctly, this should produce output like HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 0, though the number at the end may differ.
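If you want to script this check rather than read the output by eye, a small Python wrapper can run snmpget and split the reply into its name, type, and value. This is a sketch under the assumption that the reply uses the one-line NAME = TYPE: VALUE format shown above; the function names are invented for this example, and error replies (which have no TYPE: part) are simply rejected.

```python
import re
import subprocess

# Matches replies such as:
#   HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 0
LINE_RE = re.compile(r"^(?P<name>\S+) = (?P<type>[\w-]+): (?P<value>.*)$")


def parse_snmp_line(line):
    """Split one line of snmpget/snmpwalk output into (name, type, value)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected SNMP output: {line!r}")
    return match.group("name"), match.group("type"), match.group("value")


def snmpget_v2c(host, community, oid):
    """Run snmpget over SNMP v2c and return the parsed reply.

    Requires the Net-SNMP command-line tools to be installed locally.
    """
    result = subprocess.run(
        ["snmpget", "-v", "2c", "-c", community, host, oid],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return parse_snmp_line(result.stdout)


# Parsing the sample reply from the text (no network access needed):
name, kind, value = parse_snmp_line(
    "HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 0"
)
print(name, kind, int(value))
```

Against a live system you would call snmpget_v2c("HOSTNAME", "netdata", ".1.3.6.1.2.1.25.1.5.0"), with HOSTNAME replaced as described above.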
SNMP v3
SNMP v3 setup is much more complicated than SNMP v2c setup, but also allows for much greater security.
Setup of SNMP v3 requires defining a user, and then adding it to the required views.
Luckily, Net-SNMP provides a script called net-snmp-create-v3-user that simplifies this configuration significantly. To add a user for Netdata using this script, make sure the snmpd service is not running, and then run (as root):
net-snmp-create-v3-user -ro -A PASSWORD -a SHA-256 -x AES netdata
Replace PASSWORD in the above command with the password you want to use for the netdata user.
Once you have that set up, you just need to associate the new user with the view we defined in the base configuration like so:
# Restrict the ‘netdata’ user to the view named ‘netdata’,
# and require both authentication and encryption (priv).
rouser netdata priv -V netdata
# Unlike rocommunity, rouser does not take a source address and has no
# separate IPv6 variant. To limit which hosts may connect, restrict
# UDP port 161 with your firewall instead.
Once you have the configuration set up, start the snmpd service, and then run snmpget -v 3 -u netdata -l authPriv -a SHA-256 -A PASSWORD -x AES -X PASSWORD HOSTNAME .1.3.6.1.2.1.25.1.5.0 on the system where you will be running Netdata to collect the data, replacing PASSWORD with the password you set when creating the user, and HOSTNAME with the host name or IP address of the system you will be collecting the data from. If everything is set up correctly, this should produce output like HOST-RESOURCES-MIB::hrSystemNumUsers.0 = Gauge32: 0, though the number at the end may differ.
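The SNMP v3 command line has more moving parts than the v2c one, so it can help to build it programmatically. The sketch below assembles the snmpget arguments for the three security levels the Net-SNMP tools accept (noAuthNoPriv, authNoPriv, authPriv). The function name is hypothetical, and reusing the authentication password for encryption when no separate one is given is an assumption that matches a user created with only -A, as above.

```python
# Which credentials each SNMP v3 security level needs: (auth, priv).
LEVELS = {
    "noAuthNoPriv": (False, False),
    "authNoPriv": (True, False),
    "authPriv": (True, True),
}


def snmpget_v3_args(host, oid, user, level="authPriv",
                    auth_proto="SHA-256", auth_pass=None,
                    priv_proto="AES", priv_pass=None):
    """Build an argument list for snmpget using SNMP v3.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown security level: {level}")
    needs_auth, needs_priv = LEVELS[level]
    args = ["snmpget", "-v", "3", "-u", user, "-l", level]
    if needs_auth:
        if not auth_pass:
            raise ValueError("this level requires an authentication password")
        args += ["-a", auth_proto, "-A", auth_pass]
    if needs_priv:
        # Assumption: the privacy password defaults to the auth password.
        args += ["-x", priv_proto, "-X", priv_pass or auth_pass]
    return args + [host, oid]


print(" ".join(snmpget_v3_args(
    "HOSTNAME", ".1.3.6.1.2.1.25.1.5.0", "netdata", auth_pass="PASSWORD",
)))
```

Passing the resulting list to subprocess.run avoids shell quoting problems with passwords that contain spaces or special characters.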
Configuring Netdata
Configuration of Netdata to collect this data is also relatively simple. Our SNMP collector is part of the Netdata Go plugin, so to open the configuration run:
/etc/netdata/edit-config go.d/snmp.conf
The configuration file itself is a YAML document.
A basic configuration for a single host looks like:
# If we don’t see a system on startup, check again every five minutes
# until we do see it.
autodetection_retry: 300
jobs:
  - name: REMOTE_HOST # The name to show in the dashboard for this system.
    hostname: HOSTNAME # The actual host name or IP address to connect to.
    update_every: 5 # How frequently to collect metrics, in seconds.
    # Configuration for SNMP v2c
    #
    # Uncomment the next two lines if you are using SNMP v2c
    #community: netdata
    #options: {version: 2}
    # Configuration for SNMP v3
    #
    # Uncomment the next eight lines if you are using SNMP v3
    #user:
    #  name: netdata
    #  level: authPriv
    #  auth_proto: sha256
    #  auth_password: &pass 'PASSWORD' # Make sure to change this to your actual password.
    #  priv_proto: aes
    #  priv_password: *pass
    #options: {version: 3}
    # Define each of the charts to collect
    # This also defines a YAML anchor for the charts, so for additional jobs you can just use
    # `charts: *unix-charts` instead of needing to repeat all of this config.
    charts: &unix-charts
      # CPU usage chart
      # Not all systems provide all dimensions, but the ones that aren’t collected will be ignored.
      - family: cpu
        id: cpu
        title: CPU Usage
        type: stacked
        units: '%'
        dimensions:
          - {algorithm: incremental, name: user, oid: 1.3.6.1.4.1.2021.11.50.0}
          - {algorithm: incremental, name: nice, oid: 1.3.6.1.4.1.2021.11.51.0}
          - {algorithm: incremental, name: system, oid: 1.3.6.1.4.1.2021.11.52.0}
          - {algorithm: incremental, name: kernel, oid: 1.3.6.1.4.1.2021.11.55.0}
          - {algorithm: incremental, name: iowait, oid: 1.3.6.1.4.1.2021.11.54.0}
          - {algorithm: incremental, name: irq, oid: 1.3.6.1.4.1.2021.11.56.0}
          - {algorithm: incremental, name: softirq, oid: 1.3.6.1.4.1.2021.11.61.0}
          - {algorithm: incremental, name: steal, oid: 1.3.6.1.4.1.2021.11.64.0}
          - {algorithm: incremental, name: guest, oid: 1.3.6.1.4.1.2021.11.65.0}
          - {algorithm: incremental, name: guest_nice, oid: 1.3.6.1.4.1.2021.11.66.0}
      # Load average chart
      # The actual numbers being collected are 100 times the load average
      # This is done to simplify processing
      - family: load
        id: load
        title: Load Average
        type: line
        units: load
        dimensions:
          - {algorithm: absolute, divisor: 100, name: load1, oid: 1.3.6.1.4.1.2021.10.1.5.1}
          - {algorithm: absolute, divisor: 100, name: load5, oid: 1.3.6.1.4.1.2021.10.1.5.2}
          - {algorithm: absolute, divisor: 100, name: load15, oid: 1.3.6.1.4.1.2021.10.1.5.3}
      # Memory usage chart
      # Due to practical limitations, this lists total and available memory, not free and used
      # The actual values reported over SNMP are in kibibytes, so this adjusts them to bytes for nicer
      # presentation and auto-scaling.
      - family: memory
        id: snmp_memory
        title: System Memory Usage
        type: line
        units: bytes
        dimensions:
          - {algorithm: absolute, multiplier: 1024, name: total, oid: 1.3.6.1.4.1.2021.4.5.0}
          - {algorithm: absolute, multiplier: 1024, name: avail, oid: 1.3.6.1.4.1.2021.4.6.0}
      # Swap usage chart
      # Due to practical limitations, this lists total and available swap, not free and used
      # The actual values reported over SNMP are in kibibytes, so this adjusts them to bytes for nicer
      # presentation and auto-scaling.
      - family: swap
        id: snmp_swap
        title: System Swap Usage
        type: line
        units: bytes
        dimensions:
          - {algorithm: absolute, multiplier: 1024, name: total, oid: 1.3.6.1.4.1.2021.4.3.0}
          - {algorithm: absolute, multiplier: 1024, name: avail, oid: 1.3.6.1.4.1.2021.4.4.0}
      # Interrupts chart
      # This tracks the rate of interrupts on the system.
      - family: interrupts
        id: intr
        title: Interrupts
        type: line
        units: interrupts/s
        dimensions:
          - {algorithm: incremental, name: interrupts, oid: 1.3.6.1.4.1.2021.11.59.0}
      # Context switches chart
      # This tracks the rate of context switches on the system.
      - family: processes
        id: ctxt
        title: Context Switches
        type: line
        units: context switches/s
        dimensions:
          - {algorithm: incremental, name: switches, oid: 1.3.6.1.4.1.2021.11.60.0}
      # Users chart
      # This tracks the number of users the system reports as being logged in.
      - family: users
        id: snmp_users
        title: System Users
        type: line
        units: users
        dimensions:
          - {algorithm: absolute, name: users, oid: 1.3.6.1.2.1.25.1.5.0}
      # Processes chart
      # This tracks the number of running processes on the system.
      - family: processes
        id: processes
        title: System Processes
        type: line
        units: processes
        dimensions:
          - {algorithm: absolute, name: processes, oid: 1.3.6.1.2.1.25.1.6.0}
This configuration:
- Sets up an autodetection retry interval. This is generally important for most types of remote data collection, as it ensures that Netdata will eventually start collecting data even if the target system was not powered on (or the SNMP daemon was not running) when Netdata first tried to collect metrics from it. 300 seconds is generally a reasonable amount of time for this, as it avoids sending excessive network traffic but also ensures that data collection will start relatively quickly in most cases once the target system comes back online.
- Specifies a data collection interval of five seconds. The SNMP collector defaults to collecting every 10 seconds, which is reasonable for many network appliances, as they often cannot provide data quickly over SNMP. For our usage, though, it normally takes no more than a few milliseconds to collect and send the data we are fetching, so smaller collection intervals are generally safe, and the choice comes down to how much impact the data collection has on the target system. On a big system without strict latency requirements, you can probably even push this all the way down to 1 second safely, but five seconds is a reasonable starting point.
- Sets up a series of charts for a number of basic system statistics. Where possible, the family and id values for these charts have been chosen to match up with native data collection charts that collect the same information, allowing for sensible aggregation on Netdata Cloud when using virtual nodes (see below for more info about that).
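The algorithm, multiplier, and divisor settings on each dimension determine how a raw SNMP value becomes the number shown on a chart. The Python sketch below is a simplified model of that arithmetic, not Netdata’s actual implementation: it treats absolute as "use the value as-is" and incremental as "difference between two samples divided by the elapsed time", and it ignores counter wraparound.

```python
def charted_value(algorithm, raw, prev_raw=None, seconds=1,
                  multiplier=1, divisor=1):
    """Simplified model of how one dimension's raw value is scaled.

    Hypothetical helper for illustration; real collectors also handle
    counter resets and wraparound.
    """
    if algorithm == "absolute":
        base = raw
    elif algorithm == "incremental":
        if prev_raw is None:
            raise ValueError("incremental needs a previous sample")
        base = (raw - prev_raw) / seconds
    else:
        raise ValueError(f"unknown algorithm: {algorithm}")
    return base * multiplier / divisor


# Load average: SNMP reports 125, meaning a load of 1.25.
print(charted_value("absolute", 125, divisor=100))

# Memory: SNMP reports kibibytes, the chart shows bytes.
print(charted_value("absolute", 8000000, multiplier=1024))

# Interrupts: a counter that grew by 500 over a 5-second interval.
print(charted_value("incremental", 1500, prev_raw=1000, seconds=5))
```

The three calls print 1.25, 8192000000.0, and 100.0, which correspond to the load, memory, and interrupts charts in the configuration above.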
Disk space monitoring
Unlike the other metrics we set up in the configuration above, disk space monitoring with SNMP is a bit tricky. The UCD-SNMP-MIB includes OIDs for monitoring disk space usage, but only the root filesystem has a predictable OID for this type of monitoring, so some extra setup is needed for each individual system.
The general configuration for a disk space monitoring chart looks like:
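- family: disk_space
  id: _
  title: /
  type: stacked
  units: bytes
  dimensions:
    # dskAvail and dskUsed are reported in kilobytes, so convert them to bytes.
    - {algorithm: absolute, multiplier: 1024, name: avail, oid: 1.3.6.1.4.1.2021.9.1.7.1}
    - {algorithm: absolute, multiplier: 1024, name: used, oid: 1.3.6.1.4.1.2021.9.1.8.1}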
The above configuration snippet can be used to monitor the root filesystem’s space usage on any system. To get a list of what other filesystems you can monitor, run:
snmpwalk -v 2c -c netdata HOSTNAME 1.3.6.1.4.1.2021.9.1.2
or, if using SNMP v3:
snmpwalk -v 3 -u netdata -l authPriv -a SHA-256 -A PASSWORD -x AES -X PASSWORD HOSTNAME 1.3.6.1.4.1.2021.9.1.2
Those should produce output that looks something like:
UCD-SNMP-MIB::dskPath.1 = STRING: /
UCD-SNMP-MIB::dskPath.2 = STRING: /dev
UCD-SNMP-MIB::dskPath.3 = STRING: /boot/efi
The relevant parts here are the number just before the =, which is the disk index, and the part at the end of the line, which indicates the path to the filesystem corresponding to that index. You can replace the final 1 in the OIDs in the configuration above with the desired disk index to monitor that specific disk, though you should also change the id and title for the chart to uniquely identify that specific disk.
Index 1 will always be the root filesystem. The indexes of other filesystems are based on the order they are listed in the kernel’s mount table, so they are dependent on the exact configuration of the system, and they may change as a result of seemingly unrelated changes such as kernel upgrades.
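Since the disk indexes differ between systems and can change over time, you may prefer to generate the chart definitions from the snmpwalk output instead of writing them by hand. This Python sketch parses the dskPath listing shown above and builds one chart definition per filesystem. The helper names are invented for this example, deriving the chart id by replacing / with _ is just one possible convention, and the multiplier of 1024 assumes dskAvail and dskUsed are reported in kilobytes, as the UCD-SNMP-MIB describes.

```python
import re

# Matches lines such as: UCD-SNMP-MIB::dskPath.3 = STRING: /boot/efi
DSK_PATH_RE = re.compile(r"dskPath\.(\d+) = STRING: (.+)$")


def parse_disk_paths(walk_output):
    """Map disk index -> mount path from `snmpwalk` output for dskPath."""
    disks = {}
    for line in walk_output.splitlines():
        match = DSK_PATH_RE.search(line.strip())
        if match:
            disks[int(match.group(1))] = match.group(2)
    return disks


def disk_chart(index, path):
    """Build a chart definition (as a dict) for one filesystem."""
    def dimension(name, column):
        return {
            "algorithm": "absolute",
            "multiplier": 1024,  # assumption: values are in kilobytes
            "name": name,
            "oid": f"1.3.6.1.4.1.2021.9.1.{column}.{index}",
        }
    return {
        "family": "disk_space",
        "id": path.replace("/", "_"),  # "/" -> "_", "/boot/efi" -> "_boot_efi"
        "title": path,
        "type": "stacked",
        "units": "bytes",
        "dimensions": [dimension("avail", 7), dimension("used", 8)],
    }


SAMPLE = """\
UCD-SNMP-MIB::dskPath.1 = STRING: /
UCD-SNMP-MIB::dskPath.2 = STRING: /dev
UCD-SNMP-MIB::dskPath.3 = STRING: /boot/efi
"""

for index, path in sorted(parse_disk_paths(SAMPLE).items()):
    chart = disk_chart(index, path)
    print(chart["id"], [d["oid"] for d in chart["dimensions"]])
```

The resulting dictionaries mirror the chart snippet above and can be written out as YAML with whatever tooling you already use. Rerunning the script after a kernel upgrade is a cheap way to catch shifted indexes.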
Virtual Nodes
Starting with v1.39.0, Netdata is adding a new feature called ‘virtual nodes’ (or ‘vnodes’). Virtual nodes let you treat a set of metrics as a separate system from the system that Netdata is running on, which is perfect for monitoring remote UNIX systems via SNMP.
You can open the virtual nodes configuration by running:
/etc/netdata/edit-config vnodes/vnodes.conf
The configuration file itself is a YAML document, just like with the SNMP collector configuration.
A simple configuration entry for a single virtual node might look like this:
- hostname: foo.example.com # This defines the hostname that will be shown for the node
  guid: 00000000-0000-0000-0000-000000000000 # This defines the node’s GUID. It should be unique among all nodes.
  labels: # This defines host labels, which are used to present information about the node in Netdata Cloud
    _architecture: x86_64
    _os_name: Solaris
    _os_version: 11
    _system_cores: 8
    _system_cpu_freq: 4700000000
    _system_disk_space: 1099511627776
    _system_ram_total: 34359738368
    _virtualization: kvm
Once you have a virtual node defined, you simply need to add a vnode: key to the SNMP job that you want to associate with that node, with the value being equal to the hostname of the virtual node.
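Two parts of a virtual node entry are easy to get wrong by hand: the GUID, which has to be unique, and the size labels, which in the example above are plain byte counts (34359738368 is 32 GiB and 1099511627776 is 1 TiB). The Python sketch below generates a random GUID with the standard uuid module and computes those labels from GiB figures. The new_vnode function and its parameters are hypothetical and exist only for this illustration.

```python
import uuid

GIB = 1024 ** 3  # bytes in one gibibyte


def new_vnode(hostname, ram_gib, disk_gib, cores, cpu_hz):
    """Build a virtual node entry with a fresh random GUID.

    Hypothetical helper: only the labels that need arithmetic are filled in.
    """
    return {
        "hostname": hostname,
        "guid": str(uuid.uuid4()),  # random (version 4) UUID
        "labels": {
            "_system_cores": cores,
            "_system_cpu_freq": cpu_hz,
            "_system_disk_space": disk_gib * GIB,
            "_system_ram_total": ram_gib * GIB,
        },
    }


node = new_vnode("foo.example.com", ram_gib=32, disk_gib=1024,
                 cores=8, cpu_hz=4_700_000_000)
print(node["guid"])
print(node["labels"]["_system_ram_total"], node["labels"]["_system_disk_space"])
```

Generate the GUID once and keep it in the configuration file; creating a new one on every run would make the same system show up as a different node each time.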
What about other SNMP implementations?
The configuration outlined above for Netdata will also work with most other SNMP implementations provided they expose the relevant OIDs. Pretty much anything exposes the HOST-RESOURCES-MIB, though the UCD-SNMP-MIB is much more vendor-specific. As far as I know, only Net-SNMP supports it out of the box, though FreeBSD does include a package with an extension for bsnmpd (called bsnmp-ucd) that implements support for it.