Unbound monitoring with Netdata

What is Unbound?

Unbound is a validating, recursive, caching DNS resolver developed by NLnet Labs.

Monitoring Unbound with Netdata

To monitor Unbound with Netdata, you need both Unbound and Netdata installed on your system.

Netdata auto-discovers hundreds of services, and for those it doesn't, turning on manual discovery is a one-line configuration. For more information on configuring Netdata for Unbound monitoring, please read the collector documentation.

Once the collector is running, you should see the Unbound section on the Overview tab in Netdata Cloud, already populated with charts for the metrics described below.

Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.

What Unbound metrics are important to monitor - and why?

Queries

Queries are the requests that clients send to the DNS server. Monitoring the total number of queries, together with the rate of queries per client IP address, helps detect and limit malicious activity. Typical values range from hundreds to thousands, depending on the size of the network.
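As a rough illustration of how a query rate is derived, the sketch below parses statistics in the `key=value` text form printed by `unbound-control stats_noreset` and turns two samples of a cumulative counter into a per-second rate. The sample text and the helper names `parse_unbound_stats` and `per_second` are made up for this example; this is not how the Netdata collector is implemented.

```python
def parse_unbound_stats(text: str) -> dict:
    """Parse `key=value` lines into a dict of floats, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # non-numeric value, ignored in this sketch
    return stats


def per_second(previous: float, current: float, interval_seconds: float) -> float:
    """Rate between two samples of a cumulative counter.

    A counter that went backwards (for example after a restart) yields 0.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return max(current - previous, 0.0) / interval_seconds


# Hypothetical sample output, shortened for illustration.
SAMPLE = """\
thread0.num.queries=120
total.num.queries=480
total.num.cachehits=400
total.num.cachemiss=80
"""

stats = parse_unbound_stats(SAMPLE)
print(stats["total.num.queries"])   # 480.0
print(per_second(480, 1080, 60))    # 10.0
```

Using `stats_noreset` rather than `stats` matters here: the rate calculation assumes the counters keep accumulating between samples.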

Queries IP Ratelimited

The number of queries that were rate limited by client IP address. Rate limiting is used to curb malicious or excessive query traffic. Normal values should be close to zero.

DNSCrypt Queries

The number of queries that were encrypted and successfully decapsulated, that requested DNSCrypt certificates, that arrived in cleartext on the DNSCrypt port, or that were malformed. DNSCrypt is an encryption protocol used to secure DNS traffic. Normal values depend on how heavily DNSCrypt is used.

Cache

The number of cache hits and misses. Cache hits are requests that can be answered from the cache, while cache misses are requests that do not have a cached answer. Normal values for cache hits should be higher than cache misses.

Cache Percentage

The percentage of cache hits and misses. This metric indicates how effective caching is. On a healthy resolver, the percentage of cache hits should be higher than the percentage of cache misses.
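The percentages follow directly from the hit and miss counters. A minimal sketch, with a hypothetical helper name and made-up numbers:

```python
def cache_percentages(hits: float, misses: float) -> tuple:
    """Return (hit_percent, miss_percent); both are 0.0 when there was no traffic."""
    total = hits + misses
    if total == 0:
        return 0.0, 0.0
    return 100.0 * hits / total, 100.0 * misses / total


hit_pct, miss_pct = cache_percentages(hits=400, misses=100)
print(hit_pct, miss_pct)  # 80.0 20.0
```

Guarding against a zero total avoids a division error on an idle resolver or right after a restart.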

Prefetches

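The number of cache prefetches performed. When prefetching is enabled, Unbound refreshes popular cache entries shortly before they expire, so that later queries can still be answered from the cache. Normal values depend on whether prefetching is enabled and on how popular the cached names are.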

Expired

The number of replies that were answered from an expired cache entry. Unbound only answers with expired data when serving expired records is enabled, so normal values should be close to zero unless that feature is in use.

Zero TTL Replies

The number of replies sent with a TTL of zero. TTL stands for Time to Live and indicates how long a reply can be cached, so a reply with a TTL of zero should not be cached by the client. Unbound counts these replies when it answers from an expired cache entry. Normal values should be close to zero.

Recursive Replies

The number of recursive replies. Recursive replies are replies to queries that could not be answered from the cache and required the DNS server to look up the answer from other DNS servers. Normal values depend on how many queries miss the cache.

Recursion Time

The average and median time taken to process recursive requests. This metric is used to measure the performance of recursive requests. Normal values should be in the low-to-mid hundreds of milliseconds.
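Unbound's own statistics report recursion time in seconds, so converting to milliseconds makes the numbers easier to compare with the ranges mentioned here. The sketch below assumes the counters `total.recursion.time.avg` and `total.recursion.time.median` have already been parsed into a dict; the helper name and sample values are illustrative only.

```python
def recursion_time_ms(stats: dict) -> dict:
    """Convert recursion time counters from seconds to milliseconds."""
    return {
        "avg_ms": stats["total.recursion.time.avg"] * 1000,
        "median_ms": stats["total.recursion.time.median"] * 1000,
    }


# Made-up sample values in seconds.
sample = {
    "total.recursion.time.avg": 0.25,
    "total.recursion.time.median": 0.125,
}
print(recursion_time_ms(sample))  # {'avg_ms': 250.0, 'median_ms': 125.0}
```

An average well above the median, as in this sample, usually means a small number of very slow lookups are pulling the average up.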

Request List Usage

The average and maximum number of requests in the request list. The request list holds queries that are waiting for recursive processing to finish. Normal values should be in the hundreds.

Current Request List Usage

The current number of requests in the request list, shown both as a total that includes internally generated queries and as a count of client (user) queries only. This metric measures how many requests are waiting to be processed right now. Normal values should be in the hundreds.

Request List Jostle List

The number of requests in the request list that were overwritten by newer entries or dropped because the list was full. This metric shows whether the server is keeping up with the queries that need recursive processing. Normal values should be close to zero.
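Because both counters are cumulative, what matters is whether they grew between two samples. A minimal sketch, assuming the counter names `total.requestlist.overwritten` and `total.requestlist.exceeded` from Unbound's statistics output; the helper name and sample data are hypothetical.

```python
def request_list_pressure(previous: dict, current: dict) -> list:
    """Return human-readable warnings for request list counters that grew."""
    watched = {
        "total.requestlist.overwritten": "requests overwritten by newer entries",
        "total.requestlist.exceeded": "requests dropped because the list was full",
    }
    warnings = []
    for key, meaning in watched.items():
        delta = current.get(key, 0) - previous.get(key, 0)
        if delta > 0:  # negative deltas (counter reset) are ignored
            warnings.append(f"{key} grew by {delta:g}: {meaning}")
    return warnings


before = {"total.requestlist.overwritten": 0, "total.requestlist.exceeded": 3}
after = {"total.requestlist.overwritten": 0, "total.requestlist.exceeded": 7}
for warning in request_list_pressure(before, after):
    print(warning)
```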

TCP Usage

The number of TCP buffers currently in use for incoming connections. Normal values depend on how many clients query over TCP and should stay below the configured limit for incoming TCP buffers.

Uptime

The uptime of the DNS server, measured as the time since it was started. This metric is used to track the availability of the DNS server. The value should grow steadily; a drop back toward zero means the server was restarted.
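A restart shows up as an uptime value that is lower than the previous sample. The sketch below assumes uptime is available in seconds (Unbound exposes it as `time.up`); the helper names are made up for this example.

```python
def format_uptime(seconds: float) -> str:
    """Render an uptime in seconds as days, hours, and minutes."""
    total = int(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, _ = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m"


def restarted(previous_uptime: float, current_uptime: float) -> bool:
    """True when uptime went backwards between two samples."""
    return current_uptime < previous_uptime


print(format_uptime(93784))      # 1d 2h 3m
print(restarted(93784, 12))      # True
```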

Thread Cache

The number of thread cache hits and misses. Thread cache hits are requests that can be answered from the cache, while thread cache misses are requests that do not have a cached answer. Normal values for cache hits should be higher than cache misses.

Thread Cache Percentage

The percentage of thread cache hits and misses. This metric gives an indication of how effective the caching policy is. Normal values should be higher for cache hits than for cache misses.
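Comparing hit percentages across threads can reveal a thread that handles a disproportionate share of uncached traffic. A minimal sketch, assuming per-thread counters named `threadN.num.cachehits` and `threadN.num.cachemiss` as in Unbound's statistics output; the helper name and sample data are illustrative.

```python
import re

# Matches keys such as "thread0.num.cachehits" or "thread12.num.cachemiss".
THREAD_KEY = re.compile(r"^thread(\d+)\.num\.(cachehits|cachemiss)$")


def thread_hit_percentages(stats: dict) -> dict:
    """Return {thread_number: cache hit percentage} for every thread found."""
    counts = {}
    for key, value in stats.items():
        match = THREAD_KEY.match(key)
        if match:
            thread = int(match.group(1))
            counts.setdefault(thread, {"cachehits": 0.0, "cachemiss": 0.0})
            counts[thread][match.group(2)] = float(value)
    result = {}
    for thread, c in sorted(counts.items()):
        total = c["cachehits"] + c["cachemiss"]
        result[thread] = 100.0 * c["cachehits"] / total if total else 0.0
    return result


sample = {
    "thread0.num.cachehits": 90,
    "thread0.num.cachemiss": 10,
    "thread1.num.cachehits": 30,
    "thread1.num.cachemiss": 30,
    "total.num.cachehits": 120,  # ignored: not a per-thread key
}
print(thread_hit_percentages(sample))  # {0: 90.0, 1: 50.0}
```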

Thread Prefetch

The number of cache prefetches performed by each thread. When prefetching is enabled, Unbound refreshes popular cache entries shortly before they expire. Normal values depend on whether prefetching is enabled and on how popular the cached names are.

Thread Expired

The number of replies per thread that were answered from an expired cache entry. Normal values should be close to zero unless serving expired records is enabled.

Thread Zero TTL Replies

The number of replies per thread sent with a TTL of zero. TTL stands for Time to Live and indicates how long a reply can be cached. Unbound counts these replies when it answers from an expired cache entry. Normal values should be close to zero.

Thread Recursive Replies

The number of recursive replies per thread. Recursive replies are replies to queries that required the DNS server to look up the answer from other DNS servers. Normal values depend on how many queries miss the cache.

Thread Recursion Time

The average and median time each thread takes to process recursive requests. This metric is used to measure the performance of recursive requests. Normal values should be in the low-to-mid hundreds of milliseconds.

Thread Request List Usage

The average and maximum number of requests in the thread request list. The request list is used to store requests that are waiting to be processed. Normal values should be in the hundreds.

Thread Current Request List Usage

The current number of requests in the thread request list, shown both as a total that includes internally generated queries and as a count of client (user) queries only. This metric measures how many requests are waiting to be processed right now. Normal values should be in the hundreds.

Thread Request List Jostle List

The number of requests in the thread request list that were overwritten by newer entries or dropped because the list was full. This metric shows whether the thread is keeping up with the queries that need recursive processing. Normal values should be close to zero.

Thread TCP Usage

The number of TCP buffers each thread currently has in use for incoming connections. Normal values depend on how many clients query over TCP and should stay below the configured limit for incoming TCP buffers.

Cache Memory

The amount of memory used for caching messages, RRsets, DNSCrypt nonces, and DNSCrypt shared secrets. This metric is used to measure the amount of memory used for caching.
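To get a single figure for cache memory, the individual counters can be summed. The sketch below assumes the counter names `mem.cache.message`, `mem.cache.rrset`, `mem.cache.dnscrypt_nonce`, and `mem.cache.dnscrypt_shared_secret`, reported in bytes; the helper name and sample values are hypothetical, and counters missing from the output are treated as zero.

```python
CACHE_MEMORY_KEYS = (
    "mem.cache.message",
    "mem.cache.rrset",
    "mem.cache.dnscrypt_nonce",
    "mem.cache.dnscrypt_shared_secret",
)


def cache_memory_mib(stats: dict) -> float:
    """Sum the cache memory counters (bytes) and return the total in MiB."""
    total_bytes = sum(stats.get(key, 0) for key in CACHE_MEMORY_KEYS)
    return total_bytes / (1024 * 1024)


# Made-up sample: 1 MiB of messages and 2 MiB of RRsets, no DNSCrypt caches.
sample = {"mem.cache.message": 1048576, "mem.cache.rrset": 2097152}
print(cache_memory_mib(sample))  # 3.0
```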

Mod Memory

The amount of memory used by the iterator, respip, validator, subnet, and ipsecmod modules. This metric is used to measure the amount of memory used by each module.

Mem Streamwait

The amount of memory used for stream wait buffers. These buffers hold data for TCP and TLS streams that are waiting to be processed. Normal values depend on how much DNS traffic arrives over TCP and TLS.

Cache Count

The number of infra, key, msg, rrset, DNSCrypt nonce, and shared secret items in the cache. This metric is used to measure the amount of cached items.

Type Queries

The number of queries for each query type. This metric is used to measure the usage of different query types.

Class Queries

The number of queries for each query class. This metric is used to measure the usage of different query classes.

Opcode Queries

The number of queries for each query opcode. This metric is used to measure the usage of different query opcodes.

Flag Queries

The number of queries that had each DNS header flag set, such as QR, AA, TC, RD, RA, Z, AD, and CD. This metric is used to measure the usage of different query flags.

Rcode Answers

The number of answers sent for each response code (rcode), such as NOERROR, NXDOMAIN, or SERVFAIL. This metric is used to measure the usage of different reply codes. Normal values depend on the type of replies being sent.
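Raw rcode counts are easier to judge as a share of all answers, for example to spot a growing proportion of SERVFAIL. A minimal sketch, assuming counters named `num.answer.rcode.<RCODE>` as in Unbound's extended statistics; the helper name and sample values are made up. The pseudo rcode `nodata` is skipped here on the assumption that those answers are already counted under NOERROR.

```python
def rcode_shares(stats: dict) -> dict:
    """Return {rcode: percentage of all answers} from per-rcode counters."""
    prefix = "num.answer.rcode."
    counts = {
        key[len(prefix):]: value
        for key, value in stats.items()
        # "nodata" is assumed to be a subset of NOERROR, so skip it.
        if key.startswith(prefix) and key != prefix + "nodata"
    }
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rcode: 100.0 * count / total for rcode, count in counts.items()}


sample = {
    "num.answer.rcode.NOERROR": 900,
    "num.answer.rcode.NXDOMAIN": 80,
    "num.answer.rcode.SERVFAIL": 20,
    "num.answer.rcode.nodata": 150,
}
print(rcode_shares(sample))
# {'NOERROR': 90.0, 'NXDOMAIN': 8.0, 'SERVFAIL': 2.0}
```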

Thread Queries

The number of queries received by each thread of the DNS server. Comparing threads shows whether the query load is spread evenly, and a sudden spike on any thread can point to malicious activity. Typical values range from hundreds to thousands, depending on the size of the network.

Thread Queries IP Ratelimited

The number of queries per thread that were rate limited by client IP address. Rate limiting is used to curb malicious or excessive query traffic. Normal values should be close to zero.

Thread DNSCrypt Queries

The number of queries per thread that were encrypted and successfully decapsulated, that requested DNSCrypt certificates, that arrived in cleartext on the DNSCrypt port, or that were malformed. DNSCrypt is an encryption protocol used to secure DNS traffic. Normal values depend on how heavily DNSCrypt is used.
