Unbound is a validating, recursive, caching DNS resolver developed by NLnet Labs.
To monitor Unbound with Netdata, you need both Unbound and Netdata installed on your system.
Netdata auto-discovers hundreds of services; for those it doesn’t, turning on manual discovery is a one-line configuration change. For more information on configuring Netdata for Unbound monitoring, please read the collector documentation.
You should now see the Unbound section on the Overview tab in Netdata Cloud already populated with charts about all the metrics you care about.
Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.
Queries are the requests sent to the DNS server from a client. It is important to monitor the number of queries sent to the server, and the rate of queries sent by IP address to detect and prevent malicious activity. Normal values for queries should be in the hundreds to thousands depending on the size of the network.
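As a rough illustration of how a query rate can be derived outside of Netdata, the sketch below parses the `key=value` text printed by `unbound-control stats_noreset` and computes queries per second from two samples. The function names are hypothetical, and the sketch assumes that `total.num.queries` is a cumulative counter that was not reset between the two samples.

```python
def parse_unbound_stats(text: str) -> dict[str, float]:
    """Parse 'key=value' lines such as those printed by `unbound-control stats_noreset`."""
    stats: dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank lines and anything without '='
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # skip values that are not numeric
    return stats


def queries_per_second(previous: dict, current: dict, interval_seconds: float) -> float:
    """Rate of queries between two samples (hypothetical helper).

    Assumes total.num.queries only grows; a negative delta (for example
    after a restart or a counter reset) is treated as zero.
    """
    delta = current["total.num.queries"] - previous["total.num.queries"]
    return max(delta, 0.0) / interval_seconds
```

Taking two samples one minute apart and passing `interval_seconds=60` gives the average rate over that minute, which is easier to compare against a baseline than the raw counter.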
The number of queries that have been ratelimited by IP address. Ratelimiting is used to prevent malicious or excessive queries. Normal values should be close to zero.
The number of queries that were encrypted with DNSCrypt, were requests for a DNSCrypt certificate, were received in cleartext on the DNSCrypt port, or were malformed. DNSCrypt is an encryption protocol used to secure DNS traffic. Normal values depend on the level of usage of DNSCrypt.
The number of cache hits and misses. Cache hits are requests that can be answered from the cache, while cache misses are requests that do not have a cached answer. Normal values for cache hits should be higher than cache misses.
The percentage of cache hits and misses. This metric gives an indication of how effective the caching policy is. Normal values should be higher for cache hits than for cache misses.
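The percentage is simple arithmetic on the hit and miss counts. A minimal sketch, using a hypothetical helper name and treating "no traffic yet" as 0%:

```python
def cache_hit_ratio(hits: float, misses: float) -> float:
    """Return the cache hit percentage, or 0.0 when there were no lookups."""
    total = hits + misses
    if total == 0:
        return 0.0  # avoid dividing by zero before any query arrives
    return 100.0 * hits / total
```

For example, 800 hits and 200 misses give a hit ratio of 80%, while 200 hits and 800 misses give 20%, which would suggest the cache is too small or the workload has little repetition.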
The number of prefetches. A prefetch is a cache refresh that Unbound performs for a frequently requested entry shortly before its TTL expires, so that later queries can still be answered from the cache. Normal values depend on the network’s usage of prefetching.
The number of replies served from expired cache entries. When Unbound is configured to serve expired data, it can answer from a cache entry whose TTL has already run out instead of making the client wait for a fresh lookup. Normal values should be close to zero unless serving expired data is enabled on purpose.
The number of replies with a TTL of zero. TTL stands for Time to Live, and indicates the amount of time a reply can be cached. Unbound sends a reply with a TTL of zero when it serves an expired cache entry, so that the client does not cache the stale answer. Normal values should be close to zero.
The number of recursive replies. Recursive replies are requests that require the DNS server to look up the answer from other DNS servers. Normal values depend on the network’s usage of recursive requests.
The average and median time taken to process recursive requests. This metric is used to measure the performance of recursive requests. Normal values should be in the low-to-mid hundreds of milliseconds.
The average and maximum number of requests in the request list. The request list is used to store requests that are waiting to be processed. Normal values should be in the hundreds.
The total and user-specific number of requests in the request list. The total includes internally generated queries, such as priming queries and glue lookups, while the user count includes only requests that come from client queries. Normal values should be in the hundreds.
The number of requests that have been overwritten or dropped from the request list. This metric is used to measure how well the request list is being managed. Normal values should be close to zero.
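A simple check for request list pressure could look like the sketch below. It assumes a dictionary of statistics keyed by the names that `unbound-control stats` prints (`total.requestlist.overwritten` and `total.requestlist.exceeded`); the function name and the warning texts are hypothetical.

```python
def request_list_warnings(stats: dict) -> list[str]:
    """Return human-readable warnings when the request list is under pressure."""
    warnings = []
    overwritten = stats.get("total.requestlist.overwritten", 0)
    exceeded = stats.get("total.requestlist.exceeded", 0)
    if overwritten > 0:
        # Older requests were replaced by newer ones.
        warnings.append(f"{overwritten:g} requests overwritten in the request list")
    if exceeded > 0:
        # Requests were dropped because the list was full.
        warnings.append(f"{exceeded:g} requests dropped because the request list was full")
    return warnings
```

An empty list means neither counter has moved, which matches the "close to zero" expectation described above.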
The usage of TCP buffers. This is the number of TCP buffers currently held for incoming connections, sampled at collection time. Normal values should be in the low-to-mid hundreds.
The uptime of the DNS server, that is, the time elapsed since Unbound was started. This metric is used to track the availability of the DNS server. The value should grow steadily; an unexpected drop back toward zero indicates a restart.
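One way to turn uptime into an availability signal is to watch for the value going backwards between two samples. The sketch below assumes uptime is available as a number of seconds, as in the `time.up` field printed by `unbound-control stats`; the function name is hypothetical.

```python
def restarted_between(previous_uptime: float, current_uptime: float) -> bool:
    """Return True when uptime went backwards, which implies a restart."""
    return current_uptime < previous_uptime
```

If a sample taken at 3600 seconds of uptime is followed by one at 42 seconds, the function returns True and the restart can be flagged for investigation.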
The number of thread cache hits and misses. Thread cache hits are requests that can be answered from the cache, while thread cache misses are requests that do not have a cached answer. Normal values for cache hits should be higher than cache misses.
The percentage of thread cache hits and misses. This metric gives an indication of how effective the caching policy is. Normal values should be higher for cache hits than for cache misses.
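Per-thread percentages can be computed by grouping statistics on their `threadN.` prefix. This sketch assumes keys shaped like `thread0.num.cachehits` and `thread0.num.cachemiss`, as printed by `unbound-control stats`; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches keys such as "thread0.num.cachehits" and captures the thread id and field.
THREAD_KEY = re.compile(r"^thread(\d+)\.(.+)$")


def per_thread_hit_ratio(stats: dict) -> dict[int, float]:
    """Return {thread id: cache hit percentage} from a flat statistics dict."""
    threads: dict[int, dict] = defaultdict(dict)
    for key, value in stats.items():
        match = THREAD_KEY.match(key)
        if match:
            threads[int(match.group(1))][match.group(2)] = value

    ratios = {}
    for thread_id, fields in sorted(threads.items()):
        hits = fields.get("num.cachehits", 0)
        misses = fields.get("num.cachemiss", 0)
        total = hits + misses
        ratios[thread_id] = 100.0 * hits / total if total else 0.0
    return ratios
```

A thread whose ratio is much lower than the others can point to uneven load distribution rather than a problem with the cache as a whole.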
The number of prefetches per thread. A prefetch is a cache refresh that Unbound performs for a frequently requested entry shortly before its TTL expires, so that later queries can still be answered from the cache. Normal values depend on the network’s usage of prefetching.
The number of replies per thread that were served from expired cache entries. When Unbound is configured to serve expired data, it can answer from a cache entry whose TTL has already run out instead of making the client wait for a fresh lookup. Normal values should be close to zero unless serving expired data is enabled on purpose.
The number of replies per thread with a TTL of zero. TTL stands for Time to Live, and indicates the amount of time a reply can be cached. Unbound sends a reply with a TTL of zero when it serves an expired cache entry, so that the client does not cache the stale answer. Normal values should be close to zero.
The number of thread recursive replies. Recursive replies are requests that require the DNS server to look up the answer from other DNS servers. Normal values depend on the network’s usage of recursive requests.
The average and median time taken to process thread recursive requests. This metric is used to measure the performance of recursive requests. Normal values should be in the low-to-mid hundreds of milliseconds.
The average and maximum number of requests in the thread request list. The request list is used to store requests that are waiting to be processed. Normal values should be in the hundreds.
The total and user-specific number of requests in the thread request list. The total includes internally generated queries, such as priming queries and glue lookups, while the user count includes only requests that come from client queries. Normal values should be in the hundreds.
The number of requests that have been overwritten or dropped from the thread request list. This metric is used to measure how well the request list is being managed. Normal values should be close to zero.
The usage of TCP buffers per thread. This is the number of TCP buffers each thread currently holds for incoming connections, sampled at collection time. Normal values should be in the low-to-mid hundreds.
The amount of memory used for caching messages, RRsets, DNSCrypt nonces, and DNSCrypt shared secrets. This metric is used to measure the amount of memory used for caching.
The amount of memory used by the iterator, respip, validator, subnet, and ipsecmod modules. This metric is used to measure how much memory each module consumes.
The amount of memory used by stream wait buffers. These buffers hold answers that are waiting to be written back to clients over TCP and TLS streams. Normal values depend on the volume of TCP and TLS traffic.
The number of infra, key, msg, rrset, DNSCrypt nonce, and shared secret items in the cache. This metric is used to measure the amount of cached items.
The number of queries for each query type. This metric is used to measure the usage of different query types.
The number of queries for each query class. This metric is used to measure the usage of different query classes.
The number of queries for each query opcode. This metric is used to measure the usage of different query opcodes.
The number of queries with each header flag set (QR, AA, TC, RD, RA, Z, AD, CD). This metric is used to measure the usage of different query flags.
The number of replies for each reply rcode. This metric is used to measure the usage of different reply codes. Normal values depend on the type of replies being sent.
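To judge whether the rcode mix is healthy, it can help to express each rcode as a share of all replies. The sketch below assumes keys shaped like `num.answer.rcode.NOERROR`, as printed by `unbound-control stats` with extended statistics enabled. It skips the `nodata` pseudo rcode on the assumption that those replies are already counted under NOERROR. The function name is hypothetical.

```python
RCODE_PREFIX = "num.answer.rcode."


def rcode_shares(stats: dict) -> dict[str, float]:
    """Return {rcode: percentage of all replies} from a flat statistics dict."""
    counts = {
        key[len(RCODE_PREFIX):]: value
        for key, value in stats.items()
        # "nodata" is skipped so NOERROR replies are not counted twice (assumption).
        if key.startswith(RCODE_PREFIX) and key != RCODE_PREFIX + "nodata"
    }
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rcode: 100.0 * count / total for rcode, count in counts.items()}
```

A rising SERVFAIL share usually deserves attention first, since it can indicate upstream reachability or DNSSEC validation problems.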
The number of queries received by each thread. It is important to monitor the number of queries each thread handles, and the rate of queries sent by IP address, to detect and prevent malicious activity. Normal values should be in the hundreds to thousands depending on the size of the network.
The number of queries handled by each thread that have been ratelimited by IP address. Ratelimiting is used to prevent malicious or excessive queries. Normal values should be close to zero.
The number of queries per thread that were encrypted with DNSCrypt, were requests for a DNSCrypt certificate, were received in cleartext on the DNSCrypt port, or were malformed. DNSCrypt is an encryption protocol used to secure DNS traffic. Normal values depend on the level of usage of DNSCrypt.