CouchDB monitoring with Netdata

What is CouchDB?

CouchDB is an open-source NoSQL database designed for distributed systems. It has a simple, easy-to-understand data model and is highly scalable. CouchDB stores and queries JSON documents over an HTTP API, and also supports advanced features such as full-text search and geospatial queries.

Monitoring CouchDB with Netdata

The only prerequisites for monitoring CouchDB with Netdata are a running CouchDB instance and a Netdata installation that can reach it.

Netdata auto-discovers hundreds of services, and for those it doesn't, enabling collection manually is a one-line configuration change. For more information on configuring Netdata for CouchDB monitoring, please read the collector documentation.
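For reference, a manual job for the CouchDB collector is a short YAML entry. The sketch below follows the go.d collector's conventions; the file location, option names, and node name are assumptions that may differ by Netdata version and your CouchDB setup:

```yaml
# /etc/netdata/go.d/couchdb.conf (typically edited via `sudo ./edit-config go.d/couchdb.conf`)
jobs:
  - name: local
    url: http://127.0.0.1:5984    # CouchDB HTTP endpoint
    node: couchdb@127.0.0.1       # Erlang node name whose stats are queried
    # user: admin                 # uncomment if authentication is enabled
    # password: secret
```

After saving the file, restart the Netdata agent so the new job is picked up.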

You should now see the CouchDB section on the Overview tab in Netdata Cloud, already populated with charts for all the metrics covered below.

Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.

What CouchDB metrics are important to monitor - and why?

Activity

The Activity metric measures the number of reads, writes, and views that are performed on a CouchDB instance. This metric is important to monitor because it gives an indication of how much the database is being used and can inform decisions on scaling and optimization. Knowing the activity levels can help identify potential performance issues and help ensure that the database is running efficiently. Generally, the higher the number of reads and writes, the more important it is to monitor this metric. Additionally, if the number of view reads is consistently high, it can indicate that there is an inefficient query that should be optimized.
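As a sketch of what these counters look like, CouchDB exposes them as nested JSON under the `/_node/_local/_stats` endpoint. The snippet below pulls read, write, and view-read counts out of a sample response; the `{"value": N}` shape mirrors CouchDB's stats JSON, but the numbers themselves are invented for illustration:

```python
# Sample fragment in the shape of GET /_node/_local/_stats
# (values are fabricated, not real output).
sample_stats = {
    "couchdb": {
        "database_reads": {"value": 1042},
        "database_writes": {"value": 387},
        "httpd": {"view_reads": {"value": 211}},
    }
}

def activity(stats: dict) -> dict:
    """Extract the three activity counters from a stats response."""
    c = stats["couchdb"]
    return {
        "reads": c["database_reads"]["value"],
        "writes": c["database_writes"]["value"],
        "view_reads": c["httpd"]["view_reads"]["value"],
    }

print(activity(sample_stats))
```

These are cumulative counters, so a monitoring tool like Netdata charts their per-second rate rather than the raw values.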

Request Methods

The Request Methods metric measures the number of requests made to a CouchDB instance for each type of request. This metric is important to monitor because it can help identify potential performance issues and help inform decisions on scaling and optimization. Knowing the type and number of requests being made can help identify which components of the database are being used the most and can help identify potential bottlenecks. Additionally, this metric can help inform decisions on what type of caching solutions might be appropriate to improve performance. Generally, the higher the number of requests, the more important it is to monitor this metric.
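CouchDB keeps one counter per HTTP method in its stats tree. The sketch below totals them and finds the busiest method from a sample payload shaped like the `couchdb.httpd_request_methods` subtree (the values are invented):

```python
# Per-method request counters in the shape of CouchDB's
# couchdb.httpd_request_methods stats subtree (sample values invented).
sample_methods = {
    "GET": {"value": 900},
    "PUT": {"value": 120},
    "POST": {"value": 80},
    "DELETE": {"value": 4},
}

total = sum(m["value"] for m in sample_methods.values())
busiest = max(sample_methods, key=lambda k: sample_methods[k]["value"])
print(total, busiest)  # 1104 GET
```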

Response Codes

These are HTTP status codes that indicate the success or failure of an API request. Monitoring response codes is important to ensure API requests are being processed correctly, as any unexpected response codes may indicate an issue in the API or the system.

For example, a 200 response code indicates a successful API request, while a 500 response code indicates an internal server error which could be caused by a number of factors. Monitoring response codes can help identify issues in the API or system before they become serious. It may also be helpful to set up alerting for unexpected response codes so any issues can be addressed quickly.

Response Code Classes (2xx, 3xx, 4xx, 5xx)

Response code classes are the range of HTTP status codes returned by a web server in response to a client’s request. The first digit of the status code defines the class of response.

The 2xx class of status codes indicates a successful response, including 200 OK, 201 Created, and 202 Accepted.

The 3xx class of status codes indicates a redirection, such as 301 Moved Permanently, 302 Found, and 303 See Other.

The 4xx class of status codes indicates a client error, such as 400 Bad Request, 401 Unauthorized, and 404 Not Found.

The 5xx class of status codes indicates a server error, such as 500 Internal Server Error, 502 Bad Gateway, and 503 Service Unavailable.

Monitoring response code classes is important to ensure that requests are handled appropriately by the server. For example, if the server is returning a 4xx response code, it may indicate that a client is attempting to access a page which does not exist or is not permitted. By monitoring response code classes, you can spot potential issues that could be causing clients to receive errors, and take steps to rectify them.
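Bucketing per-code counters into these classes is a one-line operation on the first digit of the status code. The sketch below does this for a sample payload shaped like CouchDB's per-code counters (the counts are invented):

```python
from collections import Counter

def code_class(status: int) -> str:
    """Map an HTTP status code to its class bucket (2xx, 3xx, 4xx, 5xx)."""
    return f"{status // 100}xx"

# Sample per-code counters (values invented for illustration).
sample_codes = {200: 950, 201: 120, 301: 3, 404: 25, 500: 2}

by_class = Counter()
for code, count in sample_codes.items():
    by_class[code_class(code)] += count

print(dict(by_class))  # {'2xx': 1070, '3xx': 3, '4xx': 25, '5xx': 2}
```

A natural alerting rule on top of this is a threshold on the 5xx rate, since server errors are almost always actionable.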

Active Tasks (indexer, db_compaction, replication, view_compaction)

Active Tasks is a metric that shows how many tasks CouchDB is currently actively running. This includes indexer, db_compaction, replication, and view_compaction tasks. This metric can be used to keep track of the amount of work CouchDB is doing at any given time and can help to identify bottlenecks or potential performance issues. If the number of active tasks is consistently very high, it could indicate a problem with the system or with the way the data is being stored. By monitoring this metric, potential issues can be identified early and addressed before they become more serious. Additionally, it can provide insight into how the system is being used and can help to optimize performance. Normal value ranges for this metric will depend on the size of the database and the types of tasks being performed.
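CouchDB reports running tasks as a JSON array from its `/_active_tasks` endpoint, where each task object carries a `type` field. The sketch below counts tasks by type from a sample payload; note that CouchDB's own task type names (e.g. `database_compaction`) can differ slightly from the dimension names Netdata displays, and the sample objects are trimmed and invented:

```python
from collections import Counter

# Sample payload in the shape returned by GET /_active_tasks
# (fields trimmed, values invented).
sample_tasks = [
    {"type": "indexer", "database": "shards/00000000-.../mydb", "progress": 57},
    {"type": "replication", "source": "a", "target": "b"},
    {"type": "database_compaction", "database": "shards/00000000-.../mydb"},
    {"type": "indexer", "database": "shards/00000000-.../otherdb", "progress": 12},
]

counts = Counter(task["type"] for task in sample_tasks)
print(dict(counts))  # {'indexer': 2, 'replication': 1, 'database_compaction': 1}
```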

Replicator Jobs (running, pending, crashed, internal_replication_jobs)

Replicator Jobs is a metric that shows the total number of replicator jobs that CouchDB is running. This includes running, pending, crashed, and internal_replication_jobs. This metric can be used to monitor the progress of database replication and to ensure that data is being replicated as expected. If the number of running replicator jobs is consistently very high, it could indicate an issue with the replication process or the database itself. By monitoring this metric, potential issues can be identified early and addressed before they become more serious. Additionally, it can provide insight into how the system is being used and can help to optimize replication performance. Normal value ranges for this metric will depend on the size of the database and the types of tasks being performed.
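These counters appear in CouchDB's stats tree under the replicator subtree. The sketch below totals them and flags crashed jobs from a sample payload shaped like the `couch_replicator.jobs` counters (the values are invented):

```python
# Replicator job counters in the shape of CouchDB's couch_replicator.jobs
# stats subtree (sample values invented).
sample_jobs = {
    "running": {"value": 3},
    "pending": {"value": 1},
    "crashed": {"value": 0},
}

total_jobs = sum(j["value"] for j in sample_jobs.values())
# A non-zero crashed count is usually worth an alert.
needs_attention = sample_jobs["crashed"]["value"] > 0
print(total_jobs, needs_attention)  # 4 False
```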

Open Files

Open Files is a metric that helps identify the number of files opened by CouchDB. This metric is important to monitor as it gives an indication of how much load CouchDB is under and how it is handling the workload. If the number of open files is too high, it can indicate that there is too much load on CouchDB and that it is struggling to deal with the request load. If the number of open files is too low, it can indicate that CouchDB is not being utilized fully and that there is potential for optimization.

Normal value ranges for this metric will vary depending on the size of the database and the number of requests being handled. However, in general, the number of open files should remain relatively consistent and should not exceed a certain threshold (this threshold will depend on the size of the database and the number of requests being handled). If this value exceeds the threshold it can be an indication of an issue that needs to be addressed.

Erlang VM Memory (atom, binaries, code, ets, procs, other)

Erlang VM Memory represents the memory used by the Erlang Virtual Machine (VM) to manage the CouchDB server process. This metric is important to monitor to ensure that CouchDB is not overusing memory resources, which could lead to poor performance or even a crash. Memory usage should be monitored in various categories, specifically atom, binaries, code, ets, procs, and other. Atom refers to the memory used to store atoms, which are unique strings used by the Erlang VM to store information. Binaries refers to the memory used to store binary data such as images and documents. Code refers to the memory used to store code, including functions and modules. Ets refers to the memory used to store Erlang Term Storage, which is used to store and retrieve data quickly. Procs refers to the memory used to store process information. Other refers to the miscellaneous memory used by the Erlang VM.
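CouchDB exposes this breakdown in the `memory` object returned by the `/_node/_local/_system` endpoint; CouchDB's own key names (`binary`, `processes`) differ slightly from the chart dimension labels above. The sketch below totals the categories and computes each one's share from a sample payload (the byte values are invented):

```python
# Memory breakdown in the shape of the "memory" object from
# GET /_node/_local/_system (byte values invented for illustration).
sample_memory = {
    "atom": 512_625,
    "binary": 4_194_304,
    "code": 12_582_912,
    "ets": 2_097_152,
    "processes": 8_388_608,
    "other": 1_048_576,
}

total_bytes = sum(sample_memory.values())
# Percentage share of each category, rounded to one decimal place.
shares = {k: round(v / total_bytes * 100, 1) for k, v in sample_memory.items()}
print(total_bytes, shares)
```

Watching how these shares shift over time is often more informative than the absolute total: a steadily growing `binary` or `processes` share can point at a leak long before total memory becomes critical.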

Proc Counts (os_procs, erl_procs)

Proc Counts represent the number of processes running on the CouchDB server. This metric is important to monitor because it can give an indication of the overall load on the server, which can help identify any potential performance issues. Proc Counts are divided into two categories: os_procs and erl_procs. Os_procs refers to the number of operating system processes running on the server, while erl_procs refers to the number of Erlang processes running on the server.

Peak Message Queue Size

Peak Message Queue Size refers to the maximum size of the message queue in the Erlang VM. This metric is important to monitor because it can provide insight into the performance of the Erlang VM, which is responsible for managing the CouchDB server process. A high peak message queue size can indicate that the Erlang VM is overloaded and may be unable to process requests quickly.

Reductions

Reductions are the Erlang VM's basic unit of work (roughly, a count of function calls and other operations executed). This metric is important to monitor because it can provide insight into the performance of the Erlang VM and the overall load on the server. A consistently high reduction rate can indicate that the Erlang VM is overloaded and may be unable to process requests quickly.

Database-specific metrics
