Consul

Plugin: go.d.plugin Module: consul

Overview

This collector monitors key metrics of Consul Agents: transaction timings, leadership changes, memory usage and more.

It periodically sends HTTP requests to the Consul REST API.

Used endpoints:

  • /v1/agent/self
  • /v1/agent/checks
  • /v1/agent/metrics
  • /v1/coordinate/nodes
  • /v1/operator/autopilot/health
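
If the collector cannot connect, you can query one of these endpoints by hand to confirm that the agent's HTTP API is reachable. The X-Consul-Token header is needed only when ACLs are enabled, and the token value below is a placeholder:

curl "http://127.0.0.1:8500/v1/agent/metrics?format=prometheus" \
  -H "X-Consul-Token: <your-acl-token>"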

This collector is supported on all platforms.

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Default Behavior

Auto-Detection

This collector discovers instances running on the local host that provide metrics on port 8500.

On startup, it tries to collect metrics from:

  • http://localhost:8500
  • http://127.0.0.1:8500

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Setup

Prerequisites

Enable Prometheus telemetry

Enable telemetry on your Consul Agent by setting prometheus_retention_time to a value greater than 0 in the agent configuration.
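
For example, in the agent's configuration file (the retention value is illustrative; any positive duration enables the Prometheus endpoint):

telemetry {
  prometheus_retention_time = "360h"
}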

Add required ACLs to Token

Required only if ACL authentication is enabled on your Consul cluster.

| ACL | Endpoint |
|-----|----------|
| operator:read | autopilot health status |
| node:read | checks |
| agent:read | configuration, metrics, and LAN coordinates |
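
A minimal sketch of granting these permissions with the consul CLI, assuming ACLs are enabled and you hold a management token (policy and file names are illustrative). First, the rules file:

# netdata-policy.hcl — read-only rules matching the table above
operator = "read"
node_prefix "" {
  policy = "read"
}
agent_prefix "" {
  policy = "read"
}

Then create the policy and a token that uses it:

consul acl policy create -name "netdata-readonly" -rules @netdata-policy.hcl
consul acl token create -description "Netdata collector" -policy-name "netdata-readonly"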

Configuration

File

The configuration file name for this integration is go.d/consul.conf.

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/consul.conf

Options

The following options can be defined globally: update_every, autodetection_retry.

| Name | Description | Default | Required |
|------|-------------|---------|----------|
| update_every | Data collection frequency. | 1 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| url | Server URL. | http://localhost:8500 | yes |
| acl_token | ACL token used in every request. | | no |
| max_checks | Checks processing/charting limit. | | no |
| max_filter | Checks processing/charting filter. Uses simple patterns. | | no |
| username | Username for basic HTTP authentication. | | no |
| password | Password for basic HTTP authentication. | | no |
| proxy_url | Proxy URL. | | no |
| proxy_username | Username for proxy basic HTTP authentication. | | no |
| proxy_password | Password for proxy basic HTTP authentication. | | no |
| timeout | HTTP request timeout. | 1 | no |
| method | HTTP request method. | GET | no |
| body | HTTP request body. | | no |
| headers | HTTP request headers. | | no |
| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | no | no |
| tls_ca | Certification authority that the client uses when verifying the server’s certificates. | | no |
| tls_cert | Client TLS certificate. | | no |
| tls_key | Client TLS key. | | no |

Examples

Basic

An example configuration.

jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"

Basic HTTP auth

Local server with basic HTTP authentication.

jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"
    username: foo
    password: bar

Multi-instance

Note: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

jobs:
  - name: local
    url: http://127.0.0.1:8500
    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"

  - name: remote
    url: http://203.0.113.10:8500
    acl_token: "ada7f751-f654-8872-7f93-498e799158b6"

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

The set of metrics depends on the Consul Agent mode (server leader, server follower, or client).

Per Consul instance

These metrics refer to the entire monitored application.

This scope has no labels.

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| consul.client_rpc_requests_rate | rpc | requests/s |
| consul.client_rpc_requests_exceeded_rate | exceeded | requests/s |
| consul.client_rpc_requests_failed_rate | failed | requests/s |
| consul.memory_allocated | allocated | bytes |
| consul.memory_sys | sys | bytes |
| consul.gc_pause_time | gc_pause | seconds |
| consul.kvs_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.kvs_apply_operations_rate | kvs_apply | ops/s |
| consul.txn_apply_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.txn_apply_operations_rate | txn_apply | ops/s |
| consul.autopilot_health_status | healthy, unhealthy | status |
| consul.autopilot_failure_tolerance | failure_tolerance | servers |
| consul.autopilot_server_health_status | healthy, unhealthy | status |
| consul.autopilot_server_stable_time | stable | seconds |
| consul.autopilot_server_serf_status | active, failed, left, none | status |
| consul.autopilot_server_voter_status | voter, not_voter | status |
| consul.network_lan_rtt | min, max, avg | ms |
| consul.raft_commit_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.raft_commits_rate | commits | commits/s |
| consul.raft_leader_last_contact_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.raft_leader_oldest_log_age | oldest_log_age | seconds |
| consul.raft_follower_last_contact_leader_time | leader_last_contact | ms |
| consul.raft_rpc_install_snapshot_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.raft_leader_elections_rate | leader | elections/s |
| consul.raft_leadership_transitions_rate | leadership | transitions/s |
| consul.server_leadership_status | leader, not_leader | status |
| consul.raft_thread_main_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage |
| consul.raft_thread_fsm_saturation_perc | quantile_0.5, quantile_0.9, quantile_0.99 | percentage |
| consul.raft_fsm_last_restore_duration | last_restore_duration | ms |
| consul.raft_boltdb_freelist_bytes | freelist | bytes |
| consul.raft_boltdb_logs_per_batch_rate | written | logs/s |
| consul.raft_boltdb_store_logs_time | quantile_0.5, quantile_0.9, quantile_0.99 | ms |
| consul.license_expiration_time | license_expiration | seconds |

Per node check

Metrics about checks on Node level.

Labels:

| Label | Description |
|-------|-------------|
| datacenter | Datacenter identifier. |
| node_name | The node’s name. |
| check_name | The check’s name. |

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| consul.node_health_check_status | passing, maintenance, warning, critical | status |

Per service check

Metrics about checks at a Service level.

Labels:

| Label | Description |
|-------|-------------|
| datacenter | Datacenter identifier. |
| node_name | The node’s name. |
| check_name | The check’s name. |
| service_name | The service’s name. |

Metrics:

| Metric | Dimensions | Unit |
|--------|------------|------|
| consul.service_health_check_status | passing, maintenance, warning, critical | status |

Alerts

The following alerts are available:

| Alert name | On metric | Description |
|------------|-----------|-------------|
| consul_node_health_check_status | consul.node_health_check_status | node health check ${label:check_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
| consul_service_health_check_status | consul.service_health_check_status | service health check ${label:check_name} for service ${label:service_name} has failed on server ${label:node_name} datacenter ${label:datacenter} |
| consul_client_rpc_requests_exceeded | consul.client_rpc_requests_exceeded_rate | number of rate-limited RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
| consul_client_rpc_requests_failed | consul.client_rpc_requests_failed_rate | number of failed RPC requests made by server ${label:node_name} datacenter ${label:datacenter} |
| consul_gc_pause_time | consul.gc_pause_time | time spent in stop-the-world garbage collection pauses on server ${label:node_name} datacenter ${label:datacenter} |
| consul_autopilot_health_status | consul.autopilot_health_status | datacenter ${label:datacenter} cluster is unhealthy as reported by server ${label:node_name} |
| consul_autopilot_server_health_status | consul.autopilot_server_health_status | server ${label:node_name} from datacenter ${label:datacenter} is unhealthy |
| consul_raft_leader_last_contact_time | consul.raft_leader_last_contact_time | median time elapsed since leader server ${label:node_name} datacenter ${label:datacenter} was last able to contact the follower nodes |
| consul_raft_leadership_transitions | consul.raft_leadership_transitions_rate | there has been a leadership change and server ${label:node_name} datacenter ${label:datacenter} has become the leader |
| consul_raft_thread_main_saturation | consul.raft_thread_main_saturation_perc | average saturation of the main Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
| consul_raft_thread_fsm_saturation | consul.raft_thread_fsm_saturation_perc | average saturation of the FSM Raft goroutine on server ${label:node_name} datacenter ${label:datacenter} |
| consul_license_expiration_time | consul.license_expiration_time | Consul Enterprise license expiration time on node ${label:node_name} datacenter ${label:datacenter} |
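
These alerts ship with Netdata's stock health configuration. To tune or silence them, edit the corresponding health file with the same edit-config script used above; the file name below assumes the standard layout:

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config health.d/consul.conf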

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the consul collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn’t working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that’s not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    cd /usr/libexec/netdata/plugins.d/
    
  • Switch to the netdata user.

    sudo -u netdata -s
    
  • Run the go.d.plugin to debug the collector:

    ./go.d.plugin -d -m consul
    

Getting Logs

If you’re encountering problems with the consul collector, follow these steps to retrieve logs and identify potential issues:

  • Run the command specific to your system (systemd, non-systemd, or Docker container).
  • Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep consul

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector’s name:

grep consul /var/log/netdata/collector.log

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.

Docker Container

If your Netdata runs in a Docker container named “netdata” (replace if different), use this command:

docker logs netdata 2>&1 | grep consul
