Ceph

Plugin: go.d.plugin
Module: ceph

Overview

This collector monitors the overall health status and performance of your Ceph clusters. It gathers key metrics for the entire cluster, individual Pools, and OSDs.

It collects metrics by periodically issuing HTTP GET requests to the Ceph Manager REST API.

This collector is only supported on the following platforms:

  • Linux

This collector supports collecting metrics from multiple instances of this integration, including remote instances.

Default Behavior

Auto-Detection

The collector can automatically detect Ceph Manager instances that are:

  • running on localhost and listening on port 8443
  • running within Docker containers

Note that the Ceph REST API requires a username and password. Netdata can automatically detect Ceph Manager instances and create data collection jobs, but these jobs will fail until you provide the necessary credentials.

Limits

The default configuration for this integration does not impose any limits on data collection.

Performance Impact

The default configuration for this integration is not expected to impose a significant performance impact on the system.

Setup

Prerequisites

No action required.

Configuration

File

The configuration file name for this integration is go.d/ceph.conf.

You can edit the configuration file using the edit-config script from the Netdata config directory.

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config go.d/ceph.conf

Options

The following options can be defined globally: update_every.
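
As an illustration of a global setting, the sketch below sets update_every at the top level of go.d/ceph.conf and overrides it in one job. The job names, credentials, and interval values are placeholders, and the per-job override assumes the usual go.d.plugin behavior in which a job-level value takes precedence over the global one.

```yaml
# Global default: applies to every job that does not set its own value.
update_every: 5

jobs:
  # Inherits update_every: 5 from the global setting.
  - name: local
    url: https://127.0.0.1:8443
    username: user   # placeholder credentials
    password: pass

  # Assumed override: this job collects every 10 seconds instead.
  - name: remote
    url: https://192.0.2.1:8443
    username: user   # placeholder credentials
    password: pass
    update_every: 10
```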

| Name | Description | Default | Required |
|---|---|---|---|
| update_every | Data collection frequency. | 1 | no |
| autodetection_retry | Recheck interval in seconds. Zero means no recheck will be scheduled. | 0 | no |
| url | The URL of the Ceph Manager API. | https://127.0.0.1:8443 | yes |
| timeout | HTTP request timeout. | 2 | no |
| username | Username for basic HTTP authentication. | | yes |
| password | Password for basic HTTP authentication. | | yes |
| proxy_url | Proxy URL. | | no |
| proxy_username | Username for proxy basic HTTP authentication. | | no |
| proxy_password | Password for proxy basic HTTP authentication. | | no |
| method | HTTP request method. | GET | no |
| body | HTTP request body. | | no |
| headers | HTTP request headers. | | no |
| not_follow_redirects | Redirect handling policy. Controls whether the client follows redirects. | no | no |
| tls_skip_verify | Server certificate chain and hostname validation policy. Controls whether the client performs this check. | yes | no |
| tls_ca | Certification authority that the client uses when verifying the server’s certificates. | | no |
| tls_cert | Client TLS certificate. | | no |
| tls_key | Client TLS key. | | no |
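
The TLS and timeout options can be combined in a single job. The following is a minimal sketch, not a recommended configuration: the job name, the 5-second timeout, and the CA certificate path are hypothetical, and it assumes the Ceph Manager presents a certificate signed by that CA.

```yaml
jobs:
  - name: remote_verified        # hypothetical job name
    url: https://192.0.2.1:8443
    username: user               # placeholder credentials
    password: pass
    timeout: 5                   # allow slower responses than the default of 2
    tls_skip_verify: no          # validate the server certificate chain and hostname
    tls_ca: /path/to/ca.pem      # hypothetical path to the CA certificate
```

Because tls_skip_verify defaults to yes, certificate validation only takes place when it is explicitly set to no, as in this sketch.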

Examples

Basic

A basic example configuration.

jobs:
  - name: local
    url: https://127.0.0.1:8443
    username: user
    password: pass

Multi-instance

Note: When you define multiple jobs, their names must be unique.

Collecting metrics from local and remote instances.

jobs:
  - name: local
    url: https://127.0.0.1:8443
    username: user
    password: pass

  - name: remote
    url: https://192.0.2.1:8443
    username: user
    password: pass

Metrics

Metrics grouped by scope.

The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.

Per cluster

These metrics refer to the entire Ceph cluster.

Labels:

| Label | Description |
|---|---|
| fsid | A unique identifier of the cluster. |

Metrics:

| Metric | Dimensions | Unit |
|---|---|---|
| ceph.cluster_status | ok, err, warn | status |
| ceph.cluster_hosts_count | hosts | hosts |
| ceph.cluster_monitors_count | monitors | monitors |
| ceph.cluster_osds_count | osds | osds |
| ceph.cluster_osds_by_status_count | up, down, in, out | status |
| ceph.cluster_managers_count | active, standby | managers |
| ceph.cluster_object_gateways_count | object | gateways |
| ceph.cluster_iscsi_gateways_count | iscsi | gateways |
| ceph.cluster_iscsi_gateways_by_status_count | up, down | gateways |
| ceph.cluster_physical_capacity_utilization | utilization | percent |
| ceph.cluster_physical_capacity_usage | avail, used | bytes |
| ceph.cluster_objects_count | objects | objects |
| ceph.cluster_objects_by_status_distribution | healthy, misplaced, degraded, unfound | percent |
| ceph.cluster_pools_count | pools | pools |
| ceph.cluster_pgs_count | pgs | pgs |
| ceph.cluster_pgs_by_status_count | clean, working, warning, unknown | pgs |
| ceph.cluster_pgs_per_osd_count | per_osd | pgs |

Per osd

These metrics refer to the Object Storage Daemon (OSD).

Labels:

| Label | Description |
|---|---|
| fsid | A unique identifier of the cluster. |
| osd_uuid | OSD UUID. |
| osd_name | OSD name. |
| device_class | OSD device class. |

Metrics:

| Metric | Dimensions | Unit |
|---|---|---|
| ceph.osd_status | up, down, in, out | status |
| ceph.osd_space_usage | avail, used | bytes |
| ceph.osd_io | read, written | bytes/s |
| ceph.osd_iops | read, write | ops/s |
| ceph.osd_latency | commit, apply | milliseconds |

Per pool

These metrics refer to the Pool.

Labels:

| Label | Description |
|---|---|
| fsid | A unique identifier of the cluster. |
| pool_name | Pool name. |

Metrics:

| Metric | Dimensions | Unit |
|---|---|---|
| ceph.pool_space_utilization | utilization | percent |
| ceph.pool_space_usage | avail, used | bytes |
| ceph.pool_objects_count | object | objects |
| ceph.pool_io | read, written | bytes/s |
| ceph.pool_iops | read, write | ops/s |

Alerts

The following alerts are available:

| Alert name | On metric | Description |
|---|---|---|
| ceph_cluster_physical_capacity_utilization | ceph.cluster_physical_capacity_utilization | Ceph cluster ${label:fsid} disk space utilization |

Troubleshooting

Debug Mode

Important: Debug mode is not supported for data collection jobs created via the UI using the Dyncfg feature.

To troubleshoot issues with the ceph collector, run the go.d.plugin with the debug option enabled. The output should give you clues as to why the collector isn’t working.

  • Navigate to the plugins.d directory, usually at /usr/libexec/netdata/plugins.d/. If that’s not the case on your system, open netdata.conf and look for the plugins setting under [directories].

    cd /usr/libexec/netdata/plugins.d/
    
  • Switch to the netdata user.

    sudo -u netdata -s
    
  • Run the go.d.plugin to debug the collector:

    ./go.d.plugin -d -m ceph
    

Getting Logs

If you’re encountering problems with the ceph collector, follow these steps to retrieve logs and identify potential issues:

  • Run the command specific to your system (systemd, non-systemd, or Docker container).
  • Examine the output for any warnings or error messages that might indicate issues. These messages should provide clues about the root cause of the problem.

System with systemd

Use the following command to view logs generated since the last Netdata service restart:

journalctl _SYSTEMD_INVOCATION_ID="$(systemctl show --value --property=InvocationID netdata)" --namespace=netdata --grep ceph

System without systemd

Locate the collector log file, typically at /var/log/netdata/collector.log, and use grep to filter for the collector’s name:

grep ceph /var/log/netdata/collector.log

Note: This method shows logs from all restarts. Focus on the latest entries for troubleshooting current issues.

Docker Container

If your Netdata runs in a Docker container named “netdata” (replace if different), use this command:

docker logs netdata 2>&1 | grep ceph
