K8s Kubelet monitoring with Netdata

What is K8s Kubelet?

Kubelet is an agent that runs on each node in the cluster. It makes sure that containers are running in a pod.

Monitoring K8s Kubelet with Netdata

The prerequisite for monitoring K8s Kubelet with Netdata is to have Netdata installed on your system.

Netdata auto-discovers hundreds of services, and for those it doesn’t, turning on manual discovery is a one-line configuration. For more information on configuring Netdata for K8s Kubelet monitoring, please read the collector documentation.

You should now see the K8s Kubelet section on the Overview tab in Netdata Cloud already populated with charts about all the metrics you care about.
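
Kubelet serves its metrics in the Prometheus text exposition format, so it can help to see what a collector has to read before any chart is drawn. The following is a minimal sketch of a parser for that format, not Netdata's implementation. The function name `parse_metrics` and the sample text are illustrative assumptions, the metric names in the sample may differ between Kubernetes versions, and the parser deliberately ignores edge cases such as a `}` inside a label value.

```python
import re

# Matches lines such as: metric_name{label="value"} 12.5
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')
# Matches one label pair such as: operation_type="list_containers"
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical sample; real output from a node is much longer.
SAMPLE = """\
# HELP kubelet_running_pods Number of pods currently running
# TYPE kubelet_running_pods gauge
kubelet_running_pods 12
kubelet_runtime_operations_total{operation_type="list_containers"} 3400
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

Each tuple keeps the labels separate from the metric name, which is what makes it possible to chart one metric per operation type, status code, or pod.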

Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.

What K8s Kubelet metrics are important to monitor - and why?

API Server Audit Requests Rejected

The rate at which the Kubernetes API Server rejects requests because of an error in the audit logging backend. A sustained non-zero rate means that otherwise valid requests are being refused, so it is worth investigating the audit backend and correlating the spikes with other API Server metrics before clients start to fail.

API Server Storage Data Key Generation Failures

The rate of API Server Storage Data Key Generation Failures indicates how often the API Server fails to generate a data encryption key (DEK) used to encrypt stored data. Failures can point to problems in the API Server itself, such as being unable to reach the resources it needs, or to problems in the underlying network or storage infrastructure. Monitoring this metric helps you catch such problems before they cause a service outage.
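
Rate metrics like the ones above are typically derived from cumulative counters that only ever increase until the process restarts. As a rough sketch of that derivation, the function below turns two counter samples into a per-second rate. The name `counter_rate` and the reset handling are assumptions made for illustration, not a description of how Netdata computes its charts.

```python
def counter_rate(prev_value, curr_value, interval_seconds):
    """Per-second rate between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # The counter went backwards, which usually means the process
        # restarted and the counter started again from zero. Treat the
        # current value as the increase since the restart.
        delta = curr_value
    return delta / interval_seconds


if __name__ == "__main__":
    # 30 failures recorded over a 10 second collection interval.
    print(counter_rate(100, 130, 10))
```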

API Server Storage Data Key Generation Latencies

The API Server Storage Data Key Generation Latencies metric gives the latency, in microseconds, of generating a data encryption key for stored data. Rising latencies can reveal a bottleneck in the storage layer, so monitoring this metric helps you catch performance problems before they cause a service outage.
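
Latency metrics are often exposed as cumulative histogram buckets rather than as a single number. Assuming that is the case here (an assumption about the metric's type, not something stated above), a quantile such as the median can be estimated by linear interpolation inside the bucket that contains it. The function name `estimate_quantile` and the bucket values below are hypothetical.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound_us, cumulative_count) pairs sorted
    by upper bound; the last bound may be float("inf").
    Returns None when the histogram holds no observations.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            # The open-ended bucket has no upper bound to interpolate
            # towards, so fall back to the highest finite bound.
            if bound == float("inf") or count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical buckets: 10 calls took up to 100 us, 90 took up to 1000 us.
BUCKETS = [(100.0, 10), (1000.0, 90), (float("inf"), 100)]

if __name__ == "__main__":
    print(estimate_quantile(0.5, BUCKETS))
```

The estimate is only as precise as the bucket boundaries, so it is best read as a trend indicator rather than an exact latency.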

API Server Storage Envelope Transformation Cache Misses

The rate of API Server Storage Envelope Transformation Cache Misses indicates how often the API Server looks up transformation data in its cache and does not find it. A persistently high miss rate can indicate a problem with the underlying storage infrastructure or with Kubernetes itself. Monitoring this metric helps you catch such problems before they cause a service outage.

Kubelet Containers Running

The Kubelet Containers Running metric indicates the number of containers running on each node in the cluster. An unexpected drop can point to crashing containers or a misconfigured workload, while steady growth shows a node that is filling up. Monitoring this metric helps you spot resource utilization problems before they cause a service outage.

Kubelet Pods Running

The Kubelet Pods Running metric indicates the number of pods running on each node in the cluster. An unexpected drop can point to failing or evicted pods, while a count that keeps climbing on a single node can point to unbalanced scheduling. Monitoring this metric helps you spot resource utilization problems before they cause a service outage.

Kubelet Pods Log Filesystem Used Bytes

The Kubelet Pods Log Filesystem Used Bytes metric indicates the amount of disk space, in bytes, used by the log files of each pod on each node in the cluster. A pod whose log usage keeps growing can fill the node's disk, so monitoring this metric helps you find noisy or misconfigured workloads before they cause a service outage.

Kubelet Runtime Operations

The Kubelet Runtime Operations metric indicates the rate at which Kubelet performs each type of operation against the container runtime. A sudden change in the rate of one operation type can point to resource pressure or a misconfigured workload, for example containers that are restarted in a loop. Monitoring this metric helps you identify such problems before they cause a service outage.

Kubelet Runtime Operations Errors

The Kubelet Runtime Operations Errors metric indicates the rate of errors encountered while Kubelet performs each type of operation against the container runtime. A rising error rate for one operation type narrows down where the runtime is failing, so monitoring this metric helps you identify problems before they cause a service outage.
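
An error rate is easier to judge next to the rate of operations it belongs to. The sketch below divides one by the other per operation type. The function name `error_ratios` and the operation type names in the example are illustrative assumptions, and the input dictionaries stand in for whatever rates your monitoring stack reports.

```python
def error_ratios(operation_rates, error_rates):
    """Map each operation type to errors divided by operations.

    Both arguments map an operation type to a per-second rate. Operation
    types with no traffic get a ratio of 0.0 to avoid dividing by zero.
    """
    ratios = {}
    for op_type, op_rate in operation_rates.items():
        err_rate = error_rates.get(op_type, 0.0)
        ratios[op_type] = err_rate / op_rate if op_rate > 0 else 0.0
    return ratios


if __name__ == "__main__":
    # Hypothetical rates, in operations per second.
    operations = {"create_container": 4.0, "list_containers": 20.0}
    errors = {"create_container": 1.0}
    print(error_ratios(operations, errors))
```

A ratio keeps a busy node from looking worse than a quiet one simply because it performs more operations.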

Kubelet Docker Operations

The Kubelet Docker Operations metric indicates the rate at which Kubelet performs Docker operations, so it is relevant on nodes that use Docker as the container runtime. A sudden change in this rate can point to resource pressure or a misconfigured workload. Monitoring this metric helps you identify such problems before they cause a service outage.

Kubelet Docker Operations Errors

The Kubelet Docker Operations Errors metric indicates the rate of errors encountered while Kubelet performs Docker operations. A rising error rate suggests that the Docker daemon on the node is unhealthy or overloaded, so monitoring this metric helps you identify problems before they cause a service outage.

Kubelet Node Config Error

The Kubelet Node Config Error metric indicates whether a node is experiencing a configuration-related error: 1 if it is, 0 otherwise. Any non-zero value points to a misconfiguration that should be fixed before it causes a service outage.

Kubelet PLEG Relist Interval Microseconds

The Kubelet PLEG Relist Interval Microseconds metric indicates the interval, in microseconds, between relists of the Pod Lifecycle Event Generator (PLEG), the Kubelet component that periodically lists containers to detect changes in pod state. Growing intervals mean that Kubelet notices container changes more slowly, which often indicates an overloaded node or a slow container runtime. Monitoring this metric helps you identify such problems before they cause a service outage.

Kubelet PLEG Relist Latency Microseconds

The Kubelet PLEG Relist Latency Microseconds metric indicates the latency, in microseconds, of PLEG relists, that is, how long each relist takes to complete. High relist latency usually means that the container runtime is slow to respond or that the node runs too many containers. Monitoring this metric helps you identify such problems before they cause a service outage.
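
Because the value is reported in microseconds, it is easy to misread by a factor of a thousand or a million. The sketch below converts a relist latency to seconds and compares it with two thresholds. The function name and the default thresholds of 1 and 10 seconds are hypothetical values chosen for illustration; pick limits that match what is normal for your nodes.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def classify_relist_latency(latency_us, warn_seconds=1.0, critical_seconds=10.0):
    """Return 'ok', 'warning' or 'critical' for a latency in microseconds.

    The default thresholds are illustrative assumptions, not values
    recommended by Kubernetes or Netdata.
    """
    if latency_us < 0:
        raise ValueError("latency_us must not be negative")
    seconds = latency_us / MICROSECONDS_PER_SECOND
    if seconds >= critical_seconds:
        return "critical"
    if seconds >= warn_seconds:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for sample_us in (250_000, 2_500_000, 15_000_000):
        print(sample_us, classify_relist_latency(sample_us))
```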

Kubelet Token Requests

The Kubelet Token Requests metric indicates the rate at which Kubelet requests tokens. An unexpected spike can point to authentication problems, such as tokens that expire or are rejected and have to be requested again. Monitoring this metric alongside the REST client metrics helps you detect such problems and take corrective action.

REST Client Requests By Code

The REST Client Requests By Code metric indicates the rate at which each HTTP status code is returned for requests made to the Kubernetes API Server. A rise in 401 or 403 responses points to authentication or authorization problems, while 5xx responses point to trouble on the API Server side. Monitoring this metric helps you detect rejected or failing requests and take corrective action.
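
Individual status codes are noisy, so it can be useful to fold them into classes (2xx, 4xx, 5xx) and track the share of requests that end in an error. The following sketch shows one way to do that. The function names and the sample rates are illustrative assumptions, and any code that is not a three-digit number is counted as "other".

```python
from collections import defaultdict


def rate_by_class(rates_by_code):
    """Sum per-code request rates into classes such as '2xx' or '5xx'."""
    classes = defaultdict(float)
    for code, rate in rates_by_code.items():
        code = str(code)
        # Anything that is not a three-digit status code goes to "other".
        key = code[0] + "xx" if len(code) == 3 and code.isdigit() else "other"
        classes[key] += rate
    return dict(classes)


def error_share(rates_by_code):
    """Fraction of all requests that returned a 4xx or 5xx status code."""
    classes = rate_by_class(rates_by_code)
    total = sum(classes.values())
    if total == 0:
        return 0.0
    return (classes.get("4xx", 0.0) + classes.get("5xx", 0.0)) / total


if __name__ == "__main__":
    # Hypothetical request rates per status code, in requests per second.
    rates = {"200": 90.0, "201": 5.0, "403": 3.0, "500": 2.0}
    print(rate_by_class(rates))
    print(error_share(rates))
```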

REST Client Requests By Method

The REST Client Requests By Method metric indicates the rate at which each HTTP method is used for requests made to the Kubernetes API Server. An unusual shift in the mix of methods, such as a surge in write requests (POST, PUT, PATCH, DELETE), can reveal a misbehaving component or unexpected activity. Monitoring this metric together with REST Client Requests By Code helps you detect such problems and take corrective action.

Volume Manager Total Volumes

The Volume Manager Total Volumes metric indicates the number of volumes that are currently in use (the actual state) and the number that are desired. A persistent gap between the two suggests that volumes are failing to attach or mount. Monitoring this metric helps you identify such problems before they cause a service outage.
