ScaleIO monitoring with Netdata

What is ScaleIO?

ScaleIO is an enterprise-grade, scale-out storage software solution designed for large-scale storage applications. It provides scalable storage performance and improved resiliency for high availability and data protection, and it can be deployed on premises or in the cloud.

Monitoring ScaleIO with Netdata

To monitor ScaleIO with Netdata, you need both ScaleIO and Netdata installed on your system.

Netdata auto-discovers hundreds of services, and for those it doesn't, turning on manual discovery is a one-line configuration. For more information on configuring Netdata for ScaleIO monitoring, please read the collector documentation.

You should now see the ScaleIO section on the Overview tab in Netdata Cloud already populated with charts about all the metrics you care about.

Netdata has a public demo space (no login required) where you can explore different monitoring use-cases and get a feel for Netdata.

What ScaleIO metrics are important to monitor - and why?

System

System Capacity Total

System Capacity Total reports the total storage capacity of the system, in KiB. It is the baseline against which every other capacity metric is judged: knowing the total lets you plan an expansion before users run out of storage when they need it.

System Capacity In Use

System Capacity In Use reports the capacity currently in use, in KiB. Compared against the total, it shows whether the system is over-utilized or under-utilized. An over-utilized system can suffer performance issues or outages.
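As a rough illustration, the in-use figure is easier to act on when expressed as a percentage of total capacity. The sketch below is not part of Netdata or ScaleIO; the function names and the 80%/90% thresholds are assumptions chosen for the example.

```python
def utilization_percent(in_use_kib: float, total_kib: float) -> float:
    """Return in-use capacity as a percentage of total capacity."""
    if total_kib <= 0:
        return 0.0  # avoid division by zero when no capacity is reported
    return 100.0 * in_use_kib / total_kib


def capacity_status(in_use_kib: float, total_kib: float,
                    warn_pct: float = 80.0, crit_pct: float = 90.0) -> str:
    """Classify utilization against hypothetical warning/critical thresholds."""
    pct = utilization_percent(in_use_kib, total_kib)
    if pct >= crit_pct:
        return "critical"
    if pct >= warn_pct:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # 6 TiB in use out of 8 TiB total, both expressed in KiB
    print(capacity_status(6 * 1024**3, 8 * 1024**3))
```

The thresholds are parameters so they can be tuned to how quickly your capacity tends to fill up.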

System Capacity Usage

System Capacity Usage breaks capacity down by type (thick, decreased, thin, snapshot, spare, unused), in KiB. It shows where capacity is actually going, for example how much is consumed by snapshots and how much is still unused, which helps you decide where to reclaim or reallocate space.
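To compare the capacity types at a glance, each one can be expressed as a share of the total. This is a minimal sketch: the function name is hypothetical, and the sample values are invented for illustration rather than taken from a real system.

```python
from typing import Dict


def capacity_shares(usage_kib: Dict[str, float]) -> Dict[str, float]:
    """Return each capacity type's share of the combined total, in percent."""
    total = sum(usage_kib.values())
    if total <= 0:
        # nothing reported: every share is zero
        return {name: 0.0 for name in usage_kib}
    return {name: 100.0 * value / total for name, value in usage_kib.items()}


if __name__ == "__main__":
    # invented sample values, in KiB
    sample = {"thick": 500, "thin": 250, "snapshot": 50, "spare": 100, "unused": 100}
    for name, share in capacity_shares(sample).items():
        print(f"{name}: {share:.1f}%")
```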

System Capacity Available Volume Allocation

System Capacity Available Volume Allocation reports the total capacity available for volumes, in KiB. It shows how much storage can still be allocated to new volumes. If this value runs low, you may be unable to provision new volumes when they are needed.

System Capacity Health State

System Capacity Health State reports the health of the system in terms of capacity. The health state can be one of the following: protected, degraded, in_maintenance, failed, or unavailable. Any state other than protected deserves attention, because it can be an early sign of problems that lead to performance issues or outages.

System Workload Primary Bandwidth Total

System Workload Primary Bandwidth Total reports the total primary bandwidth, in KiB/s. It shows how much data the system as a whole is moving. Sustained values near the limits of your hardware or network point to a throughput bottleneck.

System Workload Primary Bandwidth

System Workload Primary Bandwidth reports the primary bandwidth split into reads and writes, in KiB/s. The split shows whether the workload is read-heavy or write-heavy, which helps narrow down the source of a performance issue.

System Workload Primary IOPS Total

System Workload Primary IOPS Total reports the total number of primary I/O operations per second (shown in Netdata as iops/s). It shows how many operations the system is serving. Unexpected spikes or drops can signal a performance issue or an outage.

System Workload Primary IOPS

System Workload Primary IOPS reports the primary I/O operations per second split into reads and writes (shown in Netdata as iops/s). As with bandwidth, the split shows whether the workload is dominated by reads or by writes.
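The read and write values from either the bandwidth chart or the IOPS chart can be reduced to a single read fraction. The sketch below is only an illustration; the function names and the 70% cut-off for calling a workload "heavy" are assumptions.

```python
def read_fraction(read_rate: float, write_rate: float) -> float:
    """Return the fraction of the workload that is reads (0.0 to 1.0)."""
    total = read_rate + write_rate
    if total <= 0:
        return 0.0
    return read_rate / total


def describe_mix(read_rate: float, write_rate: float, heavy: float = 0.7) -> str:
    """Label a workload using a hypothetical 'heavy' cut-off."""
    if read_rate + write_rate <= 0:
        return "idle"
    fraction = read_fraction(read_rate, write_rate)
    if fraction >= heavy:
        return "read-heavy"
    if fraction <= 1 - heavy:
        return "write-heavy"
    return "mixed"


if __name__ == "__main__":
    # read and write rates, e.g. in KiB/s or operations per second
    print(describe_mix(900, 100))
```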

System Workload Primary IO Size Total

System Workload Primary IO Size Total is the size of the primary I/O operations taking place in the system. It gives insight into how efficiently the system is running. Very small I/O operations mean more per-operation overhead for the same amount of data, which can hurt efficiency. Conversely, very large I/O operations take longer to complete and can lead to latency or other performance issues.
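If you want an average size per operation, one common derivation is to divide bandwidth by IOPS. Whether the chart is computed exactly this way is an assumption here, and the function name is hypothetical.

```python
def average_io_size_kib(bandwidth_kib_s: float, iops: float) -> float:
    """Return the average size of one I/O operation in KiB.

    bandwidth_kib_s: data moved per second, in KiB/s
    iops: I/O operations completed per second
    """
    if iops <= 0:
        return 0.0  # no operations, so no meaningful average
    return bandwidth_kib_s / iops


if __name__ == "__main__":
    # 4096 KiB/s spread over 1024 operations per second
    print(average_io_size_kib(4096, 1024))
```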

System Rebalance

System Rebalance is the rate at which data is moved between storage components to keep it evenly distributed. Rebalance traffic competes with application I/O, so monitoring it helps you tell whether a slowdown is caused by rebalancing and whether the system is overloaded while data is being moved.

System Rebalance Left

System Rebalance Left is the amount of data still to be moved between storage components before the rebalance is complete. Watching it shrink confirms that the rebalance is progressing at an acceptable rate. A value that stalls or grows suggests the system is overloaded or the rebalance is taking too long.

System Rebalance Time Until Finish

System Rebalance Time Until Finish is the estimated time remaining until the rebalance completes. An estimate that keeps growing, or that stays far higher than expected, indicates that the rebalance is taking too long and that the system may need attention.
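A time-until-finish estimate of this kind can be approximated from the amount of data left and the current transfer rate. The sketch below shows that arithmetic only; it is an assumption that the chart is derived the same way, and the function name is hypothetical.

```python
from typing import Optional


def seconds_until_finish(left_kib: float, rate_kib_s: float) -> Optional[float]:
    """Estimate the remaining time in seconds from data left and current rate.

    Returns 0.0 when nothing is left, and None when data remains but the
    rate is zero, because no estimate is possible for a stalled operation.
    """
    if left_kib <= 0:
        return 0.0
    if rate_kib_s <= 0:
        return None
    return left_kib / rate_kib_s


if __name__ == "__main__":
    # 10 GiB left (in KiB) at a rate of 100 MiB/s (in KiB/s)
    print(seconds_until_finish(10 * 1024**2, 100 * 1024))
```

The same calculation applies to a rebuild, using the rebuild rate and the rebuild data left.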

System Rebuild

System Rebuild is the rate at which data is moved between storage components to repair data integrity issues, for example after a component fails. Any rebuild activity means the system is recovering from a problem, so it is worth investigating the cause. Rebuild traffic also competes with application I/O, which can explain a drop in performance.

System Rebuild Left

System Rebuild Left is the amount of data still to be moved between storage components before the rebuild is complete. Watching it shrink confirms that the rebuild is progressing at an acceptable rate. A value that stalls or grows suggests the rebuild is taking too long or that further problems have occurred.

System Defined Components

System Defined Components tracks the number of components that have been defined in the system. It gives an inventory view of what the system is made of. Unexpected changes in these counts can reveal an over-provisioned or under-provisioned system, or a component that has been added or removed by mistake.

System Components Volumes By Type

System Components Volumes By Type tracks the number of volumes of each type that have been defined in the system. It shows which volume types are in use and in what proportion, which helps you judge whether provisioning matches your capacity plan.

System Components Volumes By Mapping

System Components Volumes By Mapping tracks the number of volumes that are mapped and unmapped in the system. A growing number of unmapped volumes can point to storage that has been allocated but is not being used by any client, which is a sign of over-provisioning.

Storage Pool

Storage Pool Capacity Total

Storage Pool Capacity Total is the total capacity of the storage pool. It tells you how much storage the pool can provide in total. If usage is approaching this limit, it may be necessary to add storage to the pool.

Storage Pool Capacity In Use

Storage Pool Capacity In Use is the amount of capacity currently being used within the storage pool. Monitor it to ensure that enough storage remains available for new applications or additional data.

Storage Pool Capacity Usage

Storage Pool Capacity Usage is the amount of capacity used by each capacity type: thick, decreased, thin, snapshot, spare, and unused. Monitor it to see how the pool's capacity is distributed across these types. For example, if the amount of thick capacity keeps increasing, it may be necessary to add storage to the pool.

Storage Pool Capacity Utilization

Storage Pool Capacity Utilization is the percentage of available capacity that is being used in the storage pool. It shows how efficiently the pool's storage is being used and whether additional storage is needed.

Storage Pool Capacity Available Volume Allocation

Storage Pool Capacity Available Volume Allocation is the amount of capacity in the pool that can still be allocated to new volumes. Monitor it to ensure that there is enough capacity for additional volumes.

Storage Pool Capacity Health State

Storage Pool Capacity Health State is the health of the storage pool in terms of capacity. Monitor it to confirm that the pool is operating in a healthy state. If it is not, take corrective action before the problem affects the data or the applications that depend on it.

Storage Pool Components

Storage Pool Components is the number of components (such as devices, snapshots, volumes, and vTrees) in the storage pool. Monitor it to ensure that the pool is not overburdened with too many components. Unexpected changes can also reveal issues such as a missing device or an uncontrolled growth in snapshots.

SDC

SDC MDM Connection State

SDC MDM Connection State is the connection state of the ScaleIO Data Client (SDC) to the Meta Data Manager (MDM) in the Dell EMC ScaleIO environment. Monitor it to ensure that all SDCs are connected and communicating properly. If the connection state is false, the SDC is not connected to the MDM, and the problem should be addressed as soon as possible.
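With more than a handful of SDCs, it helps to reduce the per-SDC connection states to a list of the ones that need attention. This is a minimal sketch: the function name, the input shape (a mapping of SDC name to connection state), and the SDC names are all hypothetical.

```python
from typing import Dict, List


def disconnected_sdcs(states: Dict[str, bool]) -> List[str]:
    """Return the names of SDCs whose MDM connection state is false, sorted."""
    return sorted(name for name, connected in states.items() if not connected)


if __name__ == "__main__":
    # invented example: True means the SDC is connected to the MDM
    states = {"sdc-a": True, "sdc-b": False, "sdc-c": True}
    for name in disconnected_sdcs(states):
        print(f"{name} is not connected to the MDM")
```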

SDC Bandwidth

SDC Bandwidth is the amount of data read and written through the ScaleIO Data Client (SDC), in KiB/s. Monitor it to spot issues such as high latency or throughput bottlenecks. If the read or write bandwidth is well above what is expected, the system may be struggling to meet demand.

SDC IOPS

SDC IOPS is the number of input/output operations per second (IOPS) that the ScaleIO Data Client (SDC) is performing. It reflects the performance of the SDC and can help identify issues such as high latency or throughput bottlenecks. If the IOPS are unusually high, the SDC may be struggling to meet demand.

SDC IO Size

SDC IO Size is the size, in KiB, of the input/output operations that the ScaleIO Data Client (SDC) is performing. It reflects the kind of workload the SDC is generating and can help identify issues such as high latency or throughput bottlenecks. If the I/O size is unusually large, operations take longer to complete and the SDC may be struggling to meet demand.

SDC Num of Mapped Volumes

SDC Num of Mapped Volumes is the number of volumes currently mapped to the ScaleIO Data Client (SDC). It can help identify issues such as incorrect configurations or a lack of available resources. If the number of mapped volumes is below or above what is expected, investigate the mapping configuration.
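Checking "below or above what is expected" presupposes a record of how many volumes each SDC should have. The sketch below compares such a record against observed counts; the function name, the input shapes, and the SDC names are hypothetical.

```python
from typing import Dict, Tuple


def mapping_mismatches(expected: Dict[str, int],
                       actual: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {sdc_name: (expected_count, actual_count)} for SDCs that differ.

    An SDC missing from `actual` is treated as having zero mapped volumes.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for name, want in expected.items():
        got = actual.get(name, 0)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    # invented example counts of mapped volumes per SDC
    expected = {"sdc-a": 2, "sdc-b": 1}
    actual = {"sdc-a": 2, "sdc-b": 0}
    print(mapping_mismatches(expected, actual))
```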
