Plugin: proc.plugin
Module: /sys/devices/system/edac/mc
The Error Detection and Correction (EDAC) subsystem detects and reports errors in the system’s memory, primarily ECC (Error-Correcting Code) memory errors.
The collector provides data for:
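- Per memory controller (MC): correctable and uncorrectable errors. These can be of 2 kinds:
  - errors that can be attributed to a specific DIMM,
  - errors that cannot be associated with any DIMM (the correctable_noinfo and uncorrectable_noinfo dimensions).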
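- Per memory DIMM: correctable and uncorrectable errors. There are 2 kinds:
  - memory controllers that can identify the physical DIMMs and report errors for them directly,
  - memory controllers that report errors for memory address ranges that can be linked to DIMMs. In this case, more DIMMs may be reported than are physically installed.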
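Both levels map onto the kernel's EDAC sysfs tree under the module path above: one mcX directory per memory controller, each containing dimmX or rankX subdirectories. The following Python sketch walks that tree. It is illustrative only, not Netdata's proc.plugin code, and the function name discover is hypothetical.

```python
import re
from pathlib import Path

# EDAC sysfs root, as given in the module path above.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")
MC_DIR = re.compile(r"mc(\d+)")
MODULE_DIR = re.compile(r"(dimm|rank)(\d+)")


def discover(root=EDAC_MC_ROOT):
    """Map each mcX directory name to the dimmX/rankX directory names under it."""
    layout = {}
    if not root.is_dir():
        return layout  # no EDAC driver loaded, or not a Linux system
    controllers = [p for p in root.iterdir() if p.is_dir() and MC_DIR.fullmatch(p.name)]
    # Sort numerically so that mc10 comes after mc2.
    for mc in sorted(controllers, key=lambda p: int(MC_DIR.fullmatch(p.name).group(1))):
        modules = []
        for p in mc.iterdir():
            m = MODULE_DIR.fullmatch(p.name)
            if m and p.is_dir():
                modules.append((m.group(1), int(m.group(2)), p.name))
        # Attribute files such as ce_count or dimm_label are skipped by the regex.
        layout[mc.name] = [name for _, _, name in sorted(modules)]
    return layout


if __name__ == "__main__":
    for controller, modules in discover().items():
        print(controller, ", ".join(modules) or "(no dimm/rank directories)")
```

Whether a controller exposes dimmX or rankX directories depends on its driver, which is why the sketch accepts both.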
This collector is supported on all platforms.
This collector supports collecting metrics from multiple instances of this integration, including remote instances.
This integration doesn’t support auto-detection.
The default configuration for this integration does not impose any limits on data collection.
The default configuration for this integration is not expected to impose a significant performance impact on the system.
Setup prerequisites: no action required.
There is no configuration file.
There are no configuration options.
There are no configuration examples.
Metrics grouped by scope.
The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
These metrics refer to the memory controller.
Labels:
Label | Description |
---|---|
controller | mcX directory name of this memory controller. |
mc_name | Memory controller type. |
size_mb | The amount of memory in megabytes that this memory controller manages. |
max_location | Last available memory slot in this memory controller. |
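The label names above match attribute files of the same name in each mcX directory of the kernel EDAC sysfs ABI. Assuming the collector reads those files, a label set could be built roughly as follows; mc_labels and read_attr are hypothetical helper names.

```python
from pathlib import Path

# Labels assumed to be read from identically named files inside the mcX directory.
MC_LABEL_FILES = ("mc_name", "size_mb", "max_location")


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if it is missing."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def mc_labels(mc_dir):
    """Build the label set for one memory controller directory (e.g. .../mc/mc0)."""
    mc_dir = Path(mc_dir)
    labels = {"controller": mc_dir.name}  # the mcX directory name itself
    for name in MC_LABEL_FILES:
        value = read_attr(mc_dir / name)
        if value is not None:  # omit labels whose file is absent
            labels[name] = value
    return labels
```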
Metrics:
Metric | Dimensions | Unit |
---|---|---|
mem.edac_mc_errors | correctable, uncorrectable, correctable_noinfo, uncorrectable_noinfo | errors |
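A minimal sketch of how the four dimensions could be read, assuming each maps to the kernel counter file named in the comment. These counters are cumulative since the EDAC driver was loaded (or since they were reset through reset_counters). In the kernel, ce_count and ue_count are totals that also include the noinfo errors; whether the collector subtracts them is not stated here, so the sketch reports the raw counters. mc_errors and read_counter are hypothetical names.

```python
from pathlib import Path

# Chart dimension -> kernel EDAC counter file in the mcX directory (assumed mapping).
MC_DIMENSIONS = {
    "correctable": "ce_count",
    "uncorrectable": "ue_count",
    "correctable_noinfo": "ce_noinfo_count",
    "uncorrectable_noinfo": "ue_noinfo_count",
}


def read_counter(path):
    """Parse an integer sysfs counter; return None if missing or malformed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def mc_errors(mc_dir):
    """Return the mem.edac_mc_errors dimension values for one memory controller."""
    mc_dir = Path(mc_dir)
    values = {}
    for dim, filename in MC_DIMENSIONS.items():
        value = read_counter(mc_dir / filename)
        if value is not None:  # skip dimensions the driver does not expose
            values[dim] = value
    return values
```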
These metrics refer to the memory module (or rank, depending on the memory controller).
Labels:
Label | Description |
---|---|
controller | mcX directory name of this memory controller. |
dimm | dimmX or rankX directory name of this memory module. |
dimm_dev_type | Type of DRAM device used in this memory module. For example, x1, x2, x4, x8. |
dimm_edac_mode | Type of error detection and correction in use. For example, S4ECD4ED indicates Chipkill with x4 DRAM devices. |
dimm_label | Label assigned to this memory module. |
dimm_location | Location of the memory module. |
dimm_mem_type | Type of the memory module. |
size | The amount of memory in megabytes that this memory module manages. |
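Assuming the DIMM labels likewise come from identically named attribute files inside each dimmX or rankX directory, the label sets for one controller could be collected as in this sketch (dimm_labels and read_attr are hypothetical names):

```python
import re
from pathlib import Path

# Labels assumed to be read from identically named files in each dimmX/rankX directory.
DIMM_LABEL_FILES = ("dimm_dev_type", "dimm_edac_mode", "dimm_label",
                    "dimm_location", "dimm_mem_type", "size")
MODULE_DIR = re.compile(r"(dimm|rank)(\d+)")


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if it is missing."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def dimm_labels(mc_dir):
    """Return one label dict per dimmX/rankX directory under a memory controller."""
    mc_dir = Path(mc_dir)
    modules = []
    for p in mc_dir.iterdir():
        m = MODULE_DIR.fullmatch(p.name)
        if m and p.is_dir():
            modules.append((m.group(1), int(m.group(2)), p))
    result = []
    for _, _, module in sorted(modules):  # numeric order: dimm2 before dimm10
        labels = {"controller": mc_dir.name, "dimm": module.name}
        for name in DIMM_LABEL_FILES:
            value = read_attr(module / name)
            if value is not None:
                labels[name] = value
        result.append(labels)
    return result
```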
Metrics:
Metric | Dimensions | Unit |
---|---|---|
mem.edac_mc_dimm_errors | correctable, uncorrectable | errors |
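Per module, the kernel exposes dimm_ce_count and dimm_ue_count in each dimmX or rankX directory; older kernels may not provide these files. The following is a hedged sketch of the per-DIMM chart's two dimensions (the alerts below refer to this chart as mem.edac_mc_dimm_errors), assuming that mapping. dimm_errors and read_counter are hypothetical names.

```python
from pathlib import Path

# Chart dimension -> kernel counter file in a dimmX/rankX directory (assumed mapping).
DIMM_DIMENSIONS = {"correctable": "dimm_ce_count", "uncorrectable": "dimm_ue_count"}


def read_counter(path):
    """Parse an integer sysfs counter; return None if missing or malformed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def dimm_errors(module_dir):
    """Return the per-DIMM correctable/uncorrectable counters for one module."""
    module_dir = Path(module_dir)
    values = {}
    for dim, filename in DIMM_DIMENSIONS.items():
        value = read_counter(module_dir / filename)
        if value is not None:  # older kernels may lack these files
            values[dim] = value
    return values
```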
The following alerts are available:
Alert name | On metric | Description |
---|---|---|
ecc_memory_mc_noinfo_correctable | mem.edac_mc_errors | memory controller ${label:controller} ECC correctable errors (unknown DIMM slot) |
ecc_memory_mc_noinfo_uncorrectable | mem.edac_mc_errors | memory controller ${label:controller} ECC uncorrectable errors (unknown DIMM slot) |
ecc_memory_dimm_correctable | mem.edac_mc_dimm_errors | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC correctable errors |
ecc_memory_dimm_uncorrectable | mem.edac_mc_dimm_errors | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC uncorrectable errors |
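The alert conditions are not spelled out above. The sketch below assumes each alert fires when its watched dimension increased within a recent look-back window, with correctable errors raising a warning and uncorrectable errors a critical status. The window length, the severities, and the function names are illustrative assumptions, not Netdata's shipped health configuration.

```python
# Hypothetical look-back window; the real health configuration may differ.
WINDOW_SECONDS = 600

# Alert name -> (dimension it watches, severity when triggered).
# The severities are assumptions for illustration.
ALERTS = {
    "ecc_memory_mc_noinfo_correctable": ("correctable_noinfo", "WARNING"),
    "ecc_memory_mc_noinfo_uncorrectable": ("uncorrectable_noinfo", "CRITICAL"),
    "ecc_memory_dimm_correctable": ("correctable", "WARNING"),
    "ecc_memory_dimm_uncorrectable": ("uncorrectable", "CRITICAL"),
}


def increase_in_window(samples, now, window=WINDOW_SECONDS):
    """Sum the growth of a cumulative counter across samples in [now - window, now].

    samples is a list of (timestamp, value) pairs sorted by timestamp. Only samples
    inside the window are compared; a drop in value is treated as a counter reset,
    so the new value counts as fresh errors.
    """
    start = now - window
    inside = [value for ts, value in samples if start <= ts <= now]
    total = 0
    for prev, cur in zip(inside, inside[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def evaluate(alert_name, series, now):
    """Return the alert's severity if its watched dimension grew, else "CLEAR".

    series maps a dimension name to its list of (timestamp, counter) samples.
    """
    dimension, severity = ALERTS[alert_name]
    grew = increase_in_window(series.get(dimension, []), now) > 0
    return severity if grew else "CLEAR"
```

For example, evaluate("ecc_memory_dimm_uncorrectable", {"uncorrectable": [(0, 5), (600, 7)]}, now=600) returns "CRITICAL", because the counter grew by 2 inside the window.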