Recently I rebooted a storage array and saw a stream of error messages (about one per second) spewing onto the console and into the message logs.
EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x610, offset 0xa80, grain 128, syndrome 0x70, row 0, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
EDAC MC0: CE page 0x7f579, offset 0x800, grain 128, syndrome 0x70, row 1, channel 1, label "": i3000 CE
After a bit of searching I found that this indicates a correctable error (CE) in memory. My reading also suggested that this might be a sign of impending memory bank failure, so I ordered new memory. In the meantime, rather than have all that noise in the logs, I wanted to shut it off. By reading the Linux Kernel Documentation on EDAC I was able to figure out how to shut off the error logging by setting 'edac_mc_log_ce'.
echo 0 > /sys/module/edac_core/parameters/edac_mc_log_ce
While logging was off, I could verify that the number of errors was still increasing by looking in 'ce_count'.
cat /sys/devices/system/edac/mc/mc0/ce_count
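If the machine has more than one memory controller, or you want to keep an eye on the count while waiting for the replacement memory, the check above can be wrapped in a small helper. This is only a rough sketch: the function name total_ce_count and its optional directory argument are my own inventions for illustration, and it assumes the other controllers show up as mc1, mc2, and so on next to mc0.

```shell
#!/bin/sh
# Sum the correctable-error counters of every EDAC memory controller.
# total_ce_count is a hypothetical helper name, not part of EDAC itself.
# The optional argument overrides the sysfs directory (handy for testing);
# the default is the path used earlier in this post.
total_ce_count() {
    dir="${1:-/sys/devices/system/edac/mc}"
    total=0
    for f in "$dir"/mc*/ce_count; do
        # Skip the unexpanded pattern when no controller directory exists.
        [ -r "$f" ] || continue
        count=$(cat "$f")
        total=$((total + count))
    done
    echo "$total"
}

# Example: print a timestamped total once a minute (uncomment to use).
# while sleep 60; do printf '%s %s\n' "$(date +%T)" "$(total_ce_count)"; done
```

A total that keeps climbing between samples tells you the errors are still happening even though nothing is being logged.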