Hardware Error memory fault: kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB

Published: 2018-05-18 08:54:02  Source: 满分企业网  Posted by: jasonzhang
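The server that runs 满分企业网 suddenly started reporting memory errors. The messages kept appearing and could not be stopped. The /var/log/messages log showed: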
May 17 12:39:44 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:39:44 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xa97802b50
May 17 12:39:44 oldweb kernel: EDAC MC2: CE page 0xa97802, offset 0xb50, grain 0, syndrome 0x11c1, row 0, channel 1, label "": amd64_edac
May 17 12:39:44 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:39:44 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:39:44 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000a97802b50
May 17 12:39:44 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:42:14 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:42:14 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xaa880c350
May 17 12:42:14 oldweb kernel: EDAC MC2: CE page 0xaa880c, offset 0x350, grain 0, syndrome 0x11c1, row 0, channel 1, label "": amd64_edac
May 17 12:42:14 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:42:14 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:42:14 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000aa880c350
May 17 12:42:14 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:43:29 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:43:29 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xbe1283db0
May 17 12:43:29 oldweb kernel: EDAC MC2: CE page 0xbe1283, offset 0xdb0, grain 0, syndrome 0x11c1, row 4, channel 1, label "": amd64_edac
May 17 12:43:29 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:43:29 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:43:29 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000be1283db0
May 17 12:43:29 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:06 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:06 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x97eeb47a0
May 17 12:44:06 oldweb kernel: EDAC MC2: CE page 0x97eeb4, offset 0x7a0, grain 0, syndrome 0x11c1, row 5, channel 1, label "": amd64_edac
May 17 12:44:06 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:06 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:44:06 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x000000097eeb47a0
May 17 12:44:06 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:25 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:25 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8fe32c0c0
May 17 12:44:25 oldweb kernel: EDAC MC2: CE page 0x8fe32c, offset 0xc0, grain 0, syndrome 0x2242, row 1, channel 1, label "": amd64_edac
May 17 12:44:25 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:25 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc21400022080a13
May 17 12:44:25 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008fe32c0c0
May 17 12:44:25 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:34 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:34 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8e2f2b0c0
May 17 12:44:34 oldweb kernel: EDAC MC2: CE page 0x8e2f2b, offset 0xc0, grain 0, syndrome 0x11c1, row 1, channel 1, label "": amd64_edac
May 17 12:44:34 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:34 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080813
May 17 12:44:34 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008e2f2b0c0
May 17 12:44:34 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: SRC (no timeout)
May 17 12:44:39 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:39 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8d1504000
May 17 12:44:39 oldweb kernel: EDAC MC2: CE page 0x8d1504, offset 0x0, grain 0, syndrome 0x2242, row 0, channel 1, label "": amd64_edac
May 17 12:44:39 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:39 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc21400022080a13
May 17 12:44:39 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008d1504000
May 17 12:44:39 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:41 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:41 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8b1012c80
May 17 12:44:41 oldweb kernel: EDAC MC2: CE page 0x8b1012, offset 0xc80, grain 0, syndrome 0x3383, row 0, channel 1, label "": amd64_edac
May 17 12:44:41 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:41 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc41c00033080a13
May 17 12:44:41 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x00000008b1012c80
May 17 12:44:41 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:42 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:42 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0xb66a9caf0
May 17 12:44:42 oldweb kernel: EDAC MC2: CE page 0xb66a9c, offset 0xaf0, grain 0, syndrome 0x11c1, row 4, channel 1, label "": amd64_edac
May 17 12:44:42 oldweb kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 17 12:44:42 oldweb kernel: [Hardware Error]: CPU:8 (10:2:3) MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c00011080a13
May 17 12:44:42 oldweb kernel: [Hardware Error]: MC4_ADDR: 0x0000000b66a9caf0
May 17 12:44:42 oldweb kernel: [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
May 17 12:44:43 oldweb kernel: [Hardware Error]: MC4 Error (node 2): DRAM ECC error detected on the NB.
May 17 12:44:43 oldweb kernel: EDAC amd64 MC2: CE ERROR_ADDRESS= 0x8923a3f90
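With this many repeated entries it can help to tally the corrected errors by location before touching the hardware. The following is a minimal sketch, assuming the log keeps the `EDAC MCn: CE page ... row R, channel C, ...` line format shown above; `ce_summary` is a hypothetical helper name, not part of any EDAC tool.

```shell
#!/bin/sh
# Tally corrected-error (CE) events per memory controller, row and channel.
# Usage: ce_summary /var/log/messages
ce_summary() {
    # Keep only the "EDAC MCn: CE page ..." lines and reduce each one to
    # "MCn row R channel C", then count how often each location occurs.
    sed -n 's/.*EDAC \(MC[0-9][0-9]*\): CE page .* row \([0-9][0-9]*\), channel \([0-9][0-9]*\),.*/\1 row \2 channel \3/p' "$1" |
        sort | uniq -c |
        awk '{ print $2, $3, $4, $5, $6 ": " $1 }'
}
```

Calling `ce_summary /var/log/messages` prints one line per location followed by its event count. In the excerpt above every CE event falls on channel 1 of MC2, spread across rows 0, 1, 4 and 5.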



According to what I found online, (node 2) refers to CPU2,
so I checked the counters with this command:
grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count

Other lines omitted ....
/sys/devices/system/edac/mc/mc2/csrow1/ch1_ce_count:4414           (this counter is no longer 0, so this module is probably the faulty one)

Checking the individual csrows with further commands:

[root@oldweb ~]# grep [0-9] /sys/devices/system/edac/mc/mc2/csrow0/ch*_ce_count
/sys/devices/system/edac/mc/mc2/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc2/csrow0/ch1_ce_count:4976


[root@oldweb ~]# grep [0-9] /sys/devices/system/edac/mc/mc2/csrow1/ch*_ce_count
/sys/devices/system/edac/mc/mc2/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc2/csrow1/ch1_ce_count:4414

A line whose count is not 0 means memory errors have been recorded at that location.
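Because `grep "[0-9]"` also prints the counters that are still 0, the nonzero ones have to be picked out by eye. A minimal sketch that prints only the nonzero counters, assuming the same sysfs layout as in the commands above; `nonzero_ce` is a hypothetical helper name, and its optional argument exists only so the function can be pointed at a copy of the directory tree.

```shell
#!/bin/sh
# Print only the EDAC corrected-error counters that are greater than zero.
# Optional argument: alternative root directory (defaults to the sysfs path).
nonzero_ce() {
    root="${1:-/sys/devices/system/edac/mc}"
    for f in "$root"/mc*/csrow*/ch*_ce_count; do
        [ -r "$f" ] || continue          # glob did not match, or file unreadable
        n=$(cat "$f")
        case "$n" in
            ''|*[!0-9]*) continue ;;     # skip anything that is not a plain number
        esac
        if [ "$n" -gt 0 ]; then
            printf '%s:%s\n' "$f" "$n"
        fi
    done
    return 0
}
```

Running `nonzero_ce` with no argument reads the live counters and prints them in the same `path:count` form that grep produces.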

 

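mc*: which CPU the counter belongs to (always check what is actually printed on the motherboard, for example whether it says CPU0 or CPU2; do not assume CPU1 is the second processor, and go by the actual labels).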

csrow*: the memory channel (in the kernel's EDAC terminology this field is a chip-select row).

ch*: which module within the channel (EDAC's own name for this field is the channel).
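The three path components listed above can be split out mechanically, which makes the grep output easier to compare against the motherboard labels. A minimal sketch, assuming input lines in the `.../mcN/csrowR/chC_ce_count:COUNT` form printed earlier; `describe_counters` is a hypothetical helper name.

```shell
#!/bin/sh
# Turn ".../mcN/csrowR/chC_ce_count:COUNT" lines (as printed by grep) into
# labelled fields. Reads stdin and writes one line per recognised counter.
describe_counters() {
    sed -n 's|.*/mc\([0-9][0-9]*\)/csrow\([0-9][0-9]*\)/ch\([0-9][0-9]*\)_ce_count:\([0-9][0-9]*\).*|mc=\1 csrow=\2 ch=\3 ce_count=\4|p'
}
```

It can be fed directly from the command used above, for example `grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count | describe_counters`.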

From this analysis, the problem is DIMM 1 on the first channel of CPU2, so I removed that module.





URL of this article: http://news.cntrades.com/show-186826.html
 
 
 