<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSTier3LogXX][previous page]] / [[CMSTier3LogXX][next page]] of Tier3 site log %M%

---+ 13. 03. 2015 =t3wn[30-40]= RAM errors

---++ EDAC RAM errors in one server

<pre>
sframe_main[156845]: segfault at 100000018 ip 0000003013e75ef5 sp 00007fffa538aab0 error 4 in libc-2.12.so[3013e00000+18a000]
sframe_main[156553]: segfault at 100000018 ip 0000003013e75ef5 sp 00007fff051f8a60 error 4 in libc-2.12.so[3013e00000+18a000]
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 8: 8c00004e000800c0
TSC 0 ADDR 658238000 MISC 908440004001c8c
PROCESSOR 0:206d7 TIME 1425414669 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x658238000 => socket=1, Channel=0(mask=1), rank=1
sframe_main[22853]: segfault at 100000018 ip 0000003013e75ef5 sp 00007fff6cdff7c0 error 4 in libc-2.12.so[3013e00000+18a000]
sframe_main[23825]: segfault at 100000036 ip 0000003013e75ef5 sp 00007fffb208f020 error 4 in libc-2.12.so[3013e00000+18a000]
sframe_main[23827]: segfault at 100000036 ip 0000003013e75ef5 sp 00007fff6ea0ca70 error 4 in libc-2.12.so[3013e00000+18a000]
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010090
TSC 0 ADDR 658238600 MISC 421efe86
PROCESSOR 0:206d7 TIME 1425716125 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=8 Err=0001:0090 (ch=0), addr = 0x658238600 => socket=1, Channel=0(mask=1), rank=1
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 8: 8c00004e000800c0
TSC 0 ADDR 658238000 MISC 908440004001c8c
PROCESSOR 0:206d7 TIME 1425757548 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x658238000 => socket=1, Channel=0(mask=1), rank=1
...
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010090
TSC 0 ADDR 658238600 MISC 4214f486
PROCESSOR 0:206d7 TIME 1425986107 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=8 Err=0001:0090 (ch=0), addr = 0x658238600 => socket=1, Channel=0(mask=1), rank=1
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 8: 8c00004e000800c0
TSC 0 ADDR 658238000 MISC 908440004001c8c
PROCESSOR 0:206d7 TIME 1426091335 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x658238000 => socket=1, Channel=0(mask=1), rank=1
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010090
TSC 0 ADDR 658238600 MISC 421cfc86
PROCESSOR 0:206d7 TIME 1426193819 SOCKET 1 APIC 20
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
{1}[Hardware Error]: event severity: corrected
{1}[Hardware Error]:  Error 0, type: corrected
{1}[Hardware Error]:  section_type: memory error
[Firmware Warn]: error section length is too small
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=8 Err=0001:0090 (ch=0), addr = 0x658238600 => socket=1, Channel=0(mask=1), rank=1
</pre>

---++ EDAC RAM analysis

Only the relevant output is shown:

<pre>
[root@t3admin01 ~]# salt 't3wn3*' cmd.run '%BLUE%edac-util%ENDCOLOR%'
%GREEN%t3wn33.psi.ch%ENDCOLOR%:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 108 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 119 Corrected Errors
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 52 Corrected Errors
    mc1: csrow2: CPU_SrcID#1_Channel#2_DIMM#0: 15 Corrected Errors
%GREEN%t3wn39.psi.ch%ENDCOLOR%:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
%GREEN%t3wn34.psi.ch%ENDCOLOR%:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 1 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 2 Corrected Errors
[root@t3admin01 ~]# salt 't3wn4*' cmd.run '%BLUE%edac-util%ENDCOLOR%'
%GREEN%t3wn40.psi.ch%ENDCOLOR%:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
</pre>
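To rank the worker nodes by corrected-error count, the =edac-util= output above could be totalled per host. The following is only a minimal sketch, not something that was run on the cluster: the names =corrected_errors_per_host= and =SAMPLE= are hypothetical, and it assumes the plain salt output (host name followed by a colon, then the counter lines) without the TWiki colour variables used on this page.

```python
import re
from collections import defaultdict

# Counter line as printed by edac-util in the log above, e.g.
#   mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
COUNTER_RE = re.compile(r"^\s*(mc\d+): (csrow\d+): (\S+): (\d+) Corrected Errors\s*$")
# Host header as printed by salt (assumed plain text), e.g. "t3wn33.psi.ch:"
HOST_RE = re.compile(r"^(\S+):\s*$")


def corrected_errors_per_host(salt_output):
    """Return {host: total corrected errors} from salt + edac-util text."""
    totals = defaultdict(int)
    host = None
    for line in salt_output.splitlines():
        counter = COUNTER_RE.match(line)
        if counter:
            # Ignore counter lines that appear before any host header.
            if host is not None:
                totals[host] += int(counter.group(4))
            continue
        header = HOST_RE.match(line)
        if header:
            host = header.group(1)
    return dict(totals)


# Counter values copied from the output above; colour variables removed.
SAMPLE = """\
t3wn33.psi.ch:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 108 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 119 Corrected Errors
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 52 Corrected Errors
    mc1: csrow2: CPU_SrcID#1_Channel#2_DIMM#0: 15 Corrected Errors
t3wn39.psi.ch:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
t3wn34.psi.ch:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 1 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 2 Corrected Errors
t3wn40.psi.ch:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
"""

if __name__ == "__main__":
    totals = corrected_errors_per_host(SAMPLE)
    # Highest count first; ties are broken by host name.
    for host, count in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{host}: {count}")
```

With the counters listed above, =t3wn33= has by far the highest total, so it would be the first candidate for a DIMM check.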
Topic revision: r2 - 2015-03-13 - FabioMartinelli