<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSTier3LogXX][previous page]] / [[CMSTier3LogXX][next page]] of Tier3 site log %M%

---+ 13. 03. 2015 =t3wn[30-40]= RAM errors

---++ EDAC RAM errors in one server

<pre>
sframe_main[156845]: segfault at 100000018 ip 0000003013e75ef5 sp 00007fffa538aab0 error 4 in libc-2.12.so[3013e00000+18a000]
sframe_main[156553]: segfault at 100000018 ip 0000003013e75ef5 sp 00007fff051f8a60 error 4 in libc-2.12.so[3013e00000+18a000]
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 8: 8c00004e000800c0
TSC 0 ADDR 658238000 MISC 908440004001c8c
PROCESSOR 0:206d7 TIME 1425414669 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x658238000 => socket=1, Channel=0(mask=1), rank=1
sframe_main[22853]: segfault at 100000018 ip 0000003013e75ef5 sp 00007fff6cdff7c0 error 4 in libc-2.12.so[3013e00000+18a000]
sframe_main[23825]: segfault at 100000036 ip 0000003013e75ef5 sp 00007fffb208f020 error 4 in libc-2.12.so[3013e00000+18a000]
sframe_main[23827]: segfault at 100000036 ip 0000003013e75ef5 sp 00007fff6ea0ca70 error 4 in libc-2.12.so[3013e00000+18a000]
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010090
TSC 0 ADDR 658238600 MISC 421efe86
PROCESSOR 0:206d7 TIME 1425716125 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=8 Err=0001:0090 (ch=0), addr = 0x658238600 => socket=1, Channel=0(mask=1), rank=1
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 8: 8c00004e000800c0
TSC 0 ADDR 658238000 MISC 908440004001c8c
PROCESSOR 0:206d7 TIME 1425757548 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x658238000 => socket=1, Channel=0(mask=1), rank=1
...
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010090
TSC 0 ADDR 658238600 MISC 4214f486
PROCESSOR 0:206d7 TIME 1425986107 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=8 Err=0001:0090 (ch=0), addr = 0x658238600 => socket=1, Channel=0(mask=1), rank=1
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 8: 8c00004e000800c0
TSC 0 ADDR 658238000 MISC 908440004001c8c
PROCESSOR 0:206d7 TIME 1426091335 SOCKET 1 APIC 20
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory scrubbing on FATAL area : cpu=8 Err=0008:00c0 (ch=0), addr = 0x658238000 => socket=1, Channel=0(mask=1), rank=1
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010090
TSC 0 ADDR 658238600 MISC 421cfc86
PROCESSOR 0:206d7 TIME 1426193819 SOCKET 1 APIC 20
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
{1}[Hardware Error]: event severity: corrected
{1}[Hardware Error]: Error 0, type: corrected
{1}[Hardware Error]: section_type: memory error
[Firmware Warn]: error section length is too small
EDAC MC1: CE row 0, channel 0, label "CPU_SrcID#1_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=8 Err=0001:0090 (ch=0), addr = 0x658238600 => socket=1, Channel=0(mask=1), rank=1
</pre>

---++ EDAC RAM analysis

Only the relevant output is shown:

<pre>
[root@t3admin01 ~]# salt 't3wn3*' cmd.run '%BLUE%edac-util%ENDCOLOR%'
%GREEN%t3wn33.psi.ch%ENDCOLOR%:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 108 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 119 Corrected Errors
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 52 Corrected Errors
    mc1: csrow2: CPU_SrcID#1_Channel#2_DIMM#0: 15 Corrected Errors
%GREEN%t3wn39.psi.ch%ENDCOLOR%:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
%GREEN%t3wn34.psi.ch%ENDCOLOR%:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 1 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 2 Corrected Errors

[root@t3admin01 ~]# salt 't3wn4*' cmd.run '%BLUE%edac-util%ENDCOLOR%'
%GREEN%t3wn40.psi.ch%ENDCOLOR%:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
</pre>
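The per-DIMM counts above can be summed per host to see which worker node is worst affected. The following is only a minimal sketch, not part of the original log: the helper names =parse_edac= and =rank_hosts= are hypothetical, and it assumes the salt output is available as plain text with the TWiki colour variables already stripped.

```python
import re
from collections import defaultdict

# Sample copied from the salt output above, colour variables removed.
SAMPLE = """\
t3wn33.psi.ch:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 108 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 119 Corrected Errors
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 52 Corrected Errors
    mc1: csrow2: CPU_SrcID#1_Channel#2_DIMM#0: 15 Corrected Errors
t3wn39.psi.ch:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
t3wn34.psi.ch:
    mc0: csrow0: CPU_SrcID#0_Channel#0_DIMM#0: 1 Corrected Errors
    mc0: csrow2: CPU_SrcID#0_Channel#2_DIMM#0: 2 Corrected Errors
t3wn40.psi.ch:
    mc1: csrow0: CPU_SrcID#1_Channel#0_DIMM#0: 10 Corrected Errors
"""

# A host header line looks like "t3wn33.psi.ch:" with no leading whitespace.
HOST_RE = re.compile(r"^(\S+):\s*$")
# An error line looks like "mc0: csrow0: <DIMM label>: 108 Corrected Errors".
ERR_RE = re.compile(r"^\s*(mc\d+): (csrow\d+): (\S+): (\d+) Corrected Errors\s*$")


def parse_edac(text):
    """Return {host: {dimm_label: corrected_error_count}} (hypothetical helper)."""
    per_host = defaultdict(dict)
    host = None
    for line in text.splitlines():
        err = ERR_RE.match(line)
        if err and host is not None:
            _mc, _csrow, label, count = err.groups()
            per_host[host][label] = int(count)
            continue
        header = HOST_RE.match(line)
        if header:
            host = header.group(1)
    return dict(per_host)


def rank_hosts(per_host):
    """Return [(host, total_corrected_errors)], worst host first (hypothetical helper)."""
    totals = {host: sum(dimms.values()) for host, dimms in per_host.items()}
    # Sort by descending total, then by host name so ties are deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for host, total in rank_hosts(parse_edac(SAMPLE)):
        print(f"{host}: {total} corrected errors")
```

On the sample above, =t3wn33= stands out with 294 corrected errors spread over four DIMMs, while the other nodes report 10 or fewer.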
Topic revision: r2 - 2015-03-13 - FabioMartinelli