Go to
previous page /
next page of Tier3 site log
28. 12. 2012 t3fs07,t3fs08 went down
A bit of theory about ACPI states
Today these 2 servers went down:
t3fs08
ID = f8 : 12/28/2012 : 09:04:43 : System ACPI Power State : ACPI : S5/G2: soft-off
ID = f9 : 12/28/2012 : 09:04:45 : Power Supply : PS0/PWROK : State Deasserted
ID = fa : 12/28/2012 : 09:04:47 : Power Supply : PS1/PWROK : State Deasserted
t3fs07
ID = 308 : 12/28/2012 : 10:04:29 : System ACPI Power State : ACPI : S5/G2: soft-off # logged time is 1 hour ahead: 10:04:29 is really 09:04:29
ID = 309 : 12/28/2012 : 10:04:31 : Power Supply : PS0/PWROK : State Deasserted
ID = 30a : 12/28/2012 : 10:04:33 : Power Supply : PS1/PWROK : State Deasserted
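A quick way to check how close together the two soft-off events were, once the t3fs07 timestamp is shifted back by one hour as noted on its log entry. This is only a sketch: the one-hour offset is taken from that note, and the names `CLOCK_OFFSET` and `corrected` are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"

# Assumption: the t3fs07 event log runs one hour ahead (10:04:29 is really 09:04:29),
# while the t3fs08 log needs no correction.
CLOCK_OFFSET = {"t3fs07": timedelta(hours=1), "t3fs08": timedelta(0)}

def corrected(host, date, time):
    """Return the event time with the per-host clock offset removed."""
    return datetime.strptime(f"{date} {time}", FMT) - CLOCK_OFFSET[host]

# Timestamps of the "S5/G2: soft-off" entries copied from the logs above.
fs08 = corrected("t3fs08", "12/28/2012", "09:04:43")
fs07 = corrected("t3fs07", "12/28/2012", "10:04:29")

gap = abs(fs08 - fs07)
print(gap.total_seconds())  # 14.0
```

With the offset applied, the two servers switched to soft-off within 14 seconds of each other, which fits a common external cause better than independent hardware failures.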
Nagios reported the following (I think this is a consequence, not the cause):
Notification Type: PROBLEM
Service: Temperatures Celsius t3fs07 [t3fs08]
Host: t3admin01
Address: 192.33.123.21
State: CRITICAL
Date/Time: 12-28-2012 09:25:11
IPMI Status: Critical [P0/T_CORE = N/A, P1/T_CORE = N/A]
IPMI logs:
t3fs07 : ipmitool -I lanplus -H rmfs07 -U root -f /root/private/ipmi-pw sel list
307 | 12/28/2012 | 10:04:29 | System ACPI Power State #0xea | S0/G0: working | Deasserted
308 | 12/28/2012 | 10:04:29 | System ACPI Power State #0xea | S5/G2: soft-off | Asserted
309 | 12/28/2012 | 10:04:31 | Power Supply #0xbd | State Deasserted
30a | 12/28/2012 | 10:04:33 | Power Supply #0xcc | State Deasserted
t3fs08 : ipmitool -I lanplus -H rmfs08 -U root -f /root/private/ipmi-pw sel list
f7 | 12/28/2012 | 09:04:43 | System ACPI Power State #0xea | S0/G0: working | Deasserted
f8 | 12/28/2012 | 09:04:43 | System ACPI Power State #0xea | S5/G2: soft-off | Asserted
f9 | 12/28/2012 | 09:04:45 | Power Supply #0xbd | State Deasserted
fa | 12/28/2012 | 09:04:47 | Power Supply #0xcc | State Deasserted
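To pull only the power supply events out of a longer `sel list` dump, something like the following could be used. It is a sketch that assumes the pipe-separated layout of the lines shown above (hexadecimal record ID, date, time, sensor, event); the function names are hypothetical, and other ipmitool versions or BMCs may print a different layout.

```python
# Sample copied from the t3fs08 output above.
SEL_T3FS08 = """\
f7 | 12/28/2012 | 09:04:43 | System ACPI Power State #0xea | S0/G0: working | Deasserted
f8 | 12/28/2012 | 09:04:43 | System ACPI Power State #0xea | S5/G2: soft-off | Asserted
f9 | 12/28/2012 | 09:04:45 | Power Supply #0xbd | State Deasserted
fa | 12/28/2012 | 09:04:47 | Power Supply #0xcc | State Deasserted
"""

def parse_sel_line(line):
    """Split one 'sel list' line into a dict, or return None if it does not fit."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 5:
        return None
    return {
        "id": int(fields[0], 16),        # record IDs are printed in hex
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        "event": " | ".join(fields[4:]),  # keep any trailing fields together
    }

def power_supply_drops(text):
    """Return the records where a Power Supply sensor was deasserted."""
    records = (parse_sel_line(line) for line in text.splitlines())
    return [
        r for r in records
        if r and r["sensor"].startswith("Power Supply") and "Deasserted" in r["event"]
    ]

drops = power_supply_drops(SEL_T3FS08)
for r in drops:
    print(hex(r["id"]), r["time"], r["sensor"])
```

Running the same filter on the output for both hosts gives a compact list of when each power supply sensor dropped.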
Current power status:
[root@t3admin01 ~]# ssh rmfs08
Sun(TM) Integrated Lights Out Manager
Version 3.0.3.36
Copyright 2009 Sun Microsystems, Inc. All rights reserved.
-> show /SYS/PS0/PWROK
/SYS/PS0/PWROK
Targets:
Properties:
type = Power Supply
ipmi_name = PS0/PWROK <---
class = Discrete Sensor
value = State Deasserted <---
alarm_status = major <---
Possible causes:
- all 4 power supplies failed at the same time (quite improbable)
- somebody or something disconnected the power cables
- we lost power on the line these 4 power supplies are connected to
--
FabioMartinelli - 2012-12-28