<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
#   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
#   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->
---+!! %TOPIC%
%TOC%

---++ Symptoms
Summary: %FORMFIELD{"Symptom summary"}%

---++ Occurrences
At what times did this problem occur (used to estimate frequency):
| 2009-12-09 |
| 2010-01-06 |

---++ Observations
---
t3fs05 fileserver (SUN X4500)

An earlier occurrence is mentioned as a side issue in IssueDcachePoolHangs.

On 2010-01-06 the _rear LED_ (on the front panel) was lit yellow and the _warning LED_ was blinking. Listing the LED status using IPMI showed no problem:
<pre>
ipmitool -I lanplus -H rmfs05 -U root -f /root/private/ipmi-pw sdr list generic
bp.alert.led     | Generic @20:2C.2 | ok
bp.locate.led    | Generic @20:2C.1 | ok
bp.power.led     | Generic @20:2C.0 | ok
fp.alert.led     | Generic @20:18.2 | ok
fp.locate.led    | Generic @20:18.1 | ok
fp.power.led     | Generic @20:18.0 | ok
sys.rear_svc.led | Generic @20:18.3 | ok
</pre>

The web front end shows different information:
<pre>
/SYS/SERVICE    Slow Blink
/SYS/REAR_SVC   On
</pre>

The event log shows:
<pre>
...
2401 | 12/11/2009 | 13:08:49 | System Firmware Progress | System boot initiated | Asserted
2501 | 12/31/2009 | 14:42:51 | Voltage io.v_-12v | Lower Non-critical going low | Reading -13.08 < Threshold -13.01 Volts
(END)
</pre>

Service processor firmware version:
<pre>
-> version
SP firmware 1.1.8
SP firmware build number: 19341
SP firmware date: Fri May 25 14:31:22 PDT 2007
SP filesystem version: 0.1.14
</pre>

This ILOM still allows login through the sunservice account.
Looking into the embedded Linux does not reveal problems like those seen on the X4150 machines with the old ILOMs, where a process on the embedded Linux had a memory leak that caused the kernel to kill other processes, leading to various problems.

One sad fact: *The IPMI information on the LEDs does not match the actual LEDs*

---++ Solution or Workaround
If there does not seem to be a real error condition and the ILOM seems to be at fault, then a reset of the ILOM service processor will bring the LEDs to a sane state.
<pre>reset /SP</pre>

---++ Monitoring for this condition
Since the IPMI output does not reflect this condition, the easiest check is to look directly at the machines in the compute center. Using the web front end is too inefficient.

-- Main.DerekFeichtinger - 2010-01-06
-- Main.DerekFeichtinger - 29 Aug 2008
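As a complement to the visual check, the event log lines quoted under Observations could be screened for threshold events such as the -12V entry. The following is only a minimal sketch: it assumes the six-column, pipe-separated layout seen in those two quoted lines, and it assumes that a threshold crossing is always worded "going low" or "going high". The function names =parse_sel_line= and =threshold_events= are hypothetical. It does not detect the LED state itself, and nothing here establishes that the voltage event is what switched the LEDs on.

```python
import re

# Assumption: each log line has six pipe-separated columns, as in the two
# lines quoted in the Observations section:
#   id | date | time | sensor | event | detail
KEYS = ("id", "date", "time", "sensor", "event", "detail")

# Assumption: threshold crossings are worded "... going low" / "... going high".
THRESHOLD_RE = re.compile(r"going (low|high)")


def parse_sel_line(line):
    """Split one log line into a dict, or return None if the layout differs."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != len(KEYS):
        return None
    return dict(zip(KEYS, fields))


def threshold_events(lines):
    """Return the parsed entries whose event column reports a threshold crossing."""
    hits = []
    for line in lines:
        entry = parse_sel_line(line)
        if entry is not None and THRESHOLD_RE.search(entry["event"]):
            hits.append(entry)
    return hits


# The two event log lines quoted in this topic, used as sample input.
SAMPLE = [
    "2401 | 12/11/2009 | 13:08:49 | System Firmware Progress | System boot initiated | Asserted",
    "2501 | 12/31/2009 | 14:42:51 | Voltage io.v_-12v | Lower Non-critical going low | Reading -13.08 < Threshold -13.01 Volts",
]

if __name__ == "__main__":
    for entry in threshold_events(SAMPLE):
        print(entry["date"], entry["sensor"], entry["event"], entry["detail"], sep=" | ")
```

Lines that do not match the assumed layout (for example the =...= and =(END)= pager markers) are skipped rather than treated as errors, so the sketch can be fed the log text as it was captured.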
| *IssueForm* ||
| Affected Service | various |
| Symptom summary | warning LEDs on machines are on |
| Reason Understood | yes |
| Solution Exists | workaround |
| Obsolete | no |
Topic revision: r1 - 2010-01-06 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.