<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSTier3Log62][previous page]] / [[CMSTier3Log64][next page]] of Tier3 site log %M%

---+ 19. 03. 2014 t3fs05 unresponsive

---++ Symptoms

The =t3fs05= fileserver hosting =/swshare= became unresponsive during the night. As a result, various Nagios checks and nearly all interactive operations failed, because some folders in =/swshare= are in the default =$PATH=.

---++ Solution

The host seemed to be running (the power supply reported a live state), but connecting through the IPMI console was not possible.

<pre>
-> show /SYS/PS0/PWROK

 /SYS/PS0/PWROK
    Targets:

    Properties:
        type = Power Supply
        class = Discrete Sensor
        value = State Asserted

    Commands:
        cd
        show
</pre>

Restarting with

<pre>
# ipmitool -I lanplus -H rmfs05 -U root -f /root/private/ipmi-pw chassis power reset
</pre>

made the server boot again. During the boot sequence the serial console printed the following lines:

<pre>
SunOS Release 5.10 Version Generic_141445-09 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: t3fs05.psi.ch
Reading ZFS config: done.
Mounting ZFS filesystems: (8/8)
Mar 19 07:39:47 svc.startd[7]: network/cswsmartd:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Mar 19 07:39:47 svc.startd[7]: failed to abandon contract 66: Permission denied

t3fs05.psi.ch console login: Mar 19 07:40:09 t3fs05.psi.ch xntpd[650]: getnetnum: "dmztime1.psi.ch" invalid host number, line ignored
Mar 19 07:40:09 t3fs05.psi.ch xntpd[650]: getnetnum: "dmztime2.psi.ch" invalid host number, line ignored
</pre>

---++ Lessons Learned

   * The hardware of =t3fs05= and =t3fs06= is getting old; we may see more failures.
   * Some Nagios checks (even on other hosts) depend on =t3fs05=, again because of the =$PATH= environment variable.

-- Main.DanielMeister - 2014-03-20

----------------

%ICON{arrowleft}% Go to [[CMSTier3Log62][previous page]] / [[CMSTier3Log64][next page]] of Tier3 site log %M%
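
The =$PATH= dependency noted under Lessons Learned can be checked on any host with a few lines of shell. The following is only a sketch, not something that was run during this incident: the function name =path_entries_under= is hypothetical, and it assumes a POSIX =sh=. It only inspects the string value of =$PATH= and never touches the filesystem, so it does not hang even if =/swshare= is unreachable.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print every entry of a PATH-style string
# that is the given prefix itself or lies below it.
# Usage: path_entries_under PREFIX [PATH_STRING]   (defaults to $PATH)
path_entries_under() {
    prefix=$1
    path_value=${2:-$PATH}
    old_ifs=$IFS
    IFS=:                       # split the string on colons
    for entry in $path_value; do
        case $entry in
            "$prefix"|"$prefix"/*) printf '%s\n' "$entry" ;;
        esac
    done
    IFS=$old_ifs                # restore normal word splitting
}

# List the entries of the current $PATH that depend on /swshare.
path_entries_under /swshare
```

Any output means that command lookups on that host can touch the fileserver, so checks executed there may stall when =t3fs05= is down.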
Topic revision: r1 - 2014-03-20 - DanielMeister
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.