Go to previous page / next page of Tier3 site log
19. 03. 2014 t3fs05 unresponsive
Symptoms
The t3fs05 fileserver, which hosts /swshare, became unresponsive during the night. As a result, various Nagios checks and nearly all interactive operations failed, because some folders in /swshare are in the default $PATH.
Solution
The host appeared to be powered on (the power supply sensor reported a live state), but connecting through the IPMI console was not possible.
-> show /SYS/PS0/PWROK
/SYS/PS0/PWROK
Targets:
Properties:
type = Power Supply
class = Discrete Sensor
value = State Asserted
Commands:
cd
show
Restarting with
# ipmitool -I lanplus -H rmfs05 -U root -f /root/private/ipmi-pw chassis power reset
made the server boot again. During the boot sequence the serial console printed the following lines:
SunOS Release 5.10 Version Generic_141445-09 64-bit
Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
Hostname: t3fs05.psi.ch
Reading ZFS config: done.
Mounting ZFS filesystems: (8/8)
Mar 19 07:39:47 svc.startd[7]: network/cswsmartd:default failed repeatedly: transitioned to maintenance (see 'svcs -xv' for details)
Mar 19 07:39:47 svc.startd[7]: failed to abandon contract 66: Permission denied
t3fs05.psi.ch console login: Mar 19 07:40:09 t3fs05.psi.ch xntpd[650]: getnetnum: "dmztime1.psi.ch" invalid host number, line ignored
Mar 19 07:40:09 t3fs05.psi.ch xntpd[650]: getnetnum: "dmztime2.psi.ch" invalid host number, line ignored
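The power reset used above could be wrapped so that it first asks the BMC for the chassis power state and only resets when the BMC answers and reports the chassis as on. This is a minimal sketch, not the procedure that was actually followed: the variable and function names (BMC_HOST, IPMI_USER, IPMI_PWFILE, IPMITOOL, ipmi_power, reset_if_powered_on) are illustrative, and the defaults are simply the values from the command line in this entry.

```shell
#!/bin/sh
# Sketch only: variable and function names are illustrative; the defaults
# are taken from the ipmitool command used in this log entry.
BMC_HOST=${BMC_HOST:-rmfs05}
IPMI_USER=${IPMI_USER:-root}
IPMI_PWFILE=${IPMI_PWFILE:-/root/private/ipmi-pw}
IPMITOOL=${IPMITOOL:-ipmitool}

# Run one "chassis power" subcommand (status, reset, ...) against the BMC.
ipmi_power() {
    "$IPMITOOL" -I lanplus -H "$BMC_HOST" -U "$IPMI_USER" \
        -f "$IPMI_PWFILE" chassis power "$1"
}

# Reset only if the BMC answers and reports the chassis as powered on.
# Returns 2 if the BMC cannot be queried, 1 on an unexpected power state.
reset_if_powered_on() {
    status=$(ipmi_power status) || {
        echo "BMC $BMC_HOST could not be queried" >&2
        return 2
    }
    case "$status" in
        *"is on"*) ipmi_power reset ;;
        *) echo "unexpected power state: $status" >&2; return 1 ;;
    esac
}
```

Nothing runs until reset_if_powered_on is called explicitly, so the file can be sourced and inspected first.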
Lessons Learned
- The hardware of t3fs05 and t3fs06 is getting old; we may see more failures.
- Some Nagios checks (even on other hosts) depend on t3fs05 (again due to the $PATH environment variable).
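
The $PATH dependency noted above could be made visible with a dedicated check that lists the $PATH entries under /swshare and probes each one with a time limit. The following is a rough sketch under stated assumptions: the function names and the 5-second limit are made up for illustration, and it relies on the GNU coreutils timeout command, which stock Solaris 10 does not ship. The exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```shell
#!/bin/sh
# Sketch only: function names and the 5-second default are illustrative.

# Print the entries of a PATH-style string ($2) that lie under a prefix ($1).
path_entries_under() {
    printf '%s\n' "$2" | tr ':' '\n' | while IFS= read -r dir; do
        case "$dir" in
            "$1"|"$1"/*) printf '%s\n' "$dir" ;;
        esac
    done
}

# Succeed if "ls -d" on the directory returns within the limit ($2, seconds).
# Assumes GNU coreutils timeout is installed.
dir_responds() {
    timeout "${2:-5}" ls -d "$1" >/dev/null 2>&1
}

# Nagios-style result: return 0 for OK, 2 for CRITICAL.
# Note: entries containing whitespace are not handled by this loop.
check_path_prefix() {
    for dir in $(path_entries_under "$1" "$PATH"); do
        if ! dir_responds "$dir"; then
            echo "CRITICAL: $dir (in \$PATH) is missing or does not respond"
            return 2
        fi
    done
    echo "OK: all \$PATH entries under $1 respond"
    return 0
}
```

A call such as check_path_prefix /swshare would then report the dependency directly instead of letting unrelated checks fail. If the directory sits on a hard-mounted network filesystem, the ls process may not be interruptible, so the time limit is a starting point rather than a guarantee.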
-- DanielMeister - 2014-03-20