

26. 12. 2012 t3fs14 reboot on Dec 25th

On Dec 25th at around 9 a.m. CET Nagios warned that the host t3fs14 was offline. Other checks confirm that the host was unavailable for some minutes. The OS log files show nothing suspicious before the reboot:

Dec 25 07:12:07 t3fs14.psi.ch syslog-ng[2520]: Log statistics; dropped='tcp(t3service01.psi.ch:1514)=0', processed='center(queued)=6796', processed='center(received)=3399', processed='destination(d_loghost)=3399', processed='destination(d_boot)=0', processed='destination(d_auth)=2080', processed='destination(d_cron)=1259', processed='destination(d_mlal)=0', processed='destination(d_mesg)=54', processed='destination(d_cons)=0', processed='destination(d_spol)=0', processed='destination(d_mail)=4', processed='source(s_local)=3399', suppressed='tcp(t3service01.psi.ch:1514)=0'
Dec 25 08:12:07 t3fs14.psi.ch syslog-ng[2520]: Log statistics; dropped='tcp(t3service01.psi.ch:1514)=0', processed='center(queued)=6924', processed='center(received)=3463', processed='destination(d_loghost)=3463', processed='destination(d_boot)=0', processed='destination(d_auth)=2120', processed='destination(d_cron)=1282', processed='destination(d_mlal)=0', processed='destination(d_mesg)=55', processed='destination(d_cons)=0', processed='destination(d_spol)=0', processed='destination(d_mail)=4', processed='source(s_local)=3463', suppressed='tcp(t3service01.psi.ch:1514)=0'
Dec 25 09:07:32 t3fs14.psi.ch kernel: Initializing cgroup subsys cpuset
Dec 25 09:07:32 t3fs14.psi.ch kernel: Initializing cgroup subsys cpu
Dec 25 09:07:32 t3fs14.psi.ch kernel: Command line: ro root=UUID=a247aed2-7b16-4306-9485-2adc3f62a6da rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS1,115200 elevator=noop irqpoll nr_cpus=1 reset_devices cgroup_disable=memory  memmap=exactmap memmap=631K@4K memmap=134517K@49783K elfcorehdr=184300K memmap=4K$0K memmap=5K$635K memmap=64K$960K memmap=52K#3659964K memmap=75532K$3660020K memmap=2112K$4173824K memmap=8192K$4186112K
Dec 25 09:07:32 t3fs14.psi.ch kernel: KERNEL supported cpus:
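The silent window before the reboot can be read off an excerpt like the one above. The following is only a minimal sketch: the function names and the choice of boot marker are assumptions made for illustration, the sample log line is abbreviated, and `%b` expects English month names (C/English locale). Since syslog-ng writes its statistics line only once per hour here, the gap is an upper bound on how long the host was silent, not the length of the outage.

```python
from datetime import datetime, timedelta

# First kernel message of a fresh boot in the excerpt (assumed marker).
BOOT_MARKER = "kernel: Initializing cgroup subsys cpuset"


def parse_ts(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; syslog omits the year."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def silent_gap_before_boot(lines, year):
    """Return (last entry before boot, first boot entry), or None if not found."""
    previous = None
    for line in lines:
        if BOOT_MARKER in line:
            if previous is None:
                return None
            return previous, parse_ts(line, year)
        previous = parse_ts(line, year)
    return None


sample = [
    "Dec 25 08:12:07 t3fs14.psi.ch syslog-ng[2520]: Log statistics; (abbreviated)",
    "Dec 25 09:07:32 t3fs14.psi.ch kernel: Initializing cgroup subsys cpuset",
]
last_seen, booted = silent_gap_before_boot(sample, 2012)
print(booted - last_seen)
```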

The low-level logs (IPMI system event log) give no clues either:

[root@t3admin01 ~]# ipmitool -I lanplus -H rmfs14 -U root -f /root/private/ipmi-pw sel elist 
   1 | 05/05/2011 | 08:53:33 | Power Supply Power Supply 2 | Failure detected | Asserted
   2 | 06/22/2011 | 15:33:17 | Power Supply Power Supply 1 | Failure detected | Asserted
   3 | 06/22/2011 | 15:36:55 | Power Supply Power Supply 1 | Failure detected | Asserted
   4 | 02/06/2012 | 15:21:30 | Power Supply Power Supply 2 | Failure detected | Asserted
However, connecting to the HP service processor (ssh rmfs14) shows:
hpiLO-> show /system1/log1/record15
status=0
status_tag=COMMAND COMPLETED
Wed Dec 26 14:12:30 2012

/system1/log1/record15
  Targets
  Properties
    number=15
    severity=Critical
    date=12/25/2012
    time=09:16
    description=ASR Detected by System ROM  <---------
  Verbs
    cd version exit show set
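This ASR event (Automatic Server Recovery, the HP watchdog that resets a server that has stopped responding) points to an outdated RAID controller FW, or possibly a faulty CPU or a bug in the HP iLO3.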
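A record like the one above could be checked automatically once its text has been captured from the iLO session. This is a rough sketch: `parse_ilo_record` and `is_asr_event` are hypothetical helper names, and the parsing relies only on the `key=value` layout visible in the output shown here, not on any documented iLO format.

```python
def parse_ilo_record(text):
    """Collect the key=value lines of an iLO 'show' record into a dict."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def is_asr_event(props):
    """True for a critical record whose description mentions ASR."""
    return props.get("severity") == "Critical" and "ASR" in props.get("description", "")


record_text = """\
/system1/log1/record15
  Targets
  Properties
    number=15
    severity=Critical
    date=12/25/2012
    time=09:16
    description=ASR Detected by System ROM
  Verbs
    cd version exit show set
"""
record = parse_ilo_record(record_text)
print(record["date"], record["time"], is_asr_event(record))
```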

So far I have:

  • updated the RAID controller FW; a server reboot is still needed, which I'll do during the scheduled PSI downtime in Jan '13.
  • updated the HP iLO3, which rebooted automatically.
  • downloaded the latest RDAC from NetApp, still to be compiled.
  • updated the Linux kernel to get a more recent RAID controller Linux driver; this needs a reboot + RDAC compilation + reboot, which I'll also do during the scheduled PSI downtime in Jan '13.

After the automatic reboot everything seemed to work except the d-cache pools, which were unavailable and failing the automatic checks.

After I checked the mounted file systems, ran /opt/d-cache/bin/dcache restart, and waited 10 minutes, the SE operations went back to normal.
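The "restart and wait" step could be scripted as a polling loop instead of a fixed 10-minute wait. The sketch below is generic: `wait_until` is a hypothetical helper, and the `check` callable stands in for whatever pool or SE health check is available on site (none is specified in this log entry).

```python
import time


def wait_until(check, timeout_s, interval_s, clock=time.monotonic, sleep=time.sleep):
    """Call check() every interval_s seconds until it returns True or timeout_s elapses.

    Returns True if check() succeeded, False if the deadline passed first.
    clock and sleep are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until(pools_are_up, timeout_s=600, interval_s=30)` would poll for up to 10 minutes, where `pools_are_up` is a placeholder for a site-specific check function.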

-- DanielMeister - 2012-12-26



Topic revision: r6 - 2016-11-04 - FabioMartinelli
 