15. 01. 2011 Three breakdowns of t3ui01 in 2 days

t3ui01 hung for the second time. The SP was still working, but its logs contained no relevant messages. The remote logs on the central log server did not show anything either. Ganglia plots did not indicate any load on the CPUs or the network, and nothing appeared on the console.

t3ui01 had shown similar behavior in June last year (CMSTier3Log13), when the logs pointed to a failed DIMM. The node kept working after a restart, so no action was taken. Records show that a DIMM on that machine had already been replaced in 2009 (q.v. SUN support case entry).

Before turning it on this time, I ran two hours of memory testing. No errors showed up.

t3ui01 broke down again on 2011-01-25. Nothing in the SP logs (no sign of overheating either, just nothing), the SP thinks SYS is alive, Ganglia shows no load at all, and the serial console is dead. The LEDs on the front and back are green. The remote syslogs also show no cause.

-- DerekFeichtinger - 2011-01-19

Topic revision: r2 - 2011-01-25 - DerekFeichtinger