KeyWords: HostCe01, Maui

MAUI blocked for 15 minutes from 11:51 to 12:06

MAUI stopped scheduling for about 15 minutes; by the time I noticed and started to investigate, it had resumed spontaneously and was scheduling jobs again.

I suspected that this was another instance of the issue described in IssueMauiBlocks and in an Old Phoenix Blog post. However, all the suggested workarounds were in place: nscd was running fine, and the suggested timeout was still set in the PBS config.

The node ce01 is overloaded due to some intense gridftp/grid-job-monitor activity; this suggests that 15-minute pauses are MAUI's "physiological" response to timeouts, and that we probably should not worry much about them.

We could, however, schedule Nagios checks to verify that the PBS and MAUI logs are being updated; if they are not modified every minute or so, something is probably going wrong.
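A minimal sketch of such a freshness check, written in Python rather than as a finished plugin. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function name and the thresholds (2 and 10 minutes) are assumptions chosen for illustration, not values we have tuned.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log_freshness(path, warn_secs=120, crit_secs=600, now=None):
    """Return (exit_code, message) based on how long ago `path` was modified.

    The thresholds are illustrative assumptions; `now` can be passed
    explicitly to make the function easy to test.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.stat(path).st_mtime
    except OSError as err:
        # A missing or unreadable log is reported as UNKNOWN, not OK.
        return UNKNOWN, "UNKNOWN: cannot stat %s: %s" % (path, err)
    if age >= crit_secs:
        return CRITICAL, "CRITICAL: %s not modified for %d s" % (path, age)
    if age >= warn_secs:
        return WARNING, "WARNING: %s not modified for %d s" % (path, age)
    return OK, "OK: %s modified %d s ago" % (path, age)
```

To use it as a plugin, one would call check_log_freshness() on the PBS or MAUI log file (the path depends on the local installation), print the message and pass the code to sys.exit().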

Attached is a quick Perl script to find these 15-minute pauses in MAUI logs.
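For readers without the attachment, the same idea can be sketched in Python. This is not the attached Perl script, only an illustration under stated assumptions: each log line is assumed to start with a "MM/DD HH:MM:SS" timestamp, the year (absent from that format) is passed in by hand, and the 10-minute threshold and the name find_gaps are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: "MM/DD HH:MM:SS" at the start of each log line.
STAMP = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2})\b")


def find_gaps(lines, min_gap=timedelta(minutes=10), year=2009):
    """Yield (start, end) datetimes for consecutive timestamps
    that are at least `min_gap` apart.

    Year rollover inside a single log file is not handled.
    """
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines that carry no timestamp
        try:
            current = datetime.strptime(
                "%d/%s" % (year, match.group(1)), "%Y/%m/%d %H:%M:%S"
            )
        except ValueError:
            continue  # prefix looked like a timestamp but is not a valid date
        if previous is not None and current - previous >= min_gap:
            yield previous, current
        previous = current
```

Passing an open log file as `lines` and printing each (start, end) pair lists the pauses; lowering min_gap makes the search more sensitive.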

-- RiccardoMurri - 25 Feb 2009

Readers' comments

 
Topic attachments
find_15min_pauses_in_maui_log.pl.txt (r1, 0.4 K, 2009-02-25 12:41, RiccardoMurri): Script to find "gaps" in MAUI logs.
Topic revision: r2 - 2009-02-27 - RiccardoMurri