KeyWords: HostCe01, Maui
MAUI blocked for 15 minutes from 11:51 to 12:06
MAUI stopped scheduling for about 15 minutes; by the time I noticed
and started to investigate, it had resumed spontaneously and was
scheduling jobs again.
I suspected that this was another instance of the issue described in IssueMauiBlocks and in an Old Phoenix Blog post.
However, all suggested workarounds were in place: nscd was running fine, and the suggested timeout in the PBS config was still there.
The node ce01 is overloaded due to some intense gridftp/grid-job-monitor activity; this suggests that 15-minute pauses are MAUI's "physiological" response to timeouts, and that we probably need not worry much about them.
We could, however, schedule Nagios checks to verify that the PBS and
MAUI logs are being updated: if they are not modified every minute or
so, something is probably going wrong.
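A minimal sketch of such a check in Python, following the usual Nagios plugin convention (exit status 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus a one-line message on stdout). The function names, the thresholds (2 minutes for a warning, 15 minutes for critical) and the way the log path is passed are assumptions made for illustration, not an existing check on our hosts.

```python
import os
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log_freshness(path, warn_secs=120, crit_secs=900, now=None):
    """Return (status, message) according to how long ago `path` was modified.

    The thresholds are illustrative assumptions: warn after 2 minutes
    without updates, go critical after 15 minutes.
    """
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(path)
    except OSError as err:
        return UNKNOWN, f"UNKNOWN: cannot stat {path}: {err}"
    if age >= crit_secs:
        return CRITICAL, f"CRITICAL: {path} not modified for {age:.0f}s"
    if age >= warn_secs:
        return WARNING, f"WARNING: {path} not modified for {age:.0f}s"
    return OK, f"OK: {path} modified {age:.0f}s ago"


def main(argv):
    """Check the log file named in argv[1]; return the Nagios exit status."""
    if len(argv) != 2:
        print(f"UNKNOWN: usage: {argv[0]} LOGFILE")
        return UNKNOWN
    status, message = check_log_freshness(argv[1])
    print(message)
    return status
```

To use it as a plugin, the script would end with `sys.exit(main(sys.argv))` and be run once for the PBS log and once for the MAUI log; the actual log locations depend on the local installation.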
Attached is a quick Perl script to find these 15-minute pauses in MAUI logs.
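The attached script is not reproduced here. As a rough illustration of the same idea, the following Python sketch reports gaps between consecutive log timestamps. It assumes that each log line starts with a `MM/DD HH:MM:SS` timestamp without a year; the function name `find_pauses`, the 10-minute threshold and the `year` parameter are hypothetical, and the regular expression should be adjusted if the actual log format differs.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix: "MM/DD HH:MM:SS" (no year in the timestamp).
TIMESTAMP = re.compile(r"^(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})\b")


def find_pauses(lines, min_gap=timedelta(minutes=10), year=2009):
    """Return (last_line_before, first_line_after) datetime pairs for
    every gap of at least `min_gap` between consecutive timestamped lines.

    Lines without a leading timestamp are skipped. The year must be
    supplied by the caller; a log spanning New Year is not handled.
    """
    pauses = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        month, day, hour, minute, second = map(int, match.groups())
        try:
            current = datetime(year, month, day, hour, minute, second)
        except ValueError:
            continue  # digits matched, but they do not form a valid date
        if previous is not None and current - previous >= min_gap:
            pauses.append((previous, current))
        previous = current
    return pauses
```

Called as `find_pauses(open(logfile))`, it returns one pair per pause, so a block like the one described above should show up as a single pair about 15 minutes apart.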
-- RiccardoMurri - 25 Feb 2009