08.10.2010 PhEDEx agents dying at irregular intervals

Since last week's security update of the kernel and software packages, our PhEDEx agents have been unstable. In the prod instance, the download agent has died twice and the block-verify agent once. The services start up again without any problems.

Today I finally found some useful lines in the download agent log. The crashes have only occurred in the prod instance, which sees high load.

2010-10-07 18:07:47: FileDownload[6929]: copy job job.1286209863.137 assigned to link T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS with 20 tasks and p=0.281 and W=0.020 and 476 tasks in queue
2010-10-07 18:07:47: FileDownload[6929]: balancing transfers on 1 links
2010-10-07 18:07:47: FileDownload[6929]: backend busy: maximum link pending files for T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS (5) reached
2010-10-07 18:07:47: FileDownload[6929]: link T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS is busy at the moment, not allocating transfers
2010-10-07 18:08:02: FileDownload[6929]: balancing transfers on 1 links
2010-10-07 18:08:02: FileDownload[6929]: backend busy: maximum link pending files for T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS (5) reached
2010-10-07 18:08:02: FileDownload[6929]: link T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS is busy at the moment, not allocating transfers
Use of uninitialized value in hash element at /home/phedex/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_1/perl_lib/PHEDEX/Core/JobManager.pm line 135.
Use of uninitialized value in hash element at /home/phedex/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_1/perl_lib/PHEDEX/Core/JobManager.pm line 136.
6929: !!! Child process PID:22305 reaped:
6929: !!! Child process PID:22228 reaped:
6929: !!! Child process PID:22186 reaped:
6929: !!! Child process PID:22258 reaped:
6929: !!! Child process PID:22162 reaped:
6929: !!! Child process PID:22138 reaped:
6929: !!! Child process PID:22085 reaped:
6929: !!! Child process PID:22207 reaped:
6929: !!! Child process PID:22325 reaped:
6929: !!! Your program may not be using sig_child() to reap processes.
6929: !!! In extreme cases, your program can force a system reboot
6929: !!! if this resource leakage is not corrected.
couldn't fork: Cannot allocate memory at /home/phedex/sw/slc5_amd64_gcc434/external/p5-poe-component-child/1.39-cmp2/lib/site_perl/5.8.8/POE/Component/Child.pm line 181
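The excerpt above shows four distinct signatures leading up to the crash: timestamped FileDownload lines, Perl "Use of uninitialized value" warnings from JobManager.pm, POE::Component::Child messages about reaped child processes, and finally the "couldn't fork: Cannot allocate memory" error. To check whether the earlier crashes of the download and block-verify agents left the same trail, a small scanner could be run over the agent logs. The following is a minimal Python sketch that assumes the line formats exactly as they appear in this excerpt; the names scan_agent_log and ScanResult are hypothetical and are not part of PhEDEx.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Optional

# Assumed line formats, copied from the download agent log excerpt:
#   "2010-10-07 18:08:02: FileDownload[6929]: balancing transfers on 1 links"
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (\w+)\[(\d+)\]:")
#   "6929: !!! Child process PID:22305 reaped:"
REAPED_RE = re.compile(r"^(\d+): !!! Child process PID:(\d+) reaped:")
#   "Use of uninitialized value in hash element at .../JobManager.pm line 135."
UNINIT_RE = re.compile(r"^Use of uninitialized value .* at (\S+) line (\d+)\.")
#   "couldn't fork: Cannot allocate memory at .../Child.pm line 181"
FORK_FAIL_RE = re.compile(r"^couldn't fork: (.*?) at ")


@dataclass
class ScanResult:
    # Last timestamped agent line seen before the first fork failure.
    last_timestamp: Optional[str] = None
    # Parent PID -> number of "Child process ... reaped" warnings.
    reaped: Counter = field(default_factory=Counter)
    # (file, line) pairs of Perl "uninitialized value" warnings.
    uninit_warnings: list = field(default_factory=list)
    # Error text of the first fork failure, e.g. "Cannot allocate memory".
    fork_failure: Optional[str] = None


def scan_agent_log(lines: Iterable[str]) -> ScanResult:
    """Hypothetical helper: collect crash signatures from one agent log."""
    result = ScanResult()
    for raw in lines:
        line = raw.rstrip("\n")
        if m := TIMESTAMP_RE.match(line):
            # Only track timestamps up to the first fork failure.
            if result.fork_failure is None:
                result.last_timestamp = m.group(1)
        elif m := REAPED_RE.match(line):
            result.reaped[int(m.group(1))] += 1
        elif m := UNINIT_RE.match(line):
            result.uninit_warnings.append((m.group(1), int(m.group(2))))
        elif m := FORK_FAIL_RE.match(line):
            if result.fork_failure is None:
                result.fork_failure = m.group(1)
    return result
```

Usage: `with open(logfile) as fh: print(scan_agent_log(fh))`, where `logfile` is the path of an agent log on the PhEDEx node (the path is not given here). Applying the same patterns to the block-verify agent log assumes it uses the same line format, which has not been checked.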

-- DerekFeichtinger - 2010-10-08

Topic revision: r1 - 2010-10-08 - DerekFeichtinger
 
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.