<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

%TOC%

%ICON{arrowleft}% Go to [[CMSSiteLogXX][previous page]] / [[CMSSiteLogXX][next page]] of CMS site log %M%

---+ 08. 10. 2010 PhEDEx agents dying at irregular intervals

Since last week's security update of the kernel and software packages, our PhEDEx agents have been unstable. In the prod instance the download agent has died twice and the block-verify agent once. The services start up again without any problem. The failures have only occurred in the prod instance, which sees high load.

Today I finally found some useful lines in the download agent log. The last message before the agent died is a failed fork ("Cannot allocate memory"):

<pre>
2010-10-07 18:07:47: FileDownload[6929]: copy job job.1286209863.137 assigned to link T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS with 20 tasks and p=0.281 and W=0.020 and 476 tasks in queue
2010-10-07 18:07:47: FileDownload[6929]: balancing transfers on 1 links
2010-10-07 18:07:47: FileDownload[6929]: backend busy: maximum link pending files for T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS (5) reached
2010-10-07 18:07:47: FileDownload[6929]: link T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS is busy at the moment, not allocating transfers
2010-10-07 18:08:02: FileDownload[6929]: balancing transfers on 1 links
2010-10-07 18:08:02: FileDownload[6929]: backend busy: maximum link pending files for T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS (5) reached
2010-10-07 18:08:02: FileDownload[6929]: link T1_FR_CCIN2P3_Buffer -> T2_CH_CSCS is busy at the moment, not allocating transfers
Use of uninitialized value in hash element at /home/phedex/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_1/perl_lib/PHEDEX/Core/JobManager.pm line 135.
Use of uninitialized value in hash element at /home/phedex/sw/slc5_amd64_gcc434/cms/PHEDEX/PHEDEX_3_3_1/perl_lib/PHEDEX/Core/JobManager.pm line 136.
6929: !!! Child process PID:22305 reaped:
6929: !!! Child process PID:22228 reaped:
6929: !!! Child process PID:22186 reaped:
6929: !!! Child process PID:22258 reaped:
6929: !!! Child process PID:22162 reaped:
6929: !!! Child process PID:22138 reaped:
6929: !!! Child process PID:22085 reaped:
6929: !!! Child process PID:22207 reaped:
6929: !!! Child process PID:22325 reaped:
6929: !!! Your program may not be using sig_child() to reap processes.
6929: !!! In extreme cases, your program can force a system reboot
6929: !!! if this resource leakage is not corrected.
couldn't fork: Cannot allocate memory at /home/phedex/sw/slc5_amd64_gcc434/external/p5-poe-component-child/1.39-cmp2/lib/site_perl/5.8.8/POE/Component/Child.pm line 181
</pre>

-- Main.DerekFeichtinger - 2010-10-08

----------------

%ICON{arrowleft}% Go to [[CMSSiteLogXX][previous page]] / [[CMSSiteLogXX][next page]] of CMS site log %M%
Topic revision: r1 - 2010-10-08 - DerekFeichtinger