CMS Site Log for PHOENIX Cluster

9. 3. 2007 DPM service crash due to automatic updates requiring manual intervention

Automatic apt updates of the DPM service RPMs caused our monitoring to fail (because dpm-qryconf segfaulted). Restarting the services then failed to bring the dpm service back up and, as it turned out, seems to have corrupted the DB. Only then did we see that the RPMs had been updated and that a manual procedure was required to migrate the DB to a new schema. In our opinion these changes were not communicated with the required visibility. We had assumed that the updates in this apt repository usually cover security-relevant fixes and that no major service updates are published there.
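
One way to notice such silent updates before restarting a service is to compare package snapshots taken before and after the automatic update run. The following is only a minimal sketch in Python, assuming each snapshot is the plain text output of `rpm -qa` (one entry per line). The function name `changed_packages` and the package names and versions in the example are made-up placeholders, not the actual DPM RPMs.

```python
def changed_packages(before, after, keyword=""):
    """Return entries that appear in `after` but not in `before`.

    Both arguments are the text output of `rpm -qa`, one
    name-version-release entry per line. `keyword` optionally limits
    the result to entries containing it (case-insensitive).
    """
    old = {line.strip() for line in before.splitlines() if line.strip()}
    new = {line.strip() for line in after.splitlines() if line.strip()}
    added = sorted(new - old)
    return [pkg for pkg in added if keyword.lower() in pkg.lower()]


# Hypothetical snapshots: names and versions are placeholders.
before = """example-dpm-server-1.0.0-1
example-lib-2.3-4
"""
after = """example-dpm-server-1.1.0-1
example-lib-2.3-4
"""

updated = changed_packages(before, after, keyword="dpm")
print(updated)
```

A non-empty result for the service's packages would be a hint to read the release notes before restarting anything.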

Migrating the DB using the documentation provided by LCG was no longer possible because the DB had been corrupted, so I had to restore a backup from that morning. With the restored DB the update script ran (almost) correctly, and I was able to start the service again by Friday night.

13./14. 3. 2007 Cycle-1 Week-5 Load tests

Following the instructions from D. Bonacorsi, I started a download test from FZK using the /PhEDEx_Prod/LoadTest07_FZK/CSCS sample. The transfers went worse than at the end of last month, with intermittent periods of failure. This may be a result of the FZK dCache SRM instability that the operators have mentioned.

Note: PhEDEx page shows transfer speeds of ~7 MB/s.

SITE STATISTICS:
==================
                         first entry: 2007-03-13 14:32:59      last entry: 2007-03-15 09:50:02
site: T1_FZK_Buffer (OK: 417 / Err: 136)   succ. rate: 75.4 %   total: 1030.2 GB   avg. rate: 4.2 MB/s = 35.5 Mb/s

Error message statistics per site:
===================================

 *** ERRORS from T1_FZK_Buffer:***
     85   Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Could not open connection !
     33   the server sent an error response: 425 425 Can't open data connection
     11   Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success
      4   transfer expired in the download agent queue
      1   Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: service timeout.
      1   Failed Cannot retrieve final message from /var/tmp/glite-url-copy-edguser/STAR-CSCSfailed/STAR-CSCS__2007-03-14-1418_2yYlpG
             task IDs: 4643882
      1   Failed Failed on SRM get: SRM getRequestStatus timed out on get
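
As a cross-check of the statistics above, the success rate and the share of errors in which the SRM service could not be contacted can be recomputed from the listed counts. This is a small sketch in Python; the counts are copied from the output above, while the short labels and the `srm_contact` grouping are my own assumptions for illustration and are not produced by the statistics tool.

```python
# Transfer counts copied from the T1_FZK_Buffer line above.
ok, err = 417, 136

# (count, srm_contact, short label); the grouping flag is an assumption:
# it marks the "Cannot Contact SRM Service" messages.
errors = [
    (85, True,  "SRM get: could not open connection"),
    (33, False, "425 Can't open data connection"),
    (11, True,  "SRM get: error reading token data"),
    (4,  False, "transfer expired in the download agent queue"),
    (1,  True,  "SRM get: service timeout"),
    (1,  False, "cannot retrieve final message"),
    (1,  False, "SRM getRequestStatus timed out"),
]


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


total_errors = sum(count for count, _, _ in errors)
srm_contact = sum(count for count, is_srm, _ in errors if is_srm)

print(f"errors listed:      {total_errors}")                        # 136
print(f"success rate:       {percent(ok, ok + err):.1f} %")         # 75.4 %
print(f"SRM contact errors: {srm_contact} "
      f"({percent(srm_contact, total_errors):.1f} % of errors)")    # 97 (71.3 %)
```

The listed error counts add up to the reported 136 failures, and roughly seven out of ten of them are failures to contact the SRM service, which fits the suspected SRM instability at FZK.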


cpu_report-20070314-1153.gif
network_report-20070314-1153.gif

-- DerekFeichtinger - 14 Mar 2007

Topic revision: r3 - 2007-03-15 - DerekFeichtinger
 