CMS Site Log for PHOENIX Cluster
Go to previous page / next page of CMS site log
9. 3. 2007 DPM service crash due to automatic updates requiring manual intervention
Automatic apt updates of the DPM service RPMs caused our monitoring to fail because dpm-qryconf segfaulted. Restarting the services then left the dpm service unable to start and, as it turned out, appears to have corrupted the DB. Only then did we see that the RPMs had been updated and that a manual procedure was required to migrate the DB to a new schema. In our opinion these changes were not communicated with the required visibility. We had originally thought that updates in this apt repository usually cover security-relevant fixes and that no major service updates are entered into it.
Migrating the DB using the documentation provided by LCG was no longer possible since the DB had been corrupted. I was forced to restore the backup from that morning. With it, the update script ran (almost) correctly, and I was able to start the service again by Friday night.
13./14. 3. 2007 Cycle-1 Week-5 Load tests
Following the instructions from D. Bonarcorsi, I started a download test from FZK using the /PhEDEx_Prod/LoadTest07_FZK/CSCS sample. The transfers went worse than at the end of last month, with intermittent periods of failure. This may be a result of the FZK dCache SRM instability that the operators have mentioned.
Note: The PhEDEx page shows transfer speeds of ~7 MB/s.
SITE STATISTICS:
==================
first entry: 2007-03-13 14:32:59 last entry: 2007-03-14 10:58:16
site: T1_FZK_Buffer (OK: 251 / Err: 123) succ. rate: 67.1 % total: 620.1 GB avg. rate: 4.1 MB/s = 34.3 Mb/s
*** ERRORS from T1_FZK_Buffer:***
76 Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Could not open connection !
30 the server sent an error response: 425 425 Can't open data connection
11 Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success
4 transfer expired in the download agent queue
1 Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: service timeout.
1 Failed Failed on SRM get: SRM getRequestStatus timed out on get
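The summary above can be cross-checked with a few lines of Python. This is only a minimal sketch, not the script that produced the statistics: the function names parse_error_counts and success_rate are made up for illustration, and the error lines are copied by hand from the listing. It sums the per-error counts and recomputes the success rate from the OK/Err numbers.

```python
import re

# Error summary lines copied from the listing above (leading number = count).
ERROR_LINES = [
    "76 Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Could not open connection !",
    "30 the server sent an error response: 425 425 Can't open data connection",
    "11 Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success",
    "4 transfer expired in the download agent queue",
    "1 Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: service timeout.",
    "1 Failed Failed on SRM get: SRM getRequestStatus timed out on get",
]


def parse_error_counts(lines):
    """Split each 'COUNT message' line into an (int count, message) tuple."""
    entries = []
    for line in lines:
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if match:  # silently skip lines that do not start with a count
            entries.append((int(match.group(1)), match.group(2).strip()))
    return entries


def success_rate(ok, err):
    """Percentage of successful transfers; 0.0 if there were no transfers."""
    total = ok + err
    return 100.0 * ok / total if total else 0.0


if __name__ == "__main__":
    ok, err = 251, 123  # from the T1_FZK_Buffer line above
    counted = sum(count for count, _ in parse_error_counts(ERROR_LINES))
    print(f"errors listed: {counted} (summary says {err})")
    print(f"success rate: {success_rate(ok, err):.1f} %")
```

The listed error counts add up to the Err value in the site line, and the recomputed success rate agrees with the reported one.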
-- DerekFeichtinger - 14 Mar 2007