<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
   * #Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ !!CMS Site Log for PHOENIX Cluster

%TOC%

%ICON{arrowleft}% Go to [[CMSSiteLog4][previous page]] / [[CMSSiteLog6][next page]] of CMS site log %M%

---++ 9. 3. 2007 DPM service crash due to automatic updates requiring manual intervention

Automatic apt updates of the DPM service RPMs caused our monitoring to fail because =dpm-qryconf= segfaulted. Restarting the services did not help: the =dpm= service failed to start and, as it turned out, the restart seems to have corrupted the DB. Only then did we see that the RPMs had been updated and that a manual procedure was required to migrate the DB to a new schema.

In our opinion these changes were not communicated with the required visibility. We had assumed that updates in this apt repository usually cover security-relevant fixes and that no major service updates are entered into it.

Migrating the DB using the documentation provided by LCG was no longer possible since the DB had been corrupted. I was forced to restore a backup from that morning. With it, the update script ran (almost) correctly, and I was able to start up the service again by Friday night.

---++ 13./14. 3. 2007 Cycle-1 Week-5 Load tests

According to the [[http://www.cnaf.infn.it/~dbonacorsi/LoadTest07/C1w5.htm][instructions]] from D. Bonacorsi, I started a download test from FZK using the =/PhEDEx_Prod/LoadTest07_FZK/CSCS= sample. The transfers went worse than at the end of last month, with intermittent periods of failure. This may be the result of the FZK dCache SRM instability that the operators have mentioned.

Note: The !PhEDEx page shows transfer speeds of ~7 MB/s.
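The summary figures in the site statistics for this test can be cross-checked with a few lines of Python. This is only a minimal sketch: the numbers are copied by hand from the statistics for =T1_FZK_Buffer=, and the variable and function names are made up for illustration; they are not part of !PhEDEx or of the script that produced the report.

```python
# Figures copied by hand from the T1_FZK_Buffer site statistics.
ok_transfers = 417
failed_transfers = 136

# Per-message error counts, in the order they appear in the error list.
error_counts = [85, 33, 11, 4, 1, 1, 1]


def success_rate(ok: int, failed: int) -> float:
    """Return the share of successful transfers in percent."""
    return 100.0 * ok / (ok + failed)


rate = success_rate(ok_transfers, failed_transfers)
print(f"success rate: {rate:.1f} %")

# Every failed transfer should be covered by one of the listed error messages.
print(f"errors accounted for: {sum(error_counts)} of {failed_transfers}")
```

The check confirms that the per-message error counts add up to the reported number of failed transfers and that the quoted success rate follows from the OK and Err counts.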
<verbatim>
SITE STATISTICS:
==================
                         first entry: 2007-03-13 14:32:59      last entry: 2007-03-15 09:50:02
    site: T1_FZK_Buffer (OK: 417 / Err: 136)   succ. rate: 75.4 %   total: 1030.2 GB   avg. rate: 4.2 MB/s = 35.5 Mb/s

Error message statistics per site:
===================================
 *** ERRORS from T1_FZK_Buffer:***
     85   Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Could not open connection !
     33   the server sent an error response: 425 425 Can't open data connection
     11   Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: SOAP-ENV:Client - CGSI-gSOAP: Error reading token data: Success
      4   transfer expired in the download agent queue
      1   Failed Failed on SRM get: Cannot Contact SRM Service. Error in srm__ping: service timeout.
      1   Failed Cannot retrieve final message from /var/tmp/glite-url-copy-edguser/STAR-CSCSfailed/STAR-CSCS__2007-03-14-1418_2yYlpG task IDs: 4643882
      1   Failed Failed on SRM get: SRM getRequestStatus timed out on get
</verbatim>

<br />
<img src="%ATTACHURLPATH%/cpu_report-20070314-1153.gif" alt="cpu_report-20070314-1153.gif" width='391' height='147' />
<br />
<img src="%ATTACHURLPATH%/network_report-20070314-1153.gif" alt="network_report-20070314-1153.gif" width='391' height='147' />

%ICON{arrowleft}% Go to [[CMSSiteLog4][previous page]] / [[CMSSiteLog6][next page]] of CMS site log %M%

-- Main.DerekFeichtinger - 14 Mar 2007
Topic revision: r3 - 2007-03-15 - DerekFeichtinger
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.