<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
   #uncomment this if you want the page to be viewable only by internal people
   #* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Swiss Grid Operations Meeting on 2019-12-05 at 14:00

Check calendar invitation for CSCS Zoom details.

%TOC%

---++ Action items
   * Item1

---++ Site status

---+++ CSCS
   * [[%ATTACHURL%/CHIPPreportNov2019.pdf][CHIPPreportNov2019.pdf]]: CSCS November Report

---+++ PSI
   * Xxx
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]

---+++ UNIBE-LHEP
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ UNIBE-ID
   * Some job errors due to storage problems. The cause was bad IB cables, mechanically damaged during the server room reconstruction.
   * Some cables have been replaced; the rest will be replaced during the next downtime on 2019-12-12.
   * ARC CE otherwise running smoothly.

---+++ UNIGE
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ NGI_CH
   * Report on this ticket:
      * REFERENCE LINK: [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=144342]]
      * SUBJECT: NGI_CH - November 2019 - RP/RC OLA performance

Such tickets use a “standard formulation”. We have received many of them in the past, affecting all sites, because ops probe failures inevitably go undetected when they do not affect the production experiments. In this specific case, it is the first time the site has also been notified of the ticket. In the past, the ticket was assigned only to NGI_CH, so only I would receive the notification; I would then investigate with the site and report on the ticket. In some cases, as Dario and Dino might remember, we never found the cause of errors that appeared and went away on their own.

For this specific ticket, I had spotted the error (by pure chance) and reported it to Dario; it was corrected in less than 24 hours, on 22 Nov. In fact, this link shows the availability going back to 100% by the end of that day: https://egi.ui.argo.grnet.gr/egi/report-ar-group/Critical/2019-11/SITES/CSCS-LCG2 ([[https://cscs-lcg.slack.com/archives/C1H1XBS14/p1575541045320300][original Slack message]])

I also see issues affecting the ARC CEs during that period, but these went away spontaneously, and it is no longer easy to investigate what happened at the time of the failures.

To mitigate this in the future: as mentioned in the past, notifications can be turned on at the site/service level in GOCDB. These will trigger an email to the GOCDB site contact when ops probes fail. Each site should choose its own matrix of notifications. There are two independent levels: site level (turned on by editing the main site page) and service level (turned on by editing each service page).

   * NGI-CH Open Tickets review

---++ Other topics
   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS:
   * LHCb:
   * EGI:
Topic attachments

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| CHIPPreportNov2019.pdf | r1 | 879.5 K | 2019-12-05 - 13:01 | NickCardo | CSCS November Report |
Topic revision: r5 - 2019-12-05 - NickCardo
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.