Swiss Grid Operations Meeting on 2019-12-05 at 14:00

Check calendar invitation for CSCS Zoom details.


Action items

  • All VOs: identify if jobs failing during a maintenance are accounted as failed or ignored.
  • All VOs: validate accounting data for Nov 2019 vs. CSCS accounting.
  • Miguel: produce an example command to pull accounting data off a Slurm cluster.
  • Nick: make VO utilisation charts available at each meeting.
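
As a starting point for the first action item, the sketch below shows one way a VO could separate failures that fall inside a maintenance window from those outside it, and compare the failure rate with and without those jobs. The window dates, the job tuples, and the function names are hypothetical; the state names are standard Slurm job states, but each VO's own accounting system may use different ones.

```python
from datetime import datetime

# Hypothetical maintenance window; the real dates are not given in these minutes.
MAINT_START = datetime(2019, 11, 13, 8, 0)
MAINT_END = datetime(2019, 11, 13, 18, 0)

# Slurm job states treated as failures in this sketch (an assumption).
FAILED_STATES = {"FAILED", "NODE_FAIL", "CANCELLED", "TIMEOUT"}


def in_window(end_time, start=MAINT_START, end=MAINT_END):
    """True if a job ended inside the maintenance window."""
    return start <= end_time <= end


def split_failures(jobs):
    """Split failed job ids into (inside_window, outside_window).

    `jobs` is an iterable of (job_id, state, end_time) tuples.
    """
    inside, outside = [], []
    for job_id, state, end_time in jobs:
        if state not in FAILED_STATES:
            continue
        (inside if in_window(end_time) else outside).append(job_id)
    return inside, outside


def failure_rate(jobs, ignore_window=False):
    """Fraction of failed jobs, optionally ignoring jobs that ended in the window."""
    jobs = list(jobs)
    if ignore_window:
        jobs = [job for job in jobs if not in_window(job[2])]
    if not jobs:
        return 0.0
    failed = sum(1 for _, state, _ in jobs if state in FAILED_STATES)
    return failed / len(jobs)
```

Comparing `failure_rate(jobs)` with `failure_rate(jobs, ignore_window=True)` against the numbers a VO reports for the site would indicate which of the two conventions that VO's accounting follows.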

Site status

CSCS

  • CHIPPreportNov2019.pdf: CSCS November Report
  • Nick would like to identify if jobs failing during a maintenance are accounted as failed for the site, or ignored. Action Item on all VOs.
  • Christoph would like to see if the VO utilisation charts can be made available at each meeting. Action Item on Nick.
  • Christoph would like VOs to validate their accounting data for Nov 2019 vs. CSCS accounting data. Action Item on all VOs.
  • Derek needs the commands to pull accounting data off a Slurm cluster that produce the numbers shown in the slides available for the meeting. Action Item on Miguel.
  • Response time at CSCS during Christmas will be limited, as the site will be closed. CSCS is putting additional effort into making the system even more reliable.
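
Pending Miguel's example, the following is only a sketch of how accounting data could be pulled with `sacct` and summed per account. It is not the command behind the CSCS slides. The date range, the choice of fields, and the assumption that each VO maps to a Slurm account are all guesses; the `sacct` options used (`--allusers`, `--allocations`, `--noheader`, `--parsable2`, `--starttime`, `--endtime`, `--format`) and the `CPUTimeRAW` field (CPU seconds) are standard.

```python
import subprocess
from collections import defaultdict

# Assumed query: all users, one line per job allocation, November 2019.
SACCT_CMD = [
    "sacct", "--allusers", "--allocations", "--noheader", "--parsable2",
    "--starttime", "2019-11-01", "--endtime", "2019-12-01",
    "--format", "Account,State,CPUTimeRAW",
]


def cpu_hours_by_account(sacct_output):
    """Sum CPU hours per account from pipe-separated sacct output."""
    totals = defaultdict(float)
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        account, _state, cpu_seconds = line.split("|")
        # CPUTimeRAW is reported in CPU seconds; treat an empty field as zero.
        totals[account] += int(cpu_seconds or 0) / 3600.0
    return dict(totals)


def main():
    """Run sacct (requires access to the Slurm accounting database) and print totals."""
    result = subprocess.run(SACCT_CMD, check=True, capture_output=True, text=True)
    for account, hours in sorted(cpu_hours_by_account(result.stdout).items()):
        print(f"{account}\t{hours:.1f}")
```

Calling `main()` on a host with access to the Slurm accounting database prints one line per account. The parsing step is kept in its own function so that it can be checked against saved `sacct` output without access to the cluster.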

PSI

UNIBE-LHEP

  • No report.

UNIBE-ID

  • Some job errors due to storage problems. The cause was bad IB cables, mechanically damaged during the server room reconstruction.
    • Some cables have been replaced; the rest will be replaced in the next downtime on 2019-12-12.
  • ARC CE otherwise running smoothly.

UNIGE

  • No report.

NGI_CH

  • Report on this ticket:
    REFERENCE LINK: https://ggus.eu/index.php?mode=ticket_info&ticket_id=144342
    SUBJECT: NGI_CH - November 2019 - RP/RC OLA performance

    Such tickets use a "standard formulation". We have received many in the past, affecting all sites, because ops probe failures inevitably go undetected when they do not affect the production experiments. In this specific case, it is the first time the ticket has also been notified to the site. In the past, it was assigned only to NGI_CH, so only I would receive the notification; I would then investigate with the site and report on the ticket. In some cases, as Dario and Dino might remember, we never found the cause of errors that appeared and went away on their own.


    I also see issues affecting the ARC CEs during that period, but these went away spontaneously, and it is no longer easy to investigate what happened at the time of the failures.

    To mitigate this in the future, as mentioned before, it is possible to turn on notifications at the site and service level in GOCDB. These trigger an email to the GOCDB site contact when ops probes fail. Each site should choose its own matrix of notifications. There are two independent levels: site level (turned on by editing the main site page) and service level (turned on by editing each service page).


  • NGI-CH Open Tickets review

Other topics

Next meeting date: Jan 09, 2020 at 14:00 Zurich time. Same Zoom connection details.

A.O.B.

  • Mauro points out that ATLAS has not been running much recently. Nick informs him that this is due to fair-share catching up, because LHCb did not run in the last weeks. This could be a problem because of how ATLAS workflows operate, which penalise sites that show peaks. Mauro wants to know whether QoS can be set so that each VO always gets a minimum and a maximum share of the available resources. The answer is that it can, but CSCS cannot be held accountable for CPU hours lost if there are no jobs in the queue. A possible alternative would be to tune priorities.

  • Vinzenz: Dino and Dario are on vacation. We suggest writing emails to grid@cscs.ch instead of to individual people. Some CMS people seem to have problems accessing files at the T2, so Vinzenz will open a ticket to grid-rt@cscs.ch so that CSCS can follow up.
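
The fair-share effect Nick describes can be illustrated with the classic Slurm fair-share formula, F = 2^(-U/S), where U is a VO's normalized recent usage and S its normalized share. This is a simplified sketch only: the minutes do not say which priority algorithm or shares CSCS uses, so the share and usage numbers below are hypothetical.

```python
def fair_share_factor(normalized_usage, normalized_shares):
    """Classic Slurm fair-share factor: 1.0 with no usage, 0.5 when usage equals share."""
    return 2.0 ** (-normalized_usage / normalized_shares)


# Hypothetical shares and recent usage, each expressed as a fraction of the cluster.
shares = {"ATLAS": 0.40, "CMS": 0.40, "LHCb": 0.20}
usage = {"ATLAS": 0.55, "CMS": 0.40, "LHCb": 0.05}

factors = {vo: fair_share_factor(usage[vo], shares[vo]) for vo in shares}

# VOs with the highest factor are favoured when new jobs are scheduled.
ranking = sorted(factors, key=factors.get, reverse=True)
```

With these made-up numbers the under-used VO (LHCb) ranks first and the over-used one (ATLAS) last, so ATLAS jobs wait until the usage history of the others catches up. Tuning priorities, as suggested above, would amount to changing the weight this factor carries relative to other priority terms.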

Attendees

  • CSCS: Nicholas Cardo, Miguel Gila, Gianni Ricciardi
  • CMS: Derek Feichtinger, Vinzenz Stampf
  • ATLAS:
  • LHCb:
  • EGI:
  • CHIPP: Christoph Grab, Mauro Donega

Topic attachments
  • CHIPPreportNov2019.pdf (r1, 879.5 K, 2019-12-05 13:01, NickCardo): CSCS November Report
Topic revision: r6 - 2019-12-12 - MiguelGila