<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Swiss Grid Operations Meeting on 2020-02-20 at 15:30

Followup from previous Action Items

Action items

VO reports

ATLAS

Minutes:
  • the pledges were met (and surpassed) in the past month
  • while the pledged performance was achieved, there was a shortfall on the 40:40:20 share; see Nick's slides for why the 40:40:20 share was difficult to achieve last month
  • 15-20% of the 1-core EVGEN jobs time out without logs. Debugging with Miguel is not yet conclusive. Saving the session for all ATLAS jobs is impossible, so the plan is to do it for single-core production jobs only and then debug
  • split the "Slots of running jobs" monitoring plot into two plots, to avoid stacked plots. The total pledge is added to the plots automatically; add the CSCS-only pledge by hand as a dotted line
  • All dips in the plots for both CSCS and Bern discussed and understood
  • The color coding in the table "ATLAS T2-statistics" is red < 75% < yellow < 90% < green
  • The color coding in the table "ATLAS Hammercloud statistics" is red < 95% < green
  • Some details about the Swiss ATLAS federation will be sent in the next few days

CMS

Minutes:

  • a ticket has been filed to understand why CMS had so few pending jobs (the initial explanation is that there were no jobs to run in the whole of CMS, which sounds surprising)

LHCb

Minutes:

  • LHCb all OK
  • 2 misconfigurations occurred on the LHCb side
  • affected by the SEF(?) downtime
  • IPv6 problem solved

T2 Sites reports

CSCS

January utilization:
  • Pledges
    • 112.2% CHIPP overall
    • 104.1% ATLAS
    • 101.5% CMS
    • 149.9% LHCb
  • Sharing
    • ATLAS:CMS:LHCb [%] 37:36:27
Minutes:
  • discussion of the dried-out queues
  • Whenever a VO notices it is not submitting jobs (e.g. by watching the Grafana page), warn CSCS and try to debug the case as close to real time as possible
  • ATLAS has a procedure to drain its queue before scheduled downtimes. This explains why the number of jobs went down before the actual maintenance. After the maintenance the jobs were back within a few hours, as expected. While this is not necessarily a problem per se, the mechanism affects the ability to reach the pledges.
  • We badly need to improve the communication flow between the VOs and CSCS. Try with:
    • the Hot topics page, to collect changes to the system that could affect operations
    • Slack chat when "real time" debugging is needed while an issue is in progress
    • tickets to flag problems

UNIBE LHEP/Ubelix

Minutes:
  • no particular discussion

T3 Sites reports

PSI

  • Continued migration of all T3 nodes to RHEL7: the Slurm clients are done
  • T3 downtime day for the storage upgrade. Thanks to good preparation and testing, the following upgrades were completed:
    • dCache server OS from SL6 to RHEL7
    • dCache from 3.2 to 5.2
    • PostgreSQL from 9.5 to 11
    • Firmware on Dalco storage pools and NetApp

  • There was a storage discussion at the T3 about solutions offering POSIX access to replace ageing hardware (40 TB ZFS on Linux). Review summary on using dCache as an NFS 4.1 server: it still looks like bleeding-edge development and requires constant updates of dCache and its configuration.

UniGE

  • Can now request grid certificates via the Swiss CA
  • DPM pools upgraded to the latest version in line with the Bern ones
  • ARC CE deployment delayed; a meeting next Monday will outline the final steps

EGI

News

  • NTR

Review of open tickets

  • https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO

    8 tickets found

    | Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope |
    | 145582 | | cms | CSCS-LCG2 | urgent | NGI_CH | in progress | 2020-02-19 | T2_CH_CSCS is intermittently failing ... | WLCG |
    | 144898 | | cms | CSCS-LCG2 | less urgent | NGI_CH | in progress | 2020-01-22 | T2_CH_CSCS warning - outdated version ... | WLCG |
    | 144499 | | none | T3_CH_PSI | less urgent | NGI_CH | assigned | 2020-02-18 | Upgrade to recent dCache release | EGI |
    | 144485 | | none | CSCS-LCG2 | less urgent | NGI_CH | assigned | 2020-02-04 | Upgrade to recent dCache release | EGI |
    | 143464 | | none | UNIBE-LHEP | urgent | NGI_CH | in progress | 2020-02-20 | DPM at UNIBE-LHEP has to be configured ... | EGI |
    | 141276 | | none | | less urgent | NGI_CH | assigned on hold | 2019-11-26 | yearly review of the information ... | EGI |
    | 131965 | | none | UNIBE-LHEP | less urgent | NGI_CH | assigned on hold | 2020-01-20 | IPv6 deployment at WLCG Tier-2 sites | EGI |
    | 131432 | | none | CSCS-LCG2 | urgent | NGI_CH | assigned involved in progress | 2020-01-27 | Storage accounting deployment | EGI |

a.o.b

  • Summary of the discussion at CSCS on 10.02.2020: 20200220_OpsMeeting.pdf
  • Next meeting date: 05.03.2020 --> ATLAS GPU challenge + discussion of the slides above

Attendees

  • CSCS: Nick, Pablo, Dino, Dario
  • CMS: Mauro, Christoph, Derek
  • ATLAS: Gianfranco
  • LHCb: Roland
  • EGI: Gianfranco
Topic attachments

| Attachment | Size | Date | Who | Comment |
| 20200220_OpsMeeting.pdf | 47.3 K | 2020-02-20 12:36 | MauroDonega | |
| ATLAST2reportJan2020pdf.pdf | 403.0 K | 2020-02-20 13:57 | GianfrancoSciacca | ATLAS CH Tier2 report |
| CHIPPreportJan2020.pdf | 5618.4 K | 2020-02-19 07:03 | NickCardo | CSCS January Site Report |
| UnibeT2ReportJan2020.pdf | 186.7 K | 2020-02-20 13:58 | GianfrancoSciacca | UniBE Tier2 report |
Topic revision: r9 - 2020-02-21 - MauroDonega