Swiss Grid Operations Meeting on 2021-03-11 at 14:00

Next meeting: 15 April 2021 @ 14h30 (note unusual time)


- minor correction to the ATLAS expected number of slots (cores): 6000 should be 5930

- ATLAS was sending its jobs preferentially on the small partition. With the merged partitions the system is better behaved

(Comment by Gianfranco: the ATLAS load-balancing submitter does not prefer any endpoint. The backlog is based on queued+running on each endpoint. Having two largely unbalanced partitions caused the effect described, not ATLAS "sending its jobs preferentially on the small partition")

- On March 8 we completed CHIPP pledges

- On March 3, 4 and 8 there were no pending jobs from any VO. To be investigated. It seems suspicious that all three experiments had issues on those days. At first sight Derek doesn’t see anything in the CMS factory plots. Roland says LHCb had a dip on March 8. ATLAS?

(Comment by Gianfranco: the report covers February)

- At the moment ATLAS is not sending enough jobs. It may miss the pledge by a tiny amount. Since about December 1 the number of pending jobs has halved. Why? Would it be possible to send a higher number of jobs?

(Comment by Gianfranco: checked the last 6 months: 1328 average pending jobs at the site. This is too many. Many have to be cancelled after waiting in the queue too long and are redirected to other sites)

- While the overall CHIPP pledge is completed, LHCb is clearly above its pledges, while ATLAS and CMS are a bit below. --> Mauro sends an email to the CHIPP steering board asking whether they want to reduce the LHCb “priority” for the rest of the month and let the other experiments catch up.

Questions on the ATLAS DPM migration:

  • What is the plan and timeline to move from test phase to production? (Gianfranco: will follow WLCG/ATLAS additional recommendations. More solutions need to be evaluated)
  • How much of the ATLAS workload is using DPM at CSCS? (Gianfranco: the DPM capacity at CSCS is 11% of the ATLAS storage for UNIBE-LHEP)

(Comment by Gianfranco: submission is NOT manual. It is based on current usage+pending values at every site)

Followup from previous Action Items

Action items




T2 Sites reports



T3 Sites reports




  • EGI A/R report for February looks good: NGI_CH A/R 98.96/99.41

  • Site-BDII metrics org.bdii.Entries and org.bdii.Freshness removed from ARGO_MON_CRITICAL profile
    • the metrics are still kept in the ARGO_MON_OPERATORS profiles
    • it is still an important service to support infrastructure oversight activities

  • There will soon be a campaign for updating the ARC CE to 10.6.2 (with fix for GDPR). In principle this can be done now, but it might break the SAM EFT probes. Probably better to wait until the Condor submitters are fixed

Review of open tickets

4 of 4 Tickets

Ticket-ID | Type | VO   | Site       | Priority    | Resp. Unit | Status               | Last Update | Subject                                 | Scope
150484    |      | cms  | CSCS-LCG2  | less urgent | NGI_CH     | in progress          | 2021-03-10  | enabling AREX service at ARC-CE(s) at ... | WLCG
150373    |      | dune | UNIBE-LHEP | less urgent | NGI_CH     | in progress          | 2021-03-05  | Enable DUNE queue for CPU and future ... | EGI
149166    |      | cms  | CSCS-LCG2  | less urgent | NGI_CH     | assigned in progress | 2021-02-15  | CVMFS squids at CSCS                    | WLCG
144485    |      | none | CSCS-LCG2  | less urgent | NGI_CH     | assigned in progress | 2020-12-18  | Upgrade to recent dCache release        | EGI

Attendants

  • CSCS: Nick, Pablo, Dario, Elia, Colin
  • CMS: Derek, Mauro
  • ATLAS:
  • LHCb: Roland
  • EGI:
