Swiss Grid Operations Meeting on 2022-05-12 at 09:00

Next meeting: F2F (face-to-face), 2 June 2022



Still investigating with EnhanceR how to handle the fee and the 10% FTE.


Heavy under-delivery in April (~35% of the pledge).

Share below 19%.

Storage at Bern decommissioned and integrated into NorduGrid (the resources are still physically in Bern).


Mont Fort added to the CMS workflows.

No open tickets.

Overall situation looks OK.

A number of file transfers are failing: the issue was identified in the way davs handles the VOMS attributes in third-party transfers. Only newly transferred files are added to the storage space; existing data cannot be added. This leads to micromanagement of the free space. Just today we may have received a solution for managing small reservations / deleting old data.

CMS cache storage space usage: a new space allocation was proposed to meet the requirement that 75% be managed by cps-central.

Tickets are proactively taken care of by CSCS: thanks!


All fine on Daint. The usual 10% of failing pilots disappeared at the beginning of last month. Good, but it is unclear what happened.

Mont Fort: some trouble submitting with the standard Python script, but it turned out to be a typo.


Daint: pretty bad month, confirming the numbers found by ATLAS (not so bad for the others).

Why is ATLAS so bad? Unclear, but related to the reco jobs.

Mont Fort:

April: 10 nodes with 128 cores, 512 GB RAM and 4 NVIDIA A100, + 10 kHS06

May: 32 nodes with 256 cores and 512 GB RAM, + 90 kHS06

Share: 66% of the resources to ATLAS (to catch up), 15% to CMS and 15% to LHCb

Temporarily adding a cluster on Mont Gele:

Same configuration as Mont Fort, but with HDD-based storage

An additional 90 kHS06

ALPS downtime in July to replace some network components: the suggestion is to keep Daint for a while, with reduced capacity, until the July migration. Is that OK?

Action items and points for discussion:

Increase resources for ATLAS on Mont Fort

Daint: around 20% of the resources (this won't fix the reco jobs)

Combining the two should help

Slurm and ARC monitoring must be added ASAP on Mont Fort and Mont Gele

If ATLAS sees reco jobs not running, it does not send anything else, because it assumes the site is full




T2 site reports



T3 site reports




  • New CERN Grid CA introduced on 25 April. dCache restart needed (done at CSCS)
    • The following versions of dCache do not need a restart:
  • New benchmark replacing HEP-SPEC06
    • HEPSCORE is going to replace the old HEP-SPEC06 benchmark
    • Preparing plans with WLCG and the EGI Accounting team for deploying the new benchmark
    • There will be a transition period during which both benchmarks will be published by the sites and used to normalise the data, allowing comparison between the two
    • APEL is working on a version where the accounting records contain both benchmarks
    • ARC needs to be developed accordingly
    • Expected timescale for rollout: Q4 2022 (tentative)
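During the transition, a record carrying a score from each benchmark lets the same CPU time be expressed in both units. A minimal sketch, assuming normalisation means multiplying raw CPU hours by the per-core benchmark score of the worker node; the `JobRecord` type, its field names and the example scores are hypothetical and do not reflect the actual APEL record format:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class JobRecord:
    # Hypothetical record layout: field names are illustrative, not the APEL schema.
    cpu_hours: float          # raw CPU time consumed by the job, in hours
    hs06_per_core: float      # HEP-SPEC06 score per core of the worker node
    hepscore_per_core: float  # HEPSCORE score per core of the same node


def normalised_work(rec: JobRecord) -> Dict[str, float]:
    """Express the same job's work in both benchmark units."""
    return {
        "HS06-hours": rec.cpu_hours * rec.hs06_per_core,
        "HEPSCORE-hours": rec.cpu_hours * rec.hepscore_per_core,
    }


# Illustrative numbers only, not measured scores.
rec = JobRecord(cpu_hours=100.0, hs06_per_core=10.0, hepscore_per_core=12.5)
print(normalised_work(rec))  # {'HS06-hours': 1000.0, 'HEPSCORE-hours': 1250.0}
```

Summing both values over the same set of jobs gives a site-level conversion factor between the old and the new unit.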

Review of open tickets


4 of 4 tickets

Ticket-ID | VO    | Site       | Priority    | Resp. Unit | Status      | Last Update | Subject                                    | Scope
156799    | lhcb  | CSCS-LCG2  | very urgent | NGI_CH     | in progress | 2022-05-04  | Pilots Failed at CSCS-LCG2                 | EGI
156213    | atlas | CSCS-LCG2  | less urgent | NGI_CH     | in progress | 2022-03-07  | CSCS-LCG2: Non-operational storage ...     | EGI
154102    | dune  | UNIBE-LHEP | less urgent | NGI_CH     | on hold     | 2021-12-22  | Local accounting for DUNE jobs at ...      | EGI
150373    | dune  | UNIBE-LHEP | less urgent | NGI_CH     | on hold     | 2021-12-22  | Enable DUNE queue for CPU and future ...   | EGI

  • Attendees
  • CSCS:
  • CMS:
  • ATLAS:
  • LHCb:
  • EGI:

