Swiss Grid Operations Meeting on 2021-12-16 at 9:00
*Next meeting:
Minutes:
ATLAS
Difficult month: load changed a lot because of the workloads; huge MC samples went to opportunistic resources (266%), translating into heavy re-reco back at CSCS
Storage review for dCache needed by the end of Dec.
Reprocessing from rel21 to rel22 (data+MC re-reco); asking for Run 2+3 to be rerun
CMS
Difficult month on the T3 at PSI
SAM jobs failing, with CSCS in the waiting room
LHCb
The changes introduced in the last weeks seem to have solved the stability problems for LHCb. The jobs run most of the time now with an efficiency of 99% or even higher.
Miguel also managed to configure the job submission to run from the native openSUSE system, where LHCb starts a Singularity image directly from CVMFS. This removes another layer of software and seems to work without problems too.
CSCS
Several ATLAS/CMS pending jobs: network interruption and file cleaning
Resulted in 27% wasted cycles
Temporary solution while waiting for more space:
capped #CPUs, overcompensated by the higher efficiency from using 50 TB of scratch space
Mont Fort is the new name for the virtual cluster for WLCG:
- testing different node configurations
Followup from previous Action Items
Action items
ATLAS
-
- ATLAST2Report-Nov2021.pdf: ATLAS CH Tier2 report
- ADC-ICB-18.10.21-resources.pdf: ADC resource reporting
- RRB values for CSCS CPU in 2021: 74.24k HS06 => GS clarification: Not sure who has added this and what it means, and it is not part of the ATLAS report: there is NO such thing as "RRB values for CSCS" for ATLAS. Please refrain from using this term again in ATLAS report sections
- RRB values for CSCS Disk in 2021: 2574 TB => GS clarification: Not sure who has added this and what it means, and it is not part of the ATLAS report: there is NO such thing as "RRB values for CSCS" for ATLAS. Please refrain from using this term again in ATLAS report sections
CMS
LHCb
T2 Sites reports
CSCS
UNIBE
-
- LHEP:
- Operation has been very stable for a couple of months
- Lustre extension (+35% HDDs) is expected to be delivered in February 2022
- UBELIX:
- Stable operation, a bit more spotty CPU delivery
- UNIGE:
- Attracted a lot of analysis, but the cluster is behind NAT and there is a 1 Gb bottleneck (NIC on the NAT server) that chokes transfers and causes many failures
- Took the queue offline while waiting for an upgrade
- Accounting ATLAS dashboard vs slurm: within 1%
- Accounting CRIC vs slurm: -11% for WallClock, -21% for HS06
- Accounting CRIC vs ATLAS DDM storage: -5% for installed, -1% for used
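The accounting comparisons above are plain relative differences between two accounting sources. As a hedged illustration (the formula and the numbers below are assumptions for demonstration, not the actual UNIGE figures), a "-11% for WallClock" discrepancy corresponds to:

```python
# Minimal sketch (assumed convention): relative discrepancy between an
# accounting source and a reference, expressed as a percentage.

def discrepancy_pct(value, reference):
    """Return (value - reference) / reference as a percentage."""
    return (value - reference) / reference * 100.0

# Hypothetical wallclock totals, with local Slurm accounting as reference:
slurm_wallclock = 100_000   # hours reported by Slurm (reference)
cric_wallclock = 89_000     # hours reported in CRIC

print(f"CRIC vs slurm: {discrepancy_pct(cric_wallclock, slurm_wallclock):+.0f}%")
```

A "within 1%" agreement, as for the ATLAS dashboard vs Slurm, would simply mean `abs(discrepancy_pct(...))` is at most 1.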
T3 Sites reports
PSI
EGI / WLCG
N.T.R.
Review of open tickets
https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO
3 of 3 Tickets

| Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope |
|-----------|------|----|------|----------|------------|--------|-------------|---------|-------|
| 154858 | | cms | T3_CH_PSI | urgent | NGI_CH | waiting for reply | 2021-12-10 | TPC WebDAV protocol deployment T3_CH_PSI | EGI |
| 154102 | | dune | UNIBE-LHEP | less urgent | NGI_CH | in progress | 2021-12-14 | Local accounting for DUNE jobs at ... | EGI |
| 150373 | | dune | UNIBE-LHEP | less urgent | NGI_CH | waiting for reply | 2021-12-13 | Enable DUNE queue for CPU and future ... | EGI |
- Attendants
- CSCS:
- CMS:
- ATLAS:
- LHCb:
- EGI: