Swiss Grid Operations Meeting on 2021-05-20 at 14:00
Next meeting: 17.06 @ 14h00 <<-- as usual, the date can be changed to maximize attendance (in particular that of the VO reps)
Minutes:
ATLAS report
Nick: The numbers presented are computed considering 6910 expected slots, while the number should read 5963. Can ATLAS explain where 6910 is coming from? ATLAS pledge = 74240 HS06. Cores = Pledge / HS06-per-core = 74240 / 12.45 = 5963 cores. April kHS06 pledge hours = Pledge * HoursInDay * DaysInMonth / 1000 = 74240 * 24 * 30 / 1000 = 53452.8. Per CRIC, Generated = 57757.977. This means the pledge for April was exceeded.
Mauro: the numbers reported by Gianfranco and by Nick are constantly off from each other. We need to converge once and for all on a common source and stick to it, to avoid wasting everybody's time and energy trying to match them.
Gianfranco: Simple and not obscure MATH, I am surprised questions like this arise and the chair lets them arise and bugs me about them (but I have seen even worse): CRIC pledge / HS06 coefficient => Number of cores. Simple MATH. Please NOTE: no private pledge numbers have any role in ATLAS/WLCG. Do your own private scaling among yourselves please, my time is as much wasted as it is yours, or even more to be dealing with such petty issues.
CSCS is at 95% (with the usual overestimation error folded in) of pledge for April 2021. That is not tragic, but is NOT above 100%. In order to have numbers match, it is sufficient not to have private versions of the relevant metrics.
CSCS report
New monitoring up from mid-April (spikes are coming from http timeouts - fixing it)
All VOs are above 100%: we are using 10 extra nodes to cope with possible downtimes. Extrapolating from the load so far, May should still be above 100% in spite of the problems that occurred when coming out of the maintenance period.
ATLAS DPM migration
From the answer in the minutes of the 11.03.2021 meeting:
- What is the plan and timeline to move from test phase to production? (Gianfranco: will follow WLCG/ATLAS additional recommendations. More solutions need to be evaluated)
- How much of the ATLAS workload is using DPM at CSCS? (Gianfranco: the DPM capacity at CSCS is 11% of the ATLAS storage for UNIBE-LHEP)
This still doesn't answer how much of the workload is using it / whether anybody is using it at all.
Gianfranco: ATLAS is using it. It has been reported MULTIPLE times. If there is no understanding of how experiments use storage at sites, you could set up a workshop for that. Then you could also perhaps report "how much of the workload is using" the ATLAS storage at CSCS.
dCache HW has started to arrive -> turn OFF access to test-DPM on Jun 2nd
Mauro: I would like to understand what happens on the ATLAS workflows when the test-HW is removed
Gianfranco: Please note: CSCS have insisted for years that all communication about R&D projects must occur via CHIPP. This is no exception. As such, I have written to the CHIPP chair asking for an official and recorded communication. Should CSCS want to end the ongoing production project despite its success: send an official communication to ATLAS CH (e.g. me), including a brief motivation so that we can pass it upstream. Following a handshake, we will arrange for the ATLAS data migration away from CSCS. This must be scheduled.
In addition, NOTE: arbitrarily removing access to data will "have consequences". Outside of the private version of WLCG being showcased here with such random shoutouts (which has no precedent in the history of the LHC experiments), ATLAS data belong to ATLAS. And ATLAS service providers are bound to adhere to the rules, code of conduct, and MoU of the official WLCG, not to an arbitrary and private CHIPP version of it.
CMS report
nothing major, now waiting for the system to come back
GPU work is scheduled to begin next week
LHCb report
nothing major, waiting for the system to come back
Followup from previous Action Items
Action items
ATLAS
CMS
LHCb
T2 Sites reports
CSCS
T3 Sites reports
PSI
EGI / WLCG
Review of open tickets
https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO
Ticket-ID | Type | VO   | Site       | Priority    | Resp. Unit | Status      | Last Update | Subject                                     | Scope
152076    |      | atlas | UNIBE-LHEP | less urgent | NGI_CH     | assigned    | 2021-05-20  | Job failures at UNIBE-LHEP                  | WLCG
152070    |      | cms   | CSCS-LCG2  | urgent      | NGI_CH     | assigned    | 2021-05-20  | SAM tests failing at T2_CH_CSCS             | WLCG
151997    |      | cms   | CSCS-LCG2  | urgent      | NGI_CH     | assigned    | 2021-05-14  | WebDAV protocol deployed (T2_CH_CSCS)       | WLCG
152033    |      | cms   | CSCS-LCG2  | urgent      | NGI_CH     | in progress | 2021-05-20  | Erroneous consistency check endpoint at ... | WLCG
150373    |      | dune  | UNIBE-LHEP | less urgent | NGI_CH     | in progress | 2021-05-14  | Enable DUNE queue for CPU and future ...    | EGI
144485    |      | none  | CSCS-LCG2  | less urgent | NGI_CH     | in progress | 2021-04-14  | Upgrade to recent dCache release            | EGI
151265    |      | cms   | CSCS-LCG2  | less urgent | NGI_CH     | on hold     | 2021-04-09  | Enabling WebDAV on Production ...           | WLCG
A.o.b.
- Attendants
- CSCS: Colin, Dario, Nick, Pablo
- CMS: Derek, Mauro
- ATLAS:
- LHCb: Roland
- EGI: