<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE =
TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME =
TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW =
TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
Swiss Grid Operations Meeting on 2021-05-20 at 14:00
Next meeting: 17.06 at 14:00 (as usual, the date can be changed to maximize attendance, in particular that of the VO reps)
Minutes:
ATLAS report
Nick: The numbers presented assume 6910 expected slots, whereas the number should read 5963. Can ATLAS explain where 6910 comes from? ATLAS pledge = 74240 HS06. Cores = Pledge / HS06 per core = 74240 / 12.45 = 5963 cores. April kHS06 pledge hours = Pledge * HoursInDay * DaysInMonth / 1000 = 74240 * 24 * 30 / 1000 = 53452.8. Per CRIC, generated = 57757.977. This means the pledge for April was exceeded.
Mauro: the numbers reported by Gianfranco and by Nick constantly disagree. We need to converge once and for all on a common source and stick to it, to avoid wasting everybody's time and energy trying to match them.
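Nick's pledge arithmetic can be re-checked with a short script. This is only a sketch: the function and variable names are illustrative (not taken from CRIC or any accounting tool), and the input figures are the ones quoted in these minutes.

```python
# Figures quoted in the minutes (names are illustrative assumptions).
PLEDGE_HS06 = 74240                    # ATLAS pledge in HS06
HS06_PER_CORE = 12.45                  # HS06 per core used by Nick
DAYS_IN_APRIL = 30
CRIC_GENERATED_KHS06_HOURS = 57757.977  # "Generated" value per CRIC


def pledged_cores(pledge_hs06, hs06_per_core):
    """Number of cores corresponding to a pledge expressed in HS06."""
    return pledge_hs06 / hs06_per_core


def pledge_khs06_hours(pledge_hs06, days_in_month, hours_in_day=24):
    """Pledged kHS06-hours for one month."""
    return pledge_hs06 * hours_in_day * days_in_month / 1000


cores = pledged_cores(PLEDGE_HS06, HS06_PER_CORE)
pledge_hours = pledge_khs06_hours(PLEDGE_HS06, DAYS_IN_APRIL)
delivered_fraction = CRIC_GENERATED_KHS06_HOURS / pledge_hours

print(f"Pledged cores:          {cores:.0f}")
print(f"Pledged kHS06-hours:    {pledge_hours:.1f}")
print(f"Delivered vs. pledge:   {delivered_fraction:.1%}")
```

A ratio above 100% corresponds to the statement that the April pledge was exceeded. Agreeing on the inputs (HS06 per core and the accounting source) is what determines whether two such calculations match.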
CSCS report
New monitoring up since mid-April (spikes come from HTTP timeouts; a fix is in progress)
All VOs are above 100%: we are using 10 extra nodes to cope with possible downtimes. Extrapolating from the load we have so far, May should still be above 100% in spite of the problems that occurred when coming out of the maintenance period.
ATLAS DPM migration
From the answer in the minutes of the 11.03.2021 meeting:
- What is the plan and timeline to move from test phase to production? (Gianfranco: will follow WLCG/ATLAS additional recommendations. More solutions need to be evaluated)
- How much of the ATLAS workload is using DPM at CSCS? (Gianfranco: the DPM capacity at CSCS is 11% of the ATLAS storage for UNIBE-LHEP)
This still doesn't answer how much of the workload is using it. Is anybody using it?
dCache hardware has started to arrive -> access to the test-DPM will be turned OFF on June 2nd
Mauro: I would like to understand what happens to the ATLAS workflows when the test-HW is removed
CMS report
nothing major, now waiting for the system to come back
GPU work is scheduled to begin next week
LHCb report
nothing major, waiting for the system to come back
Followup from previous Action Items
Action items
ATLAS
CMS
LHCb
T2 Sites reports
CSCS
T3 Sites reports
PSI
EGI / WLCG
Review of open tickets
https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO
| Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope |
| 152076 |  | atlas | UNIBE-LHEP | less urgent | NGI_CH | assigned | 2021-05-20 | Job failures at UNIBE-LHEP | WLCG |
| 151265 |  | cms | CSCS-LCG2 | less urgent | NGI_CH | on hold | 2021-04-09 | Enabling WebDAV on Production ... | WLCG |
| 150373 |  | dune | UNIBE-LHEP | less urgent | NGI_CH | in progress | 2021-05-14 | Enable DUNE queue for CPU and future ... | EGI |
| 144485 |  | none | CSCS-LCG2 | less urgent | NGI_CH assigned | in progress | 2021-04-14 | Upgrade to recent dCache release | EGI |
| 152070 |  | cms | CSCS-LCG2 | urgent | NGI_CH | assigned | 2021-05-20 | SAM tests failing at T2_CH_CSCS | WLCG |
| 152033 |  | cms | CSCS-LCG2 | urgent | NGI_CH | in progress | 2021-05-20 | Erroneous consistency check endpoint at ... | WLCG |
| 151997 |  | cms | CSCS-LCG2 | urgent | NGI_CH | assigned | 2021-05-14 | WebDAV protocol deployed (T2_CH_CSCS) | WLCG |
a.o.b
- Attendants
- CSCS: Colin, Dario, Nick, Pablo
- CMS: Derek, Mauro
- ATLAS:
- LHCb: Roland
- EGI: