Swiss Grid Operations Meeting on 2021-12-16 at 9:00
*Next meeting:
Minutes:
ATLAS
Difficult month: load changed a lot because of the workloads; huge MC samples went to opportunistic resources (266%), translating into heavy re-reco back at CSCS
Storage review for dCache needed by the end of Dec.
Reprocessing from rel21 to rel22 (data+MC re-reco); asking for Run 2+3 to be rerun
CMS
Difficult month on the T3 at PSI
SAM jobs failing, with CSCS in the waiting room
LHCb
The changes introduced in the last weeks seem to have solved the stability problems for LHCb. The jobs run most of the time now with an efficiency of 99% or even higher.
Miguel also managed to configure the job submission to run from the native openSUSE system, where LHCb starts a Singularity image directly from CVMFS. This removes another layer of software and seems to work without problems too.
CSCS
Several ATLAS/CMS pending jobs: network interruption and file cleaning
Resulted in 27% wasted cycles
Temporary solution while waiting for more space:
capped #CPUs, overcompensated by the higher efficiency from using 50 TB of scratch space
Mont Fort is the new name for the virtual cluster for WLCG:
- testing different node configurations
Followup from previous Action Items
Action items
ATLAS
-
- ATLAST2Report-Nov2021.pdf: ATLAS CH Tier2 report
- ADC-ICB-18.10.21-resources.pdf: ADC resource reporting
- RRB values for CSCS CPU in 2021: 74.24k HS06 => GS clarification: Not sure who has added this and what it means, and it is not part of the ATLAS report: there is NO such thing as "RRB values for CSCS" for ATLAS. Please refrain from using this term again in ATLAS report sections
- RRB values for CSCS Disk in 2021: 2574 TB => GS clarification: Not sure who has added this and what it means, and it is not part of the ATLAS report: there is NO such thing as "RRB values for CSCS" for ATLAS. Please refrain from using this term again in ATLAS report sections
CMS
LHCb
T2 Sites reports
CSCS
UNIBE
-
- LHEP:
- Operation has been very stable for a couple of months
- Lustre extension (+35% HDDs) is expected to be delivered in February 2022
- UBELIX:
- Stable operation, a bit more spotty CPU delivery
- UNIGE:
- Attracted a lot of analysis, but the cluster is behind NAT and there is a 1 Gb bottleneck (NIC on the NAT server) that chokes transfers and causes many failures
- Took the queue offline while waiting for an upgrade
- Accounting ATLAS dashboard vs slurm: within 1%
- Accounting CRIC vs slurm: -11% for WallClock, -21% for HS06
- Accounting CRIC vs ATLAS DDM storage: -5% for installed, -1% for used
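The accounting comparisons above are plain relative differences between two accounting sources. As a hedged illustration (the formula and the numbers below are assumptions for demonstration, not the actual UNIGE figures), a "-11% for WallClock" discrepancy corresponds to:

```python
# Minimal sketch (assumed convention): relative discrepancy between an
# accounting source and a reference, expressed as a percentage.

def discrepancy_pct(value, reference):
    """Return (value - reference) / reference as a percentage."""
    return (value - reference) / reference * 100.0

# Hypothetical wallclock totals, with local Slurm accounting as reference:
slurm_wallclock = 100_000   # hours reported by Slurm (reference)
cric_wallclock = 89_000     # hours reported in CRIC

print(f"CRIC vs slurm: {discrepancy_pct(cric_wallclock, slurm_wallclock):+.0f}%")
```

A "within 1%" agreement, as for the ATLAS dashboard vs Slurm, would simply mean `abs(discrepancy_pct(...))` is at most 1.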
T3 Sites reports
PSI
EGI / WLCG
N.T.R.
Review of open tickets
https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO
3 of 3 Tickets

| Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope |
|-----------|------|----|------|----------|------------|--------|-------------|---------|-------|
| 154858 | | cms | T3_CH_PSI | urgent | NGI_CH | waiting for reply | 2021-12-10 | TPC WebDAV protocol deployment T3_CH_PSI | EGI |
| 154102 | | dune | UNIBE-LHEP | less urgent | NGI_CH | in progress | 2021-12-14 | Local accounting for DUNE jobs at ... | EGI |
| 150373 | | dune | UNIBE-LHEP | less urgent | NGI_CH | waiting for reply | 2021-12-13 | Enable DUNE queue for CPU and future ... | EGI |
- Attendants
- CSCS:
- CMS:
- ATLAS:
- LHCb:
- EGI: