Swiss Grid Operations Meeting on 2021-03-11 at 14:00
Next meeting: 15 April 2021 @ 14h30 (note unusual time)
Minutes:
- Minor correction to the expected number of ATLAS slots (cores): 6000 should be 5930.
- ATLAS was sending its jobs preferentially to the small partition. With the merged partitions the system is better behaved.
(Comment by Gianfranco: the ATLAS load-balancing submitter does not prefer any endpoint. The backlog is based on queued+running jobs at each endpoint. Having two largely unbalanced partitions caused the effect described, not ATLAS "sending its jobs preferentially to the small partition".)
- On March 8 we completed the CHIPP pledges.
- On March 3, 4 and 8 there were no pending jobs from any VO. To be investigated. It seems suspicious that all three experiments had issues on those days. At first sight Derek doesn't see anything in the CMS factory plots. Roland says LHCb had a dip on March 8. ATLAS?
(Comment by Gianfranco: the report covers February)
- At the moment ATLAS is not sending enough jobs and may miss the pledge by a tiny amount. Since about December 1 the number of pending jobs has halved. Why? Would it be possible to send more jobs?
(Comment by Gianfranco: checked the last 6 months: on average 1328 pending jobs at the site. This is too many. Many have to be cancelled because they wait in the queue too long, and are then redirected to other sites.)
- While the overall CHIPP pledge is complete, LHCb is clearly above its pledge, while ATLAS and CMS are slightly below. --> Mauro will email the CHIPP steering board asking whether they want to reduce the LHCb "priority" for the rest of the month and let the other experiments catch up.
Questions to ATLAS on the DPM migration:
- What is the plan and timeline to move from test phase to production? (Gianfranco: will follow WLCG/ATLAS additional recommendations. More solutions need to be evaluated)
- How much of the ATLAS workload is using DPM at CSCS? (Gianfranco: the DPM capacity at CSCS is 11% of the ATLAS storage for UNIBE-LHEP)
REMINDER TO ALL EXPERIMENTS: INCREASE JOB SUBMISSIONS FROM NEXT MONTH ONWARDS ACCORDING TO THE NEW PLEDGES
(Comment by Gianfranco: submission is NOT manual. It is based on current usage+pending values at every site)
Follow-up from previous action items
Action items
ATLAS
CMS
LHCb
T2 Sites reports
CSCS
T3 Sites reports
PSI
EGI / WLCG
- EGI A/R report for February looks good: NGI_CH A/R 98.96/99.41
- Site-BDII metrics org.bdii.Entries and org.bdii.Freshness removed from ARGO_MON_CRITICAL profile
- the metrics are still kept in the ARGO_MON_OPERATORS profiles
- it is still an important service to support infrastructure oversight activities
- There will soon be a campaign to update the ARC CE to 10.6.2 (with a fix for GDPR). In principle this can be done now, but it might break the SAM ETF probes. It is probably better to wait until the Condor submitters are fixed.
Review of open tickets
4 of 4 tickets
Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope
150484 | | cms | CSCS-LCG2 | less urgent | NGI_CH | in progress | 2021-03-10 | enabling AREX service at ARC-CE(s) at ... | WLCG
150373 | | dune | UNIBE-LHEP | less urgent | NGI_CH | in progress | 2021-03-05 | Enable DUNE queue for CPU and future ... | EGI
149166 | | cms | CSCS-LCG2 | less urgent | NGI_CH assigned | in progress | 2021-02-15 | CVMFS squids at CSCS | WLCG
144485 | | none | CSCS-LCG2 | less urgent | NGI_CH assigned | in progress | 2020-12-18 | Upgrade to recent dCache release | EGI
a.o.b
- Attendees
- CSCS: Nick, Pablo, Dario, Elia, Colin
- CMS: Derek, Mauro
- ATLAS:
- LHCb: Roland
- EGI: