<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
Swiss Grid Operations Meeting on 2018-11-08 at 14:00
- Place: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
- External link: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
- Phone gate: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
- IRC chat: irc:gridchat.cscs.ch:994#lcg (ask pw via email)
- Switch Vidyo SIP IP: 137.138.248.204
Site status
CSCS
PSI
UNIBE-LHEP
- A bit less stable (lack of manpower), lower delivery for a few months, still fulfilling the pledge.
- Ubelix dropped out silently on 10 October
- Running <1900 slots on average (typical: 2500); Ubelix contribution 12% (typical: 23%)
- Large t2k.org run in September, 1 cluster reserved for a local user for almost the entire month
- Accounting numbers (from scheduler) from last month (October), LHEP only
VO | Job Type | Produced WC core-hours
ATLAS | Any | 1157991
ops | Any | 44
t2k.org | Any | 0
uboone | Any | 0
- Five-month history for UNIBE (pledge: 18 kHS06)
- Swiss ATLAS statistics
- HC availability [1]:
- CSCS running 3300 slots on average, UNIBE running 1850
- Accounting numbers (from dashboard) from last month for CSCS and UNIBE
[1]
http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2018-10-01&end_date=2018-10-31&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE
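As a rough cross-check of the October scheduler accounting for LHEP listed above, the wall-clock core-hours can be converted into an average number of busy slots. This is only a sketch: it assumes one slot corresponds to one core and that October has 31 × 24 = 744 hours (the daylight-saving change is ignored). The names `core_hours` and `average_slots` are illustrative and do not come from any accounting tool.

```python
# Wall-clock core-hours per VO for October, copied from the LHEP accounting table.
core_hours = {"ATLAS": 1157991, "ops": 44, "t2k.org": 0, "uboone": 0}

# Assumption: 31 days of 24 hours each; the DST change at the end of October is ignored.
HOURS_IN_OCTOBER = 31 * 24


def average_slots(wc_core_hours, hours=HOURS_IN_OCTOBER):
    """Average number of busy single-core slots over the period.

    Assumes one slot equals one core, so core-hours divided by
    elapsed hours gives the mean number of occupied slots.
    """
    return wc_core_hours / hours


total_core_hours = sum(core_hours.values())
avg = average_slots(total_core_hours)
print(f"Total: {total_core_hours} core-hours, about {avg:.1f} slots on average")
```

Under these assumptions the LHEP-only figure comes out at a little over 1550 slots on average. The slot counts quoted elsewhere in these minutes come from other sources, so they are not expected to match exactly.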
UNIBE-ID
- Enabled EGI ARGO notification e-mails in GOCDB to respond to CE stalling silently
- Opportunistic usage on Ubelix to be added as soon as the SL6 legacy partition is discontinued
- Slurm pre-emptable partition
- ATLAS can use idle slots
- ATLAS jobs killed (not checkpointed) when slots needed by other users
UNIGE
- Re-commissioning of ARC CE delayed
- Distributed DPM storage working well
NGI_CH
- Our deal with EGI for certificates expires in March 2019
- Science IT support Bern is looking into what the alternative will be
* NGI-CH Open Tickets review
Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope
138296 | | cms | CSCS-LCG2 | urgent | NGI_CH | assigned | 2018-11-14 | Transfers failing from T2_CH_CSCS | WLCG
132927 | | cms | CSCS-LCG2 | urgent | NGI_CH assigned involved | in progress | 2018-11-12 | Problem with APEL Accounting for all of ... | EGI
131965 | | none | UNIBE-LHEP | less urgent | NGI_CH assigned | on hold | 2018-10-04 | IPv6 deployment at WLCG Tier-2 sites | EGI
131948 | | none | CSCS-LCG2 | less urgent | NGI_CH assigned | in progress | 2018-11-13 | IPv6 deployment at WLCG Tier-2 sites | EGI
138314 | | atlas | CSCS-LCG2 | less urgent | NGI_CH | assigned | 2018-11-15 | DE CSCS-LCG2 : transfer failures with ... | WLCG
133695 | | lhcb | CSCS-LCG2 | urgent | NGI_CH assigned | in progress | 2018-10-19 | Data access problem at CSCS-LCG2 | WLCG
Other topics
- Follow up to fair-share meeting
- Two questions, one for the slurm experts, one for the VO reps:
- Is Slurm charging the reserved time or the elapsed*cores time to the user fair-share?
- NICK: not the reserved time; it is using (endtime-starttime)*cores
- Possible mitigation: pack single-core jobs onto nodes, as opposed to distributing them across all nodes. How does this sound?
- This should reduce node fragmentation and give the MC jobs more opportunities to start in a timely manner
- NICK: cannot comment at the moment, will look at it
- Other possible mitigations, to be discussed internally between the VOs, need input from CSCS:
- Distribution of job queue waiting time over the last 2 quarters, split by Daint vs Phoenix, by VO, and by 8-core vs 1-core (the T0 jobs should be excluded from these plots)
- NICK: CSCS will investigate providing queue wait time reporting
- Anything else?
- NICK: Move forward with Stefano’s recommendation on Tuesday for a face-to-face meeting, preferably before the end of the year
- Can we agree that the Daint and Phoenix shares (30 or 60 day historical view) will be monitored monthly at this meeting?
- GIANFRANCO: not discussed
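The two fair-share points above can be illustrated with a small sketch. The first part compares charging by elapsed time, (endtime-starttime)*cores as reported by NICK, with charging by reserved time. The second part is a toy model of the proposed mitigation: it counts how many nodes can still host an 8-core job when single-core jobs are packed onto nodes rather than spread across them. All function names, the example job, and the node sizes are hypothetical; this is not Slurm code and does not reproduce the real scheduler's placement logic.

```python
from datetime import datetime


def elapsed_charge(start, end, cores):
    """Core-seconds charged as (endtime - starttime) * cores."""
    return (end - start).total_seconds() * cores


def reserved_charge(time_limit_hours, cores):
    """Core-seconds that would be charged if the full reservation counted."""
    return time_limit_hours * 3600 * cores


def place_single_core_jobs(n_nodes, cores_per_node, n_jobs, pack):
    """Return the free cores per node after placing n_jobs single-core jobs.

    pack=True  fills the first node that still has a free core.
    pack=False spreads jobs by always picking the node with most free cores.
    """
    if n_jobs > n_nodes * cores_per_node:
        raise ValueError("more jobs than available cores")
    free = [cores_per_node] * n_nodes
    for _ in range(n_jobs):
        if pack:
            idx = next(i for i, f in enumerate(free) if f > 0)
        else:
            idx = max(range(n_nodes), key=lambda i: free[i])
        free[idx] -= 1
    return free


def nodes_fitting(free, cores_needed=8):
    """Number of nodes with enough free cores for one multicore job."""
    return sum(1 for f in free if f >= cores_needed)


# Hypothetical 8-core job that requested 24 h but finished after 6 h.
start = datetime(2018, 11, 8, 14, 0)
end = datetime(2018, 11, 8, 20, 0)
charged = elapsed_charge(start, end, cores=8)
reserved = reserved_charge(24, cores=8)
print(f"elapsed-based: {charged:.0f} core-s, reservation-based: {reserved} core-s")

# Hypothetical cluster: 4 nodes with 8 cores each, 8 single-core jobs running.
spread = place_single_core_jobs(4, 8, 8, pack=False)
packed = place_single_core_jobs(4, 8, 8, pack=True)
print("spread:", spread, "-> nodes free for 8-core jobs:", nodes_fitting(spread))
print("packed:", packed, "-> nodes free for 8-core jobs:", nodes_fitting(packed))
```

In this toy setup, spreading leaves every node partially occupied, so no 8-core job can start, while packing keeps three nodes entirely free. Whether the real cluster behaves this way depends on the scheduler configuration, which CSCS still needs to look into.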
Next meeting date:
A.O.B.
Attendants
- CSCS:
- CMS:
- ATLAS:
- LHCb:
- EGI:
Action items
Topic revision: r8 - 2018-11-16 - GianfrancoSciacca