Swiss Grid Operations Meeting on 2018-03-01 at 14:00

Place: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
External link: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
Phone gate: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
IRC chat: irc:gridchat.cscs.ch:994#lcg (ask pw via email)
Switch Vidyo SIP IP: 137.138.248.204

- Site status
  - CSCS
  - PSI
  - UNIBE-LHEP
  - UNIBE-ID
  - UNIGE
  - NGI_CH
- Other topics
- A.O.B.
- Attendants
- Action items

Site status

CSCS

Piz Daint
- Maintenance operations last week:
  - Moved all LHConCRAY compute nodes to the same cabinet (c9-0) along with all relevant service nodes (DVS and DataWarp). This should reduce the dependency/impact on the overall HSN of LHConCRAY workflows.
  - DVS client caching enabled (so far shows no improvements)
  - Increased DVS nodes from 5 to 8. Work in progress to get them operational.
  - DataWarp (swap) currently not working. Work in progress to get this fixed.
- Hot-topic: singularity.
  - VOs able to run in singularity containers using regular Tier-2 workflows.
  - However, this is rather a hack (ssh to break out of the shifter container); we would like to have a better long-term solution.
  - Could VO jobs run on SLE12SP2 right until singularity gets called? CMS? ATLAS?
  - If so, need to tune CE entries and CVMFS caching (no preloaded cache for this, but likely a shared-rw cache across all nodes).
- Work happening:
  - Deploying new ARC server for Dom, the TDS of Piz Daint.

Phoenix GPFS
- No major issues but the system is mostly overloaded
- Testing "expected" speed in order to understand the performance we have now
- Planning new hardware architecture

dCache
- Updated to the latest 2.16 release (2.16.60)
- CMS spacemon upload process is working well
- Applied some fixes to fetchcrl and a workaround for the linkgroup update bug
- Introducing Marco Passerini to the infrastructure
- Using the users mailing list instead of the ticket system
- Updated Slurm to 17.11.3-2
- General software update
- Installed singularity

PSI

UNIBE-LHEP

Stable operation for several months, no issues or immediate worries to report.

* Running an average of 2400 slots, Ubelix contribution ~20%
* Planning to implement the event service workflow to opportunistically use spare slots on Ubelix
* Accounting numbers (from scheduler) from last month:

VO	Job Type	Produced WC core-hours
ATLAS	Any	1184331
ops	Any	20
t2k.org	Any	72
uboone	Any	0

http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=null&end=null&timeRange=lastMonth&granularity=Hourly&generic=0&sortBy=16&series=All&type=ewa

* HC availability [1]:
  - CSCS-LCG2: 97% Prod, 97% Analy
  - CSCS-LCG2-HPC: 91% Prod, 91% Analy
  - UNIBE-LHEP: 98% Prod, 95% Analy
  - UNIBE-LHEP-UBELIX: 98% Prod, 97% Analy
* CSCS running 3000 slots on average, UNIBE running 2400

* Accounting numbers (from dashboard) from last month for CSCS and UNIBE:

| Cluster | Job Type | Produced WC core-hours | Good vs Bad WC % | CPU eff good jobs % |
| CSCS | Any | 2373724 (61%) | 0.84 | 0.58 |
| Unibe | Any | 1507416 (39%) | 0.93 | 0.80 |
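
The percentage shares quoted next to the produced WC core-hours in the dashboard accounting numbers can be cross-checked with a minimal sketch like the following. The dictionary and the function name `shares` are illustrative assumptions, not part of any dashboard tooling; only the two core-hour figures come from the minutes.

```python
# Produced wall-clock core-hours per cluster, copied from the dashboard numbers above.
core_hours = {"CSCS": 2373724, "Unibe": 1507416}


def shares(hours):
    """Return each cluster's share of the total, rounded to a whole percent."""
    total = sum(hours.values())
    return {name: round(100 * value / total) for name, value in hours.items()}


if __name__ == "__main__":
    for name, pct in shares(core_hours).items():
        print(f"{name}: {pct}%")
```

With the figures above this gives 61% for CSCS and 39% for Unibe, matching the values in parentheses.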

[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2017-07-01&end_date=2018-01-23&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2,CSCS-LCG2-HPC,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

UNIBE-ID

* Stable delivery for ATLAS
* Planning to implement the event service workflow to opportunistically use spare slots on Ubelix

UNIGE

Accounting numbers (from scheduler) from last month

NGI_CH

* Nothing of relevance to report

* NGI-CH Open Tickets review:

https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO

Ticket-ID	VO	Site	Priority	Resp. Unit	Status	Last Update	Subject	Scope
131353	atlas	UNIGE-DPNC	urgent	NGI_CH involved	in progress	2018-02-24	Problem getting data from UNIGE-DPNC	WLCG
131433	none	T3_CH_PSI	less urgent	NGI_CH assigned	in progress	2018-01-15	Storage accounting deployment	EGI
131435	none	UNIGE-DPNC	less urgent	NGI_CH involved	on hold	2018-01-24	Storage accounting deployment	EGI
131948	none	CSCS-LCG2	less urgent	NGI_CH assigned	assigned	2018-01-22	IPv6 deployment at WLCG Tier-2 sites	EGI
131965	none	UNIBE-LHEP	less urgent	NGI_CH	on hold	2017-12-14	IPv6 deployment at WLCG Tier-2 sites	EGI
132927	cms	CSCS-LCG2	urgent	NGI_CH involved	in progress	2018-02-16	Problem with APEL Accounting for all of ...	EGI
133480	none	CSCS-LCG2	urgent	NGI_CH	in progress	2018-02-19	setup of /store/test/rucio	WLCG
133689	atlas	UNIGE-DPNC	urgent	NGI_CH	in progress	2018-03-01	DE IEPSAS-KOSICE DATADISK transfer ...	WLCG
133695	lhcb	CSCS-LCG2	urgent	NGI_CH	in progress	2018-02-26	Data access problem at CSCS-LCG2	WLCG

A.O.B.

Attendants

CSCS:
CMS:
ATLAS:
LHCb:
EGI:

Action items


Topic revision: r6 - 2018-03-01 - DinoConciatore