<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
Swiss Grid Operations Meeting on 2018-03-01 at 14:00
Site status
CSCS
- Piz Daint
- Maintenance operations last week:
- Moved all LHConCRAY compute nodes to the same cabinet (c9-0), along with all relevant service nodes (DVS and DataWarp). This should reduce both the dependency of LHConCRAY workflows on the overall HSN and their impact on it.
- Enabled DVS client caching (no improvement observed so far)
- Increased DVS nodes from 5 to 8. Work in progress to get them operational.
- DataWarp (swap) currently not working. Work in progress to get this fixed.
- Hot topic: Singularity.
- VOs are able to run in Singularity containers using regular Tier-2 workflows.
- However, this is rather a hack (ssh to break out of the Shifter container), and we would like a better long-term solution.
- Could VO jobs run on SLE12SP2 right up until Singularity gets called? CMS? ATLAS?
- If so, we need to tune the CE entries and CVMFS caching (no preloaded cache for this, but likely a shared-rw cache across all nodes).
- Work happening:
* Deploying new ARC server for Dom, the TDS of Piz Daint.
* Phoenix
GPFS
- No major issues but the system is mostly overloaded
- Testing "expected" speed in order to understand the performance we have now
- Planning new hardware architecture
dCache
- Updated to the latest 2.16 release (2.16.60)
- CMS spacemon upload process is working well
- Applied some fixes to fetch-crl and a workaround for the linkgroup update bug
- Introducing Marco Passerini to the infrastructure
- Using the users mailing list instead of the ticket system
* Updated Slurm to 17.11.3-2
* General software update
Some statistics:
PSI
UNIBE-LHEP
- Stable operation for several months, no issues or immediate worries to report.
* Running an average of 2400 slots, Ubelix contribution ~20%
* Accounting numbers (from scheduler) from last month
| VO | Job Type | Produced WC core-hours |
| ATLAS | Any | 1184331 |
| ops | Any | 20 |
| t2k.org | Any | 72 |
| uboone | Any | 0 |
* Planning to implement the event service workflow to opportunistically pick up spare slots on Ubelix
http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=null&end=null&timeRange=lastMonth&granularity=Hourly&generic=0&sortBy=16&series=All&type=ewa
* HC availability [1]:
- CSCS-LCG2: 97% Prod, 97% Analy
- CSCS-LCG2-HPC: 91% Prod, 91% Analy
- UNIBE-LHEP: 98% Prod, 95% Analy
- UNIBE-LHEP-UBELIX: 98% Prod, 97% Analy
- CSCS running 3000 slots on average, UNIBE running 2400
Accounting numbers (from dashboard) from last month for CSCS and UNIBE
| Cluster | Job Type | Produced WC core-hours | Good vs Bad WC % | CPU eff good jobs % |
| CSCS | Any | 2373724 (61%) | 0.62 | 0.58 |
| Unibe | Any | 1507416 (39%) | 0.94 | 0.80 |
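The percentages in parentheses in the table above appear to be each cluster's share of the combined wall-clock core-hours. A minimal sketch of that arithmetic, assuming the shares are simply rounded to the nearest percent (the function name `shares` is illustrative, not part of any dashboard tooling):

```python
# Produced wall-clock core-hours per cluster, copied from the table above.
core_hours = {"CSCS": 2373724, "Unibe": 1507416}


def shares(values):
    """Return each entry's share of the total, as a rounded percentage."""
    total = sum(values.values())
    return {name: round(100 * value / total) for name, value in values.items()}


print(shares(core_hours))  # {'CSCS': 61, 'Unibe': 39}
```

The combined total is 3881140 core-hours, which reproduces the 61% / 39% split reported in the table.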
[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2017-07-01&end_date=2018-01-23&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2,CSCS-LCG2-HPC,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE
UNIBE-ID
* Stable delivery for ATLAS
* Planning to implement the event service workflow to opportunistically pick up spare slots on Ubelix
UNIGE
- Accounting numbers (from scheduler) from last month
NGI_CH
* Nothing of relevance to report
* NGI-CH Open Tickets review:
https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO
| Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope |
| 131353 |  | atlas | UNIGE-DPNC | urgent | NGI_CH involved | in progress | 2018-02-24 | Problem getrting data from UNIGE-DPNC | WLCG |
| 131433 |  | none | T3_CH_PSI | less urgent | NGI_CH assigned | in progress | 2018-01-15 | Storage accounting deployment | EGI |
| 131435 |  | none | UNIGE-DPNC | less urgent | NGI_CH involved | on hold | 2018-01-24 | Storage accounting deployment | EGI |
| 131948 |  | none | CSCS-LCG2 | less urgent | NGI_CH assigned | assigned | 2018-01-22 | IPv6 deployment at WLCG Tier-2 sites | EGI |
| 131965 |  | none | UNIBE-LHEP | less urgent | NGI_CH | on hold | 2017-12-14 | IPv6 deployment at WLCG Tier-2 sites | EGI |
| 132927 |  | cms | CSCS-LCG2 | urgent | NGI_CH involved | in progress | 2018-02-16 | Problem with APEL Accounting for all of ... | EGI |
| 133480 |  | none | CSCS-LCG2 | urgent | NGI_CH | in progress | 2018-02-19 | setup of /store/test/rucio | WLCG |
| 133689 |  | atlas | UNIGE-DPNC | urgent | NGI_CH | in progress | 2018-03-01 | DE IEPSAS-KOSICE DATADISK transfer ... | WLCG |
| 133695 |  | lhcb | CSCS-LCG2 | urgent | NGI_CH | in progress | 2018-02-26 | Data access problem at CSCS-LCG2 | WLCG |
Other topics
Next meeting date:
A.O.B.
Attendants
- CSCS:
- CMS:
- ATLAS:
- LHCb:
- EGI:
Action items