<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Swiss Grid Operations Meeting on 2018-03-01 at 14:00

Site status

CSCS

  • Piz Daint
    • Maintenance operations last week:
    • Moved all LHConCRAY compute nodes to the same cabinet (c9-0) along with all relevant service nodes (DVS and DataWarp). This should reduce the dependency/impact on the overall HSN of LHConCRAY workflows.
      • DVS client caching enabled (so far shows no improvements)
      • Increased DVS nodes from 5 to 8. Work in progress to get them operational.
      • DataWarp (swap) currently not working. Work in progress to get this fixed.
    • Hot-topic: singularity.
      • VOs able to run in singularity containers using regular Tier-2 workflows.
      • However, this is rather a hack (ssh to break out of the Shifter container); we would like a better long-term solution.
      • Could VO jobs run on SLE12SP2 right up until singularity gets called? CMS? ATLAS?
      • If so, we need to tune the CE entries and CVMFS caching (no preloaded cache for this, but likely a shared-rw cache across all nodes).
    • Work happening:
      • Deploying new ARC server for Dom, the TDS of Piz Daint.
  • Phoenix

GPFS
- No major issues but the system is mostly overloaded
- Testing "expected" speed in order to understand the performance we have now
- Planning new hardware architecture

dCache
- Updated to the latest 2.16 release (2.16.60)
- CMS spacemon upload process is working well
- Fixed some fetch-crl issues and applied a workaround for the linkgroup update bug
- Introducing Marco Passerini to the infrastructure
- Using the users mailing list instead of the ticket system
  • Updated Slurm to 17.11.3-2
  • General software update
    • Installed singularity
  • Storage
    • xxx
    • yyy

Some statistics:


PSI

UNIBE-LHEP

  • Stable operation for several months, no issues or immediate worries to report.
  • Running an average of 2400 slots, Ubelix contribution ~20%
  • Accounting numbers (from scheduler) from last month

| VO | Job Type | Produced WC core-hours |
| ATLAS | Any | 1184331 |
| ops | Any | 20 |
| t2k.org | Any | 72 |
| uboone | Any | 0 |


  • Planning to implement the event service workflow to opportunistically use spare slots on Ubelix


http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=null&end=null&timeRange=lastMonth&granularity=Hourly&generic=0&sortBy=16&series=All&type=ewa

  • HC availability [1]:
    • CSCS-LCG2: 97% Prod, 97% Analy
    • CSCS-LCG2-HPC: 91% Prod, 91% Analy
    • UNIBE-LHEP: 98% Prod, 95% Analy
    • UNIBE-LHEP-UBELIX: 98% Prod, 97% Analy
  • CSCS running 3000 slots on average, UNIBE running 2400

Accounting numbers (from dashboard) from last month for CSCS and UNIBE

| Cluster | Job Type | Produced WC core-hours | Good vs Bad WC % | CPU eff good jobs % |
| CSCS | Any | 2373724 (61%) | 0.62 | 0.58 |
| CSCS | Any | 2373724 (61%) | 0.84 | 0.58 |
| Unibe | Any | 1507416 (39%) | 0.94 | 0.80 |
| Unibe | Any | 1507416 (39%) | 0.93 | 0.80 |
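The percentages in parentheses appear to be each cluster's share of the combined wall-clock core-hours. A minimal sketch of that arithmetic, using only the two totals quoted in the dashboard table (the variable names are illustrative and do not come from any accounting tool):

```python
# Wall-clock core-hours per cluster, copied from the dashboard table.
produced_wc_core_hours = {"CSCS": 2373724, "Unibe": 1507416}

total = sum(produced_wc_core_hours.values())

# Share of the combined total, rounded to whole percent as in the table.
shares = {
    cluster: round(100 * hours / total)
    for cluster, hours in produced_wc_core_hours.items()
}

for cluster, pct in shares.items():
    print(f"{cluster}: {produced_wc_core_hours[cluster]} ({pct}%)")
```

Under this assumption the computed shares agree with the 61% and 39% reported for CSCS and Unibe.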



[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2017-07-01&end_date=2018-01-23&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2,CSCS-LCG2-HPC,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

UNIBE-ID

  • Stable delivery for ATLAS
  • Xxx


UNIGE

  • Xxx
  • Accounting numbers (from scheduler) from last month

NGI_CH

  • Nothing of relevance to report
  • NGI-CH Open Tickets review:

https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO

| Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope |
| 133695 | | lhcb | CSCS-LCG2 | urgent | NGI_CH | in progress | 2018-02-26 | Data access problem at CSCS-LCG2 | WLCG |
| 133689 | | atlas | UNIGE-DPNC | urgent | NGI_CH | in progress | 2018-03-01 | DE IEPSAS-KOSICE DATADISK transfer ... | WLCG |
| 133480 | | none | CSCS-LCG2 | urgent | NGI_CH | in progress | 2018-02-19 | setup of /store/test/rucio | WLCG |
| 132927 | | cms | CSCS-LCG2 | urgent | NGI_CH involved | in progress | 2018-02-16 | Problem with APEL Accounting for all of ... | EGI |
| 131965 | | none | UNIBE-LHEP | less urgent | NGI_CH | on hold | 2017-12-14 | IPv6 deployment at WLCG Tier-2 sites | EGI |
| 131948 | | none | CSCS-LCG2 | less urgent | NGI_CH assigned | assigned | 2018-01-22 | IPv6 deployment at WLCG Tier-2 sites | EGI |
| 131435 | | none | UNIGE-DPNC | less urgent | NGI_CH involved | on hold | 2018-01-24 | Storage accounting deployment | EGI |
| 131433 | | none | T3_CH_PSI | less urgent | NGI_CH assigned | in progress | 2018-01-15 | Storage accounting deployment | EGI |
| 131353 | | atlas | UNIGE-DPNC | urgent | NGI_CH involved | in progress | 2018-02-24 | Problem getting data from UNIGE-DPNC | WLCG |
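For anyone re-running the ticket review, it may help to see which filters the GGUS search link encodes. The sketch below only decomposes the query string of the URL quoted in these minutes using the Python standard library; it does not contact GGUS, and what each parameter means on the server side is an assumption based on its name.

```python
from urllib.parse import urlsplit, parse_qs

# Search URL exactly as quoted in the minutes.
url = (
    "https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH"
    "&status=open&timeframe=any&orderticketsby=REQUEST_ID"
    "&orderhow=desc&search_submit=GO"
)

# parse_qs returns a list per key; each key occurs once here, so take the first value.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

Read this way, the link selects open tickets for the NGI_CH support unit over any timeframe, sorted by request ID in descending order.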

Other topics

  • Topic1
  • Topic2

Next meeting date:


A.O.B.

Attendants

  • CSCS:
  • CMS:
  • ATLAS:
  • LHCb:
  • EGI:

Action items

  • Item1

Topic revision: r6 - 2018-03-01 - DinoConciatore
 