<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Swiss Grid Operations Meeting on 2018-03-01 at 14:00

Site status

CSCS

  • Piz Daint
    • Maintenance operations last week:
      • Moved all LHConCRAY compute nodes to the same cabinet (c9-0), along with all relevant service nodes (DVS and DataWarp). This should reduce both the dependence of LHConCRAY workflows on the overall HSN and their impact on it.
      • DVS client caching enabled (no improvement observed so far).
      • Increased DVS nodes from 5 to 8. Work in progress to get them operational.
      • DataWarp (swap) currently not working. Work in progress to get this fixed.
    • Hot topic: Singularity.
      • VOs are able to run in Singularity containers using regular Tier-2 workflows.
      • However, this is rather a hack (ssh to break out of the Shifter container), and we would like to have a better long-term solution.
      • Could VO jobs run on SLE12SP2 right up to the point where Singularity gets called? CMS? ATLAS?
      • If so, we need to tune the CE entries and CVMFS caching (no preloaded cache for this, but likely a shared read-write cache across all nodes).
    • Work happening:
      • Deploying new ARC server for Dom, the TDS of Piz Daint.


  • Phoenix
    • GPFS
      • No major issues, but the system is overloaded most of the time
      • Testing the "expected" speed in order to understand the performance we currently get
      • Planning new hardware architecture

    • dCache
      • Updated to the latest 2.16 release (2.16.60)
      • CMS spacemon upload process is working well
      • Applied some fixes to fetch-crl and a workaround for the linkgroup update bug
      • Introducing Marco Passerini to the infrastructure
      • Using the users mailing list instead of the ticket system

PSI

UNIBE-LHEP

  • Stable operation for several months, no issues or immediate worries to report.
  • Running an average of 2400 slots, Ubelix contribution ~20%

  • Accounting numbers (from scheduler) from last month

    VO       Job Type  Produced WC core-hours
    ATLAS    Any       1184331
    ops      Any       20
    t2k.org  Any       72
    uboone   Any       0


  • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

  • HC availability [1]:
    • CSCS-LCG2: 97% Prod, 97% Analy
    • CSCS-LCG2-HPC: 91% Prod, 91% Analy
    • UNIBE-LHEP: 98% Prod, 95% Analy
    • UNIBE-LHEP-UBELIX: 98% Prod, 97% Analy
  • CSCS running 3000 slots on average, UNIBE running 2400

Cluster  Job Type  Produced WC core-hours  Good vs Bad WC (fraction)  CPU eff good jobs (fraction)
CSCS     Any       2373724 (61%)           0.62                       0.58
Unibe    Any       1507416 (39%)           0.94                       0.80
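
For reference, the percentages in parentheses in the table above appear to be each cluster's share of the combined produced WC core-hours. The minimal sketch below reproduces them under that assumption; the names wc_core_hours and shares are illustrative and not part of any dashboard tooling.

```python
# Produced wall-clock core-hours per cluster, copied from the table above.
wc_core_hours = {"CSCS": 2373724, "Unibe": 1507416}


def shares(values):
    """Return each entry's share of the total, rounded to a whole percent."""
    total = sum(values.values())
    return {name: round(100 * v / total) for name, v in values.items()}


cluster_shares = shares(wc_core_hours)
for name, pct in cluster_shares.items():
    print(f"{name}: {wc_core_hours[name]} core-hours ({pct}%)")
```

Under this assumption the result matches the 61% and 39% quoted for CSCS and Unibe.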




[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2017-07-01&end_date=2018-01-23&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2,CSCS-LCG2-HPC,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

UNIBE-ID

  • Stable delivery for ATLAS
  • Planning to implement the event service workflow to make opportunistic use of spare slots on Ubelix

UNIGE

  • SE merger to UNIBE-LHEP planned
  • Trying to replicate some data to CSCS, but this is problematic for now
  • Also trying a local rucio download

NGI_CH

* NGI-CH Open Tickets review:

https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO

Ticket-ID  Type  VO     Site        Priority     Resp. Unit       Status       Last Update  Subject                                      Scope
133695           lhcb   CSCS-LCG2   urgent       NGI_CH           in progress  2018-02-26   Data access problem at CSCS-LCG2             WLCG
133689           atlas  UNIGE-DPNC  urgent       NGI_CH           in progress  2018-03-01   DE IEPSAS-KOSICE DATADISK transfer ...       WLCG
133480           none   CSCS-LCG2   urgent       NGI_CH           in progress  2018-02-19   setup of /store/test/rucio                   WLCG
132927           cms    CSCS-LCG2   urgent       NGI_CH involved  in progress  2018-02-16   Problem with APEL Accounting for all of ...  EGI
131965           none   UNIBE-LHEP  less urgent  NGI_CH           on hold      2017-12-14   IPv6 deployment at WLCG Tier-2 sites         EGI
131948           none   CSCS-LCG2   less urgent  NGI_CH assigned  assigned     2018-01-22   IPv6 deployment at WLCG Tier-2 sites         EGI
131435           none   UNIGE-DPNC  less urgent  NGI_CH involved  on hold      2018-01-24   Storage accounting deployment                EGI
131433           none   T3_CH_PSI   less urgent  NGI_CH assigned  in progress  2018-01-15   Storage accounting deployment                EGI
131353           atlas  UNIGE-DPNC  urgent       NGI_CH involved  in progress  2018-02-24   Problem getting data from UNIGE-DPNC         WLCG

Other topics

  • Topic1
  • Topic2

    Next meeting date:

A.O.B.

Attendants

  • CSCS:
  • CMS:
  • ATLAS:
  • LHCb:
  • EGI:

Action items

  • Item1

Topic revision: r8 - 2018-04-05 - GianfrancoSciacca
 