<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Swiss Grid Operations Meeting on 2019-03-07 at 14:00

Site status

CSCS

  • Xxx
  • Accounting numbers (from scheduler) from last month

PSI

UNIBE-LHEP

  • Ramping down LHEP in view of the cluster re-deployment

  • Monthly summary: pledged 18 kHS06, delivered 18 kHS06
  • Ubelix contributing >50% (23% typical)
  • Running on average >1850 slots (2500 typical)



  • 6-month history UniBE (pledge: 18 kHS06)



  • Accounting numbers (from scheduler) from last month, LHEP only
    • Omitted this month

Swiss ATLAS statistics

  • Hammercloud availability:
http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562&view=Shifter%20view#time=720&start_date=&end_date=&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

    • ANALY_CSCS-HPC: 90%
    • CSCS-LCG2-HPC_MCORE: 85.5%
    • UNIBE-*: >95%

  • Running slots
    • A large number of stuck jobs on ARC skews the statistics for CSCS, causing reporting problems to ATLAS
    • Very likely due to the reported issues with the Daint scratch file system, affecting WLCG jobs in some way

    • This required a manual cleanup:
      • job list provided by ATLAS, culled from the aCT
      • manual cleanup of the ARC sessiondir carried out by Miguel
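The first step of such a cleanup, matching a job list against the session directories, could be sketched as below. This is a minimal dry-run sketch, not the procedure actually used: it assumes the ARC sessiondir holds one subdirectory per job, named after the job ID, and the function name `find_session_dirs` is hypothetical. It only lists candidates and deletes nothing.

```python
from pathlib import Path


def find_session_dirs(sessiondir, job_ids):
    """Return the per-job directories under sessiondir whose names are in job_ids.

    Assumption: each job has one subdirectory named after its job ID.
    Nothing is removed; the result is only a list of candidates to review.
    """
    root = Path(sessiondir)
    wanted = set(job_ids)
    return sorted(p for p in root.iterdir() if p.is_dir() and p.name in wanted)
```

Reviewing the returned list before removing anything keeps the destructive step separate and under manual control.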




  • Accounting Numbers from the ATLAS dashboard (February 2019) CSCS+UNIBE
| Cluster | Job Type | Produced WC core-hours | Good vs Bad WC % | CPU eff good jobs % |
| CSCS | Any | 3'088'664; 72% | 0.52 | 0.70 |
| UniBe | Any | 1'149'338; 28% | 0.75 | 0.75 |
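To get a feel for the wall-clock time lost to failed jobs, the figures in the table above can be combined as in the following sketch. It assumes that the "Good vs Bad WC" column is the fraction of wall-clock time used by successful jobs; that reading is an interpretation, not something stated by the dashboard, and the function name `failed_core_hours` is hypothetical.

```python
def failed_core_hours(wc_core_hours, good_fraction):
    """Wall-clock core-hours spent in failed jobs.

    Assumption: good_fraction is the share of wall-clock time used by
    successful jobs, so the remainder is attributed to failed jobs.
    """
    if not 0.0 <= good_fraction <= 1.0:
        raise ValueError("good_fraction must be between 0 and 1")
    return wc_core_hours * (1.0 - good_fraction)


# February 2019 values copied from the table above.
clusters = {
    "CSCS": (3_088_664, 0.52),
    "UniBe": (1_149_338, 0.75),
}

for name, (wc, good) in clusters.items():
    print(f"{name}: {failed_core_hours(wc, good):,.0f} failed WC core-hours")
```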





  • Take home lessons from the last month:
    • Failed wall-clock (WC) time was very high; we need more real-time alerts
    • The public dashboard replica is offline, which prevents monitoring on our side
    • ATLAS now relies fully on ARC services:
      • we need ARC metrics and/or logs available to ATLAS
      • we need automated monitoring checks (Nagios?) on ARC services
      • or we risk incidents like the one that occurred in February
    • Please report general Daint issues that could affect WLCG jobs (SLACK-general or email) so that we can react if needed
    • ...
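As a starting point for the automated checks and alerts mentioned above, the core of a Nagios-style check could look like the sketch below. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Everything else is an assumption: the function name `check_good_wc` is hypothetical, the thresholds are placeholders and not agreed values, and how the good wall-clock fraction would be obtained from ARC or ATLAS accounting is left open.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_good_wc(good_fraction, warn=0.80, crit=0.60):
    """Map a good wall-clock fraction to a (status code, message) pair.

    warn and crit are hypothetical thresholds chosen for illustration only.
    """
    if good_fraction is None or not 0.0 <= good_fraction <= 1.0:
        return UNKNOWN, "UNKNOWN - good WC fraction unavailable or out of range"
    if good_fraction < crit:
        return CRITICAL, f"CRITICAL - good WC fraction {good_fraction:.2f} below {crit:.2f}"
    if good_fraction < warn:
        return WARNING, f"WARNING - good WC fraction {good_fraction:.2f} below {warn:.2f}"
    return OK, f"OK - good WC fraction {good_fraction:.2f}"
```

A wrapper script would print the message and exit with the returned code, which is what a Nagios server expects from a plugin.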

UNIBE-ID

  • Xxx

UNIGE

  • Xxx
  • Accounting numbers (from scheduler) from last month

NGI_CH

  • Xxx
  • NGI-CH Open Tickets review

Other topics

  • Topic1
  • Topic2

Next meeting date:

A.O.B.

Attendants

  • CSCS:
  • CMS:
  • ATLAS:
  • LHCb:
  • EGI:

Action items

  • Item1
Topic revision: r2 - 2019-03-06 - GianfrancoSciacca
 