Difference: MeetingSwissGridOperations20181108 (1 vs. 8)

Revision 8 - 2018-11-16 - GianfrancoSciacca

Line: 1 to 1
 
META TOPICPARENT name="MeetingsBoard"
<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
Line: 36 to 36
 
      • UNIBE-LHEP: 99% Prod, 96% Analy
      • UNIBE-LHEP-UBELIX: 100% Prod ($), 27% Analy

        ($) effectively up only ~30% of the time

    • CSCS running 3300 slots on average, UNIBE running 1850
Changed:
<
<
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

>
>
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

 
Line: 58 to 58
 
  • Distributed DPM storage working well

NGI_CH

  • Our deal with EGI for certificates expires in March 2019
Changed:
<
<
    • Science IT support Bern is looking into what the alternative will be

>
>
    • Science IT support Bern is looking into what the alternative will be

  • NGI-CH Open Tickets review
Ticket-ID Type VO Site Priority Resp. Unit Status Last Update Subject Scope
Line: 71 to 71
 

Other topics

  • Follow up to fair-share meeting

  • Two questions, one for the slurm experts, one for the VO reps:
Changed:
<
<
    • is slurm charging the reserved time or the elapsed time to the user fair-share?
>
>
    • is slurm charging the reserved time or the elapsed*cores time to the user fair-share?
      • NICK: no, it is using (endtime-starttime)*cores
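
As a worked illustration of the answer above, the sketch below contrasts the usage charged to fair-share as stated in the meeting, (endtime - starttime) * cores, with what charging the reserved wall-time would give. The job, its 48 h time limit and the function names are hypothetical, and the sketch ignores the usage decay and any billing weights a real Slurm configuration may apply.

```python
from datetime import datetime


def charged_core_seconds(start: datetime, end: datetime, cores: int) -> float:
    """Usage as described by CSCS: elapsed wall-time multiplied by allocated cores."""
    return (end - start).total_seconds() * cores


def reserved_core_seconds(time_limit_s: float, cores: int) -> float:
    """Hypothetical alternative: requested wall-time limit multiplied by cores."""
    return time_limit_s * cores


# Hypothetical 8-core job that requested 48 h but finished after 6 h.
start = datetime(2018, 11, 1, 8, 0)
end = datetime(2018, 11, 1, 14, 0)
used = charged_core_seconds(start, end, cores=8)
reserved = reserved_core_seconds(48 * 3600, cores=8)
print(f"charged:  {used / 3600:.0f} core-hours")      # charged:  48 core-hours
print(f"reserved: {reserved / 3600:.0f} core-hours")  # reserved: 384 core-hours
```

With elapsed-time charging, a job that ends early is only billed for what it actually ran, so over-requested time limits do not by themselves penalise a VO's fair-share.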

 
    • possible mitigation: pack single-core jobs onto nodes, instead of distributing them across all nodes. How does this sound?
Changed:
<
<
      • this should reduce the node fragmentatiopn and give the MC jobs more opportunities to run timely

  • Other possible mitigations to be discussed internally between VOs need input from CSCS:
    • Distribution of job queue waiting time, last 2 Quarters, split by: Daint vs Phoenix, VO and 8-core vs 1-core (we should not count the T0 jobs)
    • Anything else?

>
>
      • this should reduce the node fragmentation and give the MC jobs more opportunities to run in a timely manner
        • NICK: cannot comment at the moment, will look at it

  • Other possible mitigations to be discussed internally between VOs need input from CSCS:
    • Distribution of job queue waiting time over the last 2 quarters, split by Daint vs Phoenix, by VO and by 8-core vs 1-core (T0 jobs should be excluded from these plots)
      • NICK: CSCS will investigate providing queue wait time reporting
    • Anything else?
      • NICK: Move forward with Stefano’s recommendation on Tuesday for a face-to-face meeting, preferably before the end of the year
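
A toy model of the packing mitigation discussed above, under assumptions not taken from the meeting: 10 identical 16-core nodes, 40 single-core jobs, and 8-core MC jobs that must fit on a single node. It compares spreading single-core jobs over the least-loaded nodes with packing them onto the busiest node that still has a free core, then counts how many 8-core jobs could still start. It is only a sketch of the fragmentation argument, not a model of how Slurm actually selects nodes.

```python
from typing import List


def place_single_core_jobs(n_nodes: int, cores_per_node: int,
                           n_jobs: int, strategy: str) -> List[int]:
    """Return the free cores left on each node after placing n_jobs 1-core jobs."""
    free = [cores_per_node] * n_nodes
    for _ in range(n_jobs):
        candidates = [i for i, f in enumerate(free) if f > 0]
        if not candidates:
            raise ValueError("not enough cores for all jobs")
        if strategy == "spread":
            # Least-loaded node first: jobs end up distributed across all nodes.
            node = max(candidates, key=lambda i: free[i])
        elif strategy == "pack":
            # Busiest node that still has a free core: nodes are filled one by one.
            node = min(candidates, key=lambda i: free[i])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        free[node] -= 1
    return free


def whole_node_slots(free: List[int], job_cores: int) -> int:
    """Number of job_cores-sized jobs that still fit, each within a single node."""
    return sum(f // job_cores for f in free)


for strategy in ("spread", "pack"):
    free = place_single_core_jobs(n_nodes=10, cores_per_node=16, n_jobs=40, strategy=strategy)
    print(strategy, "free cores:", sum(free), "8-core slots:", whole_node_slots(free, 8))
# spread free cores: 120 8-core slots: 10
# pack free cores: 120 8-core slots: 15
```

Both strategies leave the same 120 free cores; only the packed layout keeps whole nodes free, which is the reduction in node fragmentation the mitigation aims for.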

 
  • Can we agree that the Daint and Phoenix shares (30 or 60 day historical view) will be monitored monthly at this meeting?
Changed:
<
<
  • Topic2
    ...

>
>
    • GIANFRANCO: not discussed

  • Topic2
    ...

  Next meeting date:
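
The queue wait-time breakdown requested from CSCS (Daint vs Phoenix, VO, 8-core vs 1-core, without T0 jobs) could be computed along these lines. This is a minimal sketch on hypothetical records: in practice the submit and start times could come from Slurm accounting (`sacct` reports Submit and Start fields), and marking T0 jobs with a boolean flag is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical job records: (cluster, vo, cores, is_t0, submit_time, start_time)
JOBS = [
    ("Daint", "atlas", 8, False, datetime(2018, 10, 1, 10, 0), datetime(2018, 10, 1, 12, 0)),
    ("Daint", "atlas", 8, False, datetime(2018, 10, 2, 9, 0), datetime(2018, 10, 2, 9, 30)),
    ("Daint", "cms", 8, True, datetime(2018, 10, 1, 8, 0), datetime(2018, 10, 1, 20, 0)),
    ("Phoenix", "cms", 1, False, datetime(2018, 10, 1, 8, 0), datetime(2018, 10, 1, 8, 6)),
    ("Phoenix", "lhcb", 1, False, datetime(2018, 10, 3, 7, 0), datetime(2018, 10, 3, 7, 20)),
]


def wait_time_by_group(jobs):
    """Map (cluster, VO, cores) -> (number of jobs, median queue wait in hours)."""
    waits = defaultdict(list)
    for cluster, vo, cores, is_t0, submit, start in jobs:
        if is_t0:
            continue  # T0 jobs are left out of the statistics
        waits[(cluster, vo, cores)].append((start - submit).total_seconds() / 3600)
    return {key: (len(w), median(w)) for key, w in waits.items()}


for (cluster, vo, cores), (n, med) in sorted(wait_time_by_group(JOBS).items()):
    print(f"{cluster:8} {vo:6} {cores}-core  jobs={n}  median wait={med:.2f} h")
```

The meeting asked for a distribution, so per-group histograms over the last two quarters would be the natural next step; the median is used here only to keep the sketch short.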

Revision 7 - 2018-11-15 - GianfrancoSciacca

Line: 1 to 1
 
META TOPICPARENT name="MeetingsBoard"
<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
Line: 36 to 36
 
      • UNIBE-LHEP: 99% Prod, 96% Analy
      • UNIBE-LHEP-UBELIX: 100% Prod ($), 27% Analy

        ($) effectively up only ~30% of the time

    • CSCS running 3300 slots on average, UNIBE running 1850
Changed:
<
<
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

>
>
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

 
Line: 48 to 48
 




[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2018-10-01&end_date=2018-10-31&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

UNIBE-ID

Changed:
<
<
  • Enabled EGI ARGUS notification e-mails in GOCDB to respond to CE stalling silently
>
>
  • Enabled EGI ARGO notification e-mails in GOCDB to respond to CE stalling silently
 
  • Opportunistic usage on Ubelix will be added as soon as the SL6 legacy partition is discontinued
    • slurm pre-emptable partition
    • ATLAS can use idle slots
Line: 58 to 58
 
  • Distributed DPM storage working well

NGI_CH

  • Our deal with EGI for certificates expires in March 2019
Changed:
<
<
    • Science IT support Bern is looking into what the alternative will be

>
>
    • Science IT support Bern is looking into what the alternative will be

  • NGI-CH Open Tickets review
Ticket-ID Type VO Site Priority Resp. Unit Status Last Update Subject Scope
Line: 78 to 78
 
    • Distribution of job queue waiting time, last 2 Quarters, split by: Daint vs Phoenix, VO and 8-core vs 1-core (we should not count the T0 jobs)
    • Anything else?

  • Can we agree that the Daint and Phoenix shares (30 or 60 day historical view) will be monitored monthly at this meeting?
Changed:
<
<
  • Topic2
    ...

>
>
  • Topic2
    ...

  Next meeting date:

Revision 6 - 2018-11-15 - DinoConciatore

Line: 1 to 1
 
META TOPICPARENT name="MeetingsBoard"
<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
Line: 12 to 12
 

Site status

CSCS

Changed:
<
<
  • Xxx
  • Accounting numbers (from scheduler) from last month
>
>
 

PSI

  • Storage: decommissioning of old SGI and NetApp
Line: 37 to 36
 
      • UNIBE-LHEP: 99% Prod, 96% Analy
      • UNIBE-LHEP-UBELIX: 100% Prod ($), 27% Analy

        ($) effectively up only ~30% of the time

    • CSCS running 3300 slots on average, UNIBE running 1850
Changed:
<
<
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

>
>
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

 
Line: 78 to 78
 
    • Distribution of job queue waiting time, last 2 Quarters, split by: Daint vs Phoenix, VO and 8-core vs 1-core (we should not count the T0 jobs)
    • Anything else?

  • Can we agree that the Daint and Phoenix shares (30 or 60 day historical view) will be monitored monthly at this meeting?
Changed:
<
<
  • Topic2
    ...

>
>
  • Topic2
    ...

  Next meeting date:

Revision 5 - 2018-11-15 - GianfrancoSciacca

Line: 1 to 1
 
META TOPICPARENT name="MeetingsBoard"
<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
Line: 22 to 22
 

UNIBE-LHEP

Changed:
<
<
>
>
 
  • A bit less stable (lack of manpower), lower delivery for a few months, still fulfilling the pledge.
  • Ubelix dropped out silently on 10 October
  • Running an average <1900 slots (typical 2500), Ubelix contribution 12% (typical 23%)
Line: 37 to 37
 
      • UNIBE-LHEP: 99% Prod, 96% Analy
      • UNIBE-LHEP-UBELIX: 100% Prod ($), 27% Analy

        ($) effectively up only ~30% of the time

    • CSCS running 3300 slots on average, UNIBE running 1850
Changed:
<
<
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE
>
>
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

 
Line: 49 to 49
 




[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2018-10-01&end_date=2018-10-31&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

UNIBE-ID

Changed:
<
<
  • Xxx

>
>
  • Enabled EGI ARGUS notification e-mails in GOCDB to respond to CE stalling silently
  • Opportunistic usage on Ubelix will be added as soon as the SL6 legacy partition is discontinued
    • slurm pre-emptable partition
    • ATLAS can use idle slots
    • ATLAS jobs are killed (not checkpointed) when the slots are needed by other users
 

UNIGE

Changed:
<
<
  • Xxx
  • Accounting numbers (from scheduler) from last month

>
>
  • Re-commissioning of ARC CE delayed
  • Distributed DPM storage working well

 

NGI_CH

Changed:
<
<
  • Xxx
  • NGI-CH Open Tickets review

>
>
  • Our deal with EGI for certificates expires in March 2019
    • Science IT support Bern is looking into what the alternative will be

  • NGI-CH Open Tickets review
Ticket-ID Type VO Site Priority Resp. Unit Status Last Update Subject Scope
138314 atlas CSCS-LCG2 less urgent NGI_CH assigned 2018-11-15 DE CSCS-LCG2 : transfer failures with ... WLCG
138296   cms CSCS-LCG2 urgent NGI_CH assigned 2018-11-14 Transfers failing from T2_CH_CSCS WLCG
133695 lhcb CSCS-LCG2 urgent NGI_CH assigned in progress 2018-10-19 Data access problem at CSCS-LCG2 WLCG
132927   cms CSCS-LCG2 urgent NGI_CH assigned involved in progress 2018-11-12 Problem with APEL Accounting for all of ... EGI
131965   none UNIBE-LHEP less urgent NGI_CH assigned on hold 2018-10-04 IPv6 deployment at WLCG Tier-2 sites EGI
131948   none CSCS-LCG2 less urgent NGI_CH assigned in progress 2018-11-13 IPv6 deployment at WLCG Tier-2 sites EGI
 

Other topics

Changed:
<
<
  • Topic1
  • Topic2

>
>
  • Follow up to fair-share meeting

  • Two questions, one for the slurm experts, one for the VO reps:
    • is slurm charging the reserved time or the elapsed time to the user fair-share?
    • possible mitigation: pack single-core jobs onto nodes, instead of distributing them across all nodes. How does this sound?
      • this should reduce the node fragmentatiopn and give the MC jobs more opportunities to run timely

  • Other possible mitigations to be discussed internally between VOs need input from CSCS:
    • Distribution of job queue waiting time, last 2 Quarters, split by: Daint vs Phoenix, VO and 8-core vs 1-core (we should not count the T0 jobs)
    • Anything else?

  • Can we agree that the Daint and Phoenix shares (30 or 60 day historical view) will be monitored monthly at this meeting?
  • Topic2
    ...

  Next meeting date:

Revision 4 - 2018-11-08 - GianfrancoSciacca

Line: 1 to 1
 
META TOPICPARENT name="MeetingsBoard"
Changed:
<
<
<-- keep this as a security measure: 
  • Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
  • Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup #uncomment this if you want the page only be viewable by the internal people #* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
>
>
<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
 

Swiss Grid Operations Meeting on 2018-11-08 at 14:00

  • Place: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
Changed:
<
<
>
>
  • External link: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
 
  • Phone gate: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
Changed:
<
<
>
>
  • IRC chat: irc:gridchat.cscs.ch:994#lcg (ask for the password via e-mail)
 
  • Switch Vidyo SIP IP: 137.138.248.204
Changed:
<
<
>
>
 

Site status

CSCS

Line: 27 to 22
 

UNIBE-LHEP

Changed:
<
<
      • A bit less stable (lack of manpower), lower delivery for a few months, still slightly above the pledge.
>
>
  • A bit less stable (lack of manpower), lower delivery for a few months, still fulfilling the pledge.
 
      • Ubelix dropped out silently on 10 October
Changed:
<
<
      • Running an average <1900 slots (typical 2500), Ubelix contribution 12% (typical 23%)

      • Accounting numbers (from scheduler) from last month (October), LHEP only

        | VO | Job Type | Produced WC core-hours |
        | ATLAS | Any | 1157991 |
        | ops | Any | 44 |
        | t2k.org | Any | 0 |
        | uboone | Any | 0 |


>
>
  • Running an average <1900 slots (typical 2500), Ubelix contribution 12% (typical 23%)
  • Large t2k.org run in September; one cluster was reserved for a local user for almost the entire month

  • Accounting numbers (from scheduler) from last month (October), LHEP only

    | VO | Job Type | Produced WC core-hours |
    | ATLAS | Any | 1157991 |
    | ops | Any | 44 |
    | t2k.org | Any | 0 |
    | uboone | Any | 0 |




  • Five-month history, Unibe (pledge: 18 kHS06)
  • Swiss ATLAS statistics
 
Changed:
<
<
        • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

        • HC availability [1]:
>
>
    • HC availability [1]:
 
          • CSCS-LCG2: 95% Prod, 97% Analy
          • CSCS-LCG2-HPC: 75% Prod, 76% Analy
          • UNIBE-LHEP: 99% Prod, 96% Analy
Changed:
<
<
          • UNIBE-LHEP-UBELIX: 100% Prod ($), 27% Analy

            ($) effectively up only ~30% of the time

            * CSCS running 3300 slots on average, UNIBE running 1850

>
>
      • UNIBE-LHEP-UBELIX: 100% Prod ($), 27% Analy

        ($) effectively up only ~30% of the time

    • CSCS running 3300 slots on average, UNIBE running 1850
    • Accounting numbers (from dashboard) from last month for CSCS and UNIBE

 
| Cluster | Job Type | Produced WC core-hours | Good vs Bad WC % | CPU eff good jobs % |
| CSCS | Any | 2901550 (69%) | 0.71 | 0.89 |
| Unibe | Any | 1266896 (31%) | 0.85 | 0.85 |
Changed:
<
<






[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2018-10-01&end_date=2018-10-31&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

>
>






[1] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562#time=custom&start_date=2018-10-01&end_date=2018-10-31&use_downtimes=false&merge_colors=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_CSCS-HPC,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2-HPC_MCORE,CSCS-LCG2_MCORE,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_MCORE

 

UNIBE-ID

  • Xxx

UNIGE

Line: 55 to 58
 
  • NGI-CH Open Tickets review

Other topics

  • Topic1
Changed:
<
<
  • Topic2

>
>
  • Topic2

  Next meeting date:

Revision 3 - 2018-11-08 - GianfrancoSciacca

Line: 1 to 1
 
META TOPICPARENT name="MeetingsBoard"

Revision 2 - 2018-11-08 - NinaLoktionova

Line: 1 to 1
 
META TOPICPARENT name="MeetingsBoard"

Revision 1 - 2018-07-23 - DinoConciatore

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="MeetingsBoard"
<-- keep this as a security measure: 
  • Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
  • Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup #uncomment this if you want the page only be viewable by the internal people #* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Swiss Grid Operations Meeting on 2018-11-08 at 14:00

Site status

CSCS

  • Xxx
  • Accounting numbers (from scheduler) from last month

PSI

UNIBE-LHEP

  • Xxx
  • Accounting numbers (from scheduler) from last month

UNIBE-ID

  • Xxx

UNIGE

  • Xxx
  • Accounting numbers (from scheduler) from last month

NGI_CH

  • Xxx
  • NGI-CH Open Tickets review

Other topics

  • Topic1
  • Topic2
Next meeting date:

A.O.B.

Attendants

  • CSCS:
  • CMS:
  • ATLAS:
  • LHCb:
  • EGI:

Action items

  • Item1
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.