<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Swiss Grid Operations Meeting on 2018-12-06 at 14:00

Site status

CSCS

System

  • Started planning the transition of services to Piz Daint. Question to Gianfranco about the new ARC CE version
  • Issues with some nodes and the Slurm subnet; reconfiguration needed
  • Grafana dashboard is ready to be exposed (data sync in progress)

dCache

  • Migration to IPv6 / dual-stack completed; it required considerable effort
  • We need to add dual-stack to CMS02 too
  • Updated to 3.2.40
  • Planning next year's data migration (due to a complete storage renewal)
  • Planning head nodes (re)installation
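
A minimal sketch of how the dual-stack status of a storage endpoint could be checked from a client. It uses only the Python standard library; the hostname `storage.example.org` is a placeholder and not one of the site's real hosts, and the `resolver` parameter is an assumption added here so the check can be exercised without network access.

```python
import socket


def address_families(host, resolver=socket.getaddrinfo):
    """Return the set of address families the name resolves to."""
    infos = resolver(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr).
    return {info[0] for info in infos}


def is_dual_stack(host, resolver=socket.getaddrinfo):
    """True if the name resolves to both an IPv4 and an IPv6 address."""
    families = address_families(host, resolver=resolver)
    return socket.AF_INET in families and socket.AF_INET6 in families


if __name__ == "__main__":
    host = "storage.example.org"  # hypothetical hostname
    try:
        print(host, "dual-stack:", is_dual_stack(host))
    except socket.gaierror as err:
        print("lookup failed:", err)
```

This only confirms that both record types resolve; it does not prove that the service is reachable over both protocols.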

Scratch

  • Stable operations
  • Much improved IO between (Daint) DVS nodes and storage nodes since the last DVS upgrade
  • Outperforming SSD cache nominal performance (see attachment "ssd-cache-perf.png")
  • We should have a further improvement with the new software

PSI

UNIBE-LHEP

  • Stable operation, slightly lower delivery for LHEP (dying nodes). Pledged: 18k, delivered 20.6k
  • Ubelix back on 8th November
  • Running an average of >2100 slots (<1900 last month, 2500 typical); Ubelix back to 23% (typical)
  • Accounting numbers (from scheduler) for last month (November), LHEP only

| VO | Job Type | Produced WC core-hours |
| ATLAS | Any | 1102994 (1157991 in Oct) |
| ops | Any | 46 (44 in Oct) |
| t2k.org | Any | 0 |
| uboone | Any | 0 |
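
As a worked example, the month-over-month change and the delivered-to-pledged ratio can be derived from the figures quoted above. This is only an illustrative sketch; the function name `relative_change` is made up here, and the inputs are the ATLAS core-hours from the table and the pledge/delivery values from the first bullet.

```python
def relative_change(current, previous):
    """Fractional change from previous to current (negative means a drop)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous


# Figures taken from the minutes above.
atlas_current, atlas_oct = 1102994, 1157991  # WC core-hours, LHEP only
pledged, delivered = 18.0, 20.6              # kHS06

change = relative_change(atlas_current, atlas_oct)
print(f"ATLAS WC core-hours vs October: {change:+.1%}")
print(f"Delivered vs pledged: {delivered / pledged:.0%}")
```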


6-month history Unibe (pledge: 18 kHS06)


  • Swiss ATLAS statistics

    • HC availability: could not retrieve data
    • Running slots: CSCS 3000 (3300 in October); UniBe 2150 (1850 in October)
    • Accounting numbers from the ATLAS dashboard (November), CSCS+UniBe


| Cluster | Job Type | Produced WC core-hours | Good vs Bad WC % | CPU eff good jobs % |
| CSCS | Any | 2397014; 63% (Oct: 2901550 69%) | 0.79 (Oct: 0.71) | 0.85 (Oct: 0.89) |
| UniBe | Any | 1411100; 37% (Oct: 1266896 31%) | 0.84 (Oct: 0.85) | 0.83 (Oct: 0.85) |
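
The percentage next to each core-hour figure appears to be that cluster's share of the combined CSCS+UniBe total. A short sketch of the arithmetic for the November numbers follows; the helper name `shares` is an assumption for illustration only.

```python
def shares(core_hours):
    """Map each site to its percentage of the summed core-hours."""
    total = sum(core_hours.values())
    return {site: 100.0 * value / total for site, value in core_hours.items()}


# November WC core-hours from the table above.
november = {"CSCS": 2397014, "UniBe": 1411100}

for site, pct in shares(november).items():
    print(f"{site}: {pct:.0f}%")
```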

  • ARC6 upgrade heads-up
    • At some point in 2019
    • Major arc.conf rewrite
    • At the recent NorduGrid developer retreat, I have produced a preliminary conversion to ARC6 of the arc.conf for arc04@lcg.cscs.ch

UNIBE-ID

Stable operations in November after the stuck A-REX issue in October
During the mid-December maintenance downtime:

  • Decommissioning of the nodes that comprise the el6legacy partition
  • Setup of a subordinate partition for preemptable jobs of the ATLAS experiment at the same time

No sysadmin from UNIBE-ID can join this afternoon

UNIGE

Discussing this week how to revive the ARC CE @UniGe

NGI_CH

NGI-CH Open Tickets review

| Ticket-ID | Type | VO | Site | Priority | Resp. Unit | Status | Last Update | Subject | Scope |
| 131948 | | none | CSCS-LCG2 | less urgent | NGI_CH | assigned in progress | 2018-12-03 | IPv6 deployment at WLCG Tier-2 sites | EGI |
| 131965 | | none | UNIBE-LHEP | less urgent | NGI_CH | assigned on hold | 2018-11-15 | IPv6 deployment at WLCG Tier-2 sites | EGI |
| 138592 | | cms | CSCS-LCG2 | urgent | NGI_CH | waiting for reply | 2018-12-06 | Transfers failing from T2_CH_CSCS to ... | WLCG |
| 138296 | | cms | CSCS-LCG2 | urgent | NGI_CH | waiting for reply | 2018-12-05 | Transfers failing from T2_CH_CSCS | WLCG |
| 133695 | | lhcb | CSCS-LCG2 | urgent | NGI_CH | assigned waiting for reply | 2018-11-30 | Data access problem at CSCS-LCG2 | WLCG |

Other topics

Update on experiment share re-balance

  • Discussed within CHIPP
  • Internal meeting next week to finalise decision
  • Current direction (not final):
    • Reduce max WC for all VOs at the same level
    • Pack single-core jobs onto nodes (as opposed to spreading them)
  • Trial period of 1-2 months
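
To illustrate the difference between packing and spreading single-core jobs, here is a toy placement model. It is a sketch under simplifying assumptions (identical nodes, one core per job, no memory limits) and does not represent the actual scheduler configuration; the function names and the node and core counts are invented for the example.

```python
def place_jobs(n_jobs, n_nodes, cores_per_node, strategy):
    """Place single-core jobs and return the used-core count per node."""
    used = [0] * n_nodes
    for _ in range(n_jobs):
        candidates = [i for i in range(n_nodes) if used[i] < cores_per_node]
        if not candidates:
            raise ValueError("not enough free cores")
        if strategy == "pack":
            # Prefer the busiest node that still has a free core.
            target = max(candidates, key=lambda i: used[i])
        elif strategy == "spread":
            # Prefer the least-loaded node.
            target = min(candidates, key=lambda i: used[i])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        used[target] += 1
    return used


def free_nodes(used):
    """Count nodes with no job on them."""
    return sum(1 for cores in used if cores == 0)


for strategy in ("pack", "spread"):
    layout = place_jobs(8, 4, 4, strategy)
    print(strategy, layout, "free nodes:", free_nodes(layout))
```

In this toy case packing leaves whole nodes empty, which is what makes room for multi-core jobs, while spreading occupies every node partially.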

    Attachment below: ATLAS pending jobs (last 90 days)

Topic2

Next meeting date:

A.O.B.

Attendants

CSCS:
CMS:
ATLAS:
LHCb:
EGI:

Action items

Item1
ssd-cache-perf.png

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| job-accounting-historical-data.png | r1 | 49.4 K | 2018-12-06 - 12:58 | GianfrancoSciacca | ATLAS pending jobs (90 days) |
| ssd-cache-perf.png | r1 | 165.0 K | 2018-12-06 - 09:28 | DarioPetrusic | |
Topic revision: r11 - 2018-12-06 - GianfrancoSciacca
 