<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable by internal people only
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
---+ Swiss Grid Operations Meeting on 2019-03-07 at 14:00

   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: from Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask for the password via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204

%TOC%

---++ Site status

---+++ CSCS

Systems
   * Phoenix: all nodes idle, load moved to Daint
   * ARC CEs reinstalled on new hardware; only arc04 remains, with its reinstallation scheduled for Monday
   * Testing the new squid; reinstallation of cvmfs and cvmfs1 is scheduled
   * Distribution of the old compute nodes:
      * LHCb wants 2x chassis with 4x nodes each
      * ATLAS will take around 100x nodes

Storage

dCache
   * normal operation
   * preparing the lab for the new design + upgrade
   * the complete storage migration will start in April

GPFS
   * normal operation
   * upgraded firmware on the DELL SC9000 (SSD tier 1)
   * next week: rolling/online network interface replacement (from InfiniBand to Ethernet)
   * upgrade to GPFS 5.0.2
   * slow-tier migration in April from DDN SFA12k to DELL SC9000

---+++ PSI

   * Xxx
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]

---+++ UNIBE-LHEP

   * Ramping down LHEP in view of the cluster re-deployment
   * Monthly summary: pledged 18k, delivered 18k
      * Ubelix contributing >50% (23% typical)
      * Running an average of >1850 slots (2500 typical)

<img alt="" height="257" src="http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=UNIBE-LHEP&sitesCat=All Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2019-02-01&end=2019-02-28&timeRange=daily&granularity=8 Hours&generic=0&sortBy=16&series=All&type=ewa" width="342" />

   * *6-month history UniBE (pledge: 18 kHS06)*

<img alt="" height="333" src="http://dashb-atlas-job.cern.ch/dashboard/request.py/resourceutilization_individual?sites=UNIBE-LHEP&sitesCat=All Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2018-09-01&end=2019-02-28&timeRange=daily&granularity=Monthly&generic=0&sortBy=16&diag1=0&diag2=0&diag3=0&diag4=0&diag5=0&diag6=0&diag7=0&diag8=0&diagT=0&diag8pl=0&series=All&type=wchs" width="444" />

   * *Accounting numbers (from scheduler) from last month, LHEP only*
      * omitted this month

---+++ Swiss ATLAS statistics

   * *HammerCloud availability:*

<img alt="" height="231" src="%ATTACHURL%/HammerCloud.png" width="643" />

      * ANALY_CSCS-HPC: 95%
      * CSCS-LCG2-HPC_MCORE: 94.5%
      * UNIBE-*: 100%
   * Running slots
      * A large number of stuck jobs on ARC skews the statistics for CSCS, creating reporting problems to ATLAS
      * Very likely due to the reported issues with the Daint scratch file system, affecting WLCG jobs in some way
      * This required a laborious manual clean-up:
         * job list provided by ATLAS, culled from the aCT
         * manual clean-up of the ARC sessiondir carried out by Miguel

<img alt="" height="350" src="http://dashb-atlas-job.cern.ch/dashboard/request.py/jobnumbers_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2019-02-01&end=2019-02-28&timeRange=daily&granularity=8 Hours&generic=0&sortBy=0&series=All&type=rmulticores" width="466" />

   * *Accounting numbers from the ATLAS dashboard (February 2019), CSCS+UNIBE*

%EDITTABLE{}%
| *Cluster* | *Job Type* | *Produced WC core-hours* | *Good vs Bad WC %* | *CPU eff good jobs %* |
| CSCS | Any | 3'088'664 (72%) | 0.52 | 0.70 |
| UniBe | Any | 1'149'338 (28%) | 0.75 | 0.75 |

<img alt="" height="305" src="http://dashb-atlas-job.cern.ch/dashboard/request.py/resourceutilization_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2019-02-01&end=2019-02-28&timeRange=daily&granularity=8 Hours&generic=0&sortBy=0&diag1=0&diag2=0&diag3=0&diag4=0&diag5=0&diag6=0&diag7=0&diag8=0&diagT=0&diag8pl=0&series=All&type=wchs" width="407" />
<img alt="" height="272" src="http://dashb-atlas-job.cern.ch/dashboard/request.py/terminatedjobsstatus_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2019-02-01&end=2019-02-28&timeRange=daily&sortBy=0&granularity=8 Hours&generic=0&series=All&type=qbwc" width="388" />
<img alt="" height="240" src="http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2019-02-01&end=2019-02-28&timeRange=daily&granularity=8 Hours&generic=0&sortBy=0&series=All&type=ewa" width="319" />
<img alt="" height="236" src="http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2019-02-01&end=2019-02-28&timeRange=daily&granularity=8 Hours&generic=0&sortBy=0&series=All&type=ewg" width="315" />

   * Take-home lessons from the last month:
      * Failed WC is very high; we need more real-time alerts
      * The public dashboard replica was offline for a while
      * ATLAS now relies *fully* on ARC services:
         * we need ARC metrics and/or logs
         * Dino has some ideas, but such an implementation is not a short-term affair
         * what can we do in the meanwhile?
         * we need monitoring (Nagios, Graylog?) and automated checks on the ARC services
   * Please report general Daint issues that could affect WLCG jobs (Slack general channel or email) so that we can react if needed
   * ...

---+++ UNIBE-ID

   * End of February: reinstallation of the ARC CE on nordugrid.unibe.ch, AKA UNIBE LHEP_UBELIX:
      * reason: EL6 -> EL7
      * smooth transition within ~4h
      * no issues after the reinstallation

---+++ UNIGE

   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ NGI_CH

   * Xxx
   * NGI-CH open tickets review

---++ Other topics

   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendants

   * CSCS:
   * CMS:
   * ATLAS:
   * LHCb:
   * EGI:

---++ Action items

   * Item1
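The interim "automated checks on ARC services" raised under the take-home lessons could start as something as simple as a TCP reachability probe against the CE endpoints, run from cron or wrapped as a Nagios check. This is a minimal sketch under assumptions: the host name =arc04.lcg.cscs.ch= is illustrative only, and the ports assume a stock ARC CE setup (2811 for GridFTP job submission, 2135 for the LDAP information system) — a real check would also want to inspect a-rex itself and its logs.

```python
import socket

# Hypothetical endpoint map: host name and ports are assumptions, not the
# actual CSCS configuration. Adjust to the real CE endpoints before use.
ARC_CHECKS = {
    "gridftp (job submission)": ("arc04.lcg.cscs.ch", 2811),
    "LDAP info system": ("arc04.lcg.cscs.ch", 2135),
}


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks):
    """Map each check name to a bool: True if the endpoint is reachable."""
    return {name: port_open(host, port) for name, (host, port) in checks.items()}


# Example cron/Nagios wrapper (left commented out so importing is side-effect free):
#   results = run_checks(ARC_CHECKS)
#   exit(0 if all(results.values()) else 2)   # 2 = Nagios CRITICAL
```

A plugin like this only tells us that the daemons accept connections, not that jobs flow; it is meant as a stop-gap until proper ARC metrics/log shipping (the Graylog idea) is in place.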
---++ Topic attachments

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| HammerCloud.png | r1 | 453.1 K | 2019-03-07 - 09:18 | GianfrancoSciacca | ATLAS HammerCloud last month |
Topic revision: r5 - 2019-03-07 - DinoConciatore