CSCS Operations Meeting on 2016-10-18

  • Date and time:
  • Place:
  • External link / EVO:

Agenda

CPU/WT efficiency ATLAS:

CSCS&UNIBE: http://dashb-atlas-job.cern.ch/dashboard/request.py/dailysummary#button=cpuefficiency&sites%5B%5D=CSCS-LCG2&sites%5B%5D=UNIBE-LHEP&sitesCat%5B%5D=CH-CHIPP-CSCS&resourcetype=All&sitesSort=2&sitesCatSort=2&start=null&end=null&timerange=lastMonth&granularity=Hourly&generic=0&sortby=16&series=30&activities%5B%5D=all

CSCS only: http://dashb-atlas-job.cern.ch/dashboard/request.py/dailysummary#button=cpuefficiency&sites%5B%5D=CSCS-LCG2&sitesCat%5B%5D=CH-CHIPP-CSCS&resourcetype=All&sitesSort=2&sitesCatSort=2&start=null&end=null&timerange=lastMonth&granularity=Hourly&generic=0&sortby=16&series=30&activities%5B%5D=all

CSCS average:

Success/Failure efficiency ATLAS (CSCS&UNIBE):

http://dashb-atlas-job.cern.ch/dashboard/request.py/terminatedjobsstatus_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=CH-CHIPP-CSCS&resourcetype=All&activities=all&sitesSort=2&sitesCatSort=2&start=null&end=null&timeRange=lastMonth&sortBy=0&granularity=Daily&generic=0&series=30&type=ebwc

http://dashb-atlas-job.cern.ch/dashboard/request.py/terminatedjobsstatus_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=CH-CHIPP-CSCS&resourcetype=All&activities=all&sitesSort=2&sitesCatSort=2&start=null&end=null&timeRange=lastMonth&sortBy=0&granularity=Daily&generic=0&series=30&type=avgeffwc

CPU&WT usage ATLAS (CSCS&UNIBE):

http://dashb-atlas-job.cern.ch/dashboard/request.py/dailysummary#button=cpuconsumption&sites%5B%5D=CSCS-LCG2&sites%5B%5D=UNIBE-LHEP&sitesCat%5B%5D=CH-CHIPP-CSCS&resourcetype=All&sitesSort=2&sitesCatSort=2&start=null&end=null&timerange=lastMonth&granularity=Daily&generic=0&sortby=0&series=30&activities%5B%5D=all

http://dashb-atlas-job.cern.ch/dashboard/request.py/dailysummary#button=cpuconsumption&sites%5B%5D=CSCS-LCG2&sites%5B%5D=UNIBE-LHEP&sitesCat%5B%5D=CH-CHIPP-CSCS&resourcetype=All&sitesSort=2&sitesCatSort=2&start=null&end=null&timerange=lastMonth&granularity=Daily&generic=0&sortby=16&series=30&activities%5B%5D=all

CSCS WT: 4626261083 s (1745 cores vs 2400 share) Ganglia: 1.6k, Dashboard: 1540

UNIBE WT: 5370215053 s (2071 cores) Dashboard: 2085
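The core counts quoted next to the wall-time (WT) totals can be read as average busy cores: total wall time divided by the length of the accounting window. Below is a minimal sketch. It assumes a 30-day window, matching the dashboard's `lastMonth` range, because the exact window behind the figures above is not stated. The helper name `average_busy_cores` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def average_busy_cores(wall_time_s: int, window_days: float = 30) -> float:
    """Average number of cores kept busy over the window (30 days assumed)."""
    return wall_time_s / (window_days * SECONDS_PER_DAY)


# Wall-time totals quoted in the agenda above
unibe = average_busy_cores(5_370_215_053)
cscs = average_busy_cores(4_626_261_083)
print(f"UNIBE: {unibe:.1f} cores, CSCS: {cscs:.1f} cores")
```

With a 30-day window this gives about 2071.8 cores for UNIBE, consistent with the 2071 quoted above. For CSCS it gives about 1784.8 rather than 1745, so the CSCS figure may have been derived over a different time range.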

  • Other/AOB

Attendants

  • Gianfranco Sciacca (ATLAS)
  • Fabio Martinelli (CMS)
  • Joosep Pata (CMS)
  • Stefano, Dario, Gianni, Pablo, Miguel, Birgit

Minutes

  • ATLAS still confirms the efficiency problems (see links above), with an average efficiency of 0.4 for the ANALY queue. The regular stress tests that Gianfranco has sent by hand so far do not show the problem, so a different approach is needed to reproduce it. Gianfranco will try requesting a more unusual software release (to stress CVMFS). With some more effort, he will also change the job type and increase the job length to put more load on the scratch filesystem, since the current jobs may be too short to show the problem. He will send a ticket with the details soon. It would also be useful to check the variance of the efficiency, but it is not yet clear how.
  • Fabio provided efficiency graphs for CMS some time ago. CSCS needs to cross-check them against the ATLAS ones.
  • Fabio has sent the two metrics that were requested for the common dashboard. Input from everyone else is still missing. Pablo will create a section for this in the TODO list; please add your input there. Stefano has sent a Doodle to organize individual sessions, one per VO (please book the whole day). Invitations will follow.
  • A new CMS VOBOX VM has been installed and needs to be configured by CSCS. Fabio asks to move the deadline 10 days earlier, to 20 November. An official statement from Derek on this is still missing, and Stefano will send him a reminder.
  • All the other action items are waiting for input from Dino and will be kept open for now.
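One possible way to check the variance is sketched below. It assumes that per-job CPU time, wall time and allocated core count can be exported from the batch system or the dashboard. The `JobRecord` fields and helper names are hypothetical. The sketch computes each job's CPU/WT efficiency and reports the spread alongside the mean.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class JobRecord:
    # Hypothetical per-job accounting fields
    cpu_time_s: float
    wall_time_s: float
    cores: int


def efficiency(job: JobRecord) -> float:
    """CPU/WT efficiency: CPU time divided by (wall time x allocated cores)."""
    return job.cpu_time_s / (job.wall_time_s * job.cores)


def efficiency_stats(jobs: list[JobRecord]) -> tuple[float, float]:
    """Mean and population standard deviation of per-job efficiencies.

    Jobs with zero wall time are skipped; raises StatisticsError if none remain.
    """
    effs = [efficiency(j) for j in jobs if j.wall_time_s > 0]
    return fmean(effs), pstdev(effs)


def aggregate_efficiency(jobs: list[JobRecord]) -> float:
    """Wall-time-weighted efficiency: total CPU time / total core-seconds."""
    return sum(j.cpu_time_s for j in jobs) / sum(j.wall_time_s * j.cores for j in jobs)


# Illustrative numbers only, not real ATLAS data
jobs = [JobRecord(3600, 3600, 1), JobRecord(1800, 3600, 1), JobRecord(0, 3600, 1)]
mean, sd = efficiency_stats(jobs)
print(f"mean={mean:.2f} sd={sd:.2f} aggregate={aggregate_efficiency(jobs):.2f}")
```

The per-job mean weights short and long jobs equally. The aggregate ratio weights each job by its core-seconds. When job lengths vary, as in the planned longer test jobs, the two numbers can differ.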

Regarding tickets:

  • #23092 and #25164, regarding low delivery to ATLAS: CSCS realized that the VO shares in SLURM had been set (incorrectly) to 33:33:33. It is not clear what the official target is, and Pablo will send an email to clarify it. In the meantime, it was agreed to set the shares to 40:40:20, as in the past.
  • #25071 [gridka-atlas] Accounting: changes were made in ARC, and accounting looks better now. We still need to wait a couple of weeks to confirm that it is fixed.
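In Slurm's classic fair-share model, an account's raw share is normalized by the sum of its siblings' shares. Under that model, 33:33:33 gives each VO one third and 40:40:20 gives 0.4/0.4/0.2. The sketch below shows this normalization for a flat, single-level account tree. The account names are hypothetical, because the minutes do not say which VO gets which share.

```python
def normalized_shares(raw: dict[str, int]) -> dict[str, float]:
    """Normalize sibling raw shares so they sum to 1 (flat, single-level tree)."""
    total = sum(raw.values())
    return {account: share / total for account, share in raw.items()}


# Hypothetical account names; the VO-to-share mapping is not given in the minutes
current = normalized_shares({"vo_a": 33, "vo_b": 33, "vo_c": 33})
agreed = normalized_shares({"vo_a": 40, "vo_b": 40, "vo_c": 20})
for account in agreed:
    print(f"{account}: {current[account]:.2f} -> {agreed[account]:.2f}")
```

These fractions are long-term fair-share targets that affect job priority, not hard caps. An account can still use more than its share while the others are idle.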

Action items

  • CSCS to look at CMS and ATLAS efficiency plots for the last 4 years and find commonalities and discrepancies
  • CSCS to send instructions to everyone on how to access Kibana via SSH (if certificates are not usable yet)
  • Stefano to send ATLAS two names to be added to the VO
  • Pablo to send an email to all about official compute shares
  • Pablo to open a section in the TODO list for the monitoring dashboard
  • CSCS to send Gianfranco ARC config and history of changes if available
  • Derek to send CSCS a reply regarding the VO-Box proposal. Final implementation to be finished before the end of November
  • CSCS to reassess the work involved in fixing the SLURM reports
  • EVERYONE but CMS to provide a list of two lights/metrics that they would like to see on the dashboard before the next meeting
Topic revision: r6 - 2016-10-19 - PabloFernandez