CSCS Operations Meeting on 2016-09-27

  • Date and time:
  • Place:
  • External link / EVO:

Agenda

  • First meeting's overview
  • Issue list review
  • Ticket review
  • Maintenances
  • Other/AOB

Attendees

  • Fabio, Roland, Dino, Dario, Gianni, Miguel, Stefano, Pablo, Luis
  • Gianfranco sent his apologies and provided feedback by email

Minutes

On the task list, within the first (priority) block (due before the end of October): Dashboard Hackathon:
  • We should not hold the hackathon without a clear plan of what we want to achieve. In the absence of a better plan, Pablo asks everyone to provide, within the next two weeks, a list of TWO lights/metrics that they consider most important and would like to see in the dashboard. Stefano will coordinate the hackathon.
Regarding the rest of the task list (to be addressed starting in November):
  • We need to gather statistics on memory utilization and on problems caused by memory abuse (e.g. swapping) before imposing limits on jobs. Fabio suggests imposing a maximum of 2xRequiredMem, but Pablo insists that this could cause other problems and that we should not try to solve problems that don't exist (Dino reports that nodes are not swapping). This might be a problem for LHConCRAY, though, so the issue will be referred to that project instead.
  • The VO-Box discussion (and its related tickets) is still waiting for Derek's input. This is becoming increasingly important because Fabio is leaving and the continuity of the CMS vobox needs to be guaranteed. It was agreed to raise the priority of this task by setting the deadline to November instead of December (which would be too tight for Fabio).
  • The Slurm reports might be easy to fix: CSCS will reassess the work involved and see whether it can be done quickly.
  • The "Finalize BDII" task is clarified: it involves changing the HA setting from lbcd to keepalive.
  • No input was given on any of the other tasks, so their priorities are unchanged
Regarding open tickets:
  • #22368 Check CSCS status on CMS dashboard. This is top priority for Fabio, but it depends on Derek's decision (another reason to get a final answer from him)
  • #24193 Stalled jobs at CSCS. This is an old ticket from Vladimir and can be closed
  • Time ran out and we could not go through all tickets in detail, but the VO Reps report that there are no burning issues at the moment
Next meeting in two weeks, same day and same time (11th of October at 14:00)

Action items

  • ATLAS and LHCb to confirm if the efficiency problems are still there
  • CSCS to send Gianfranco ARC config and history of changes if available
  • Fabio to send CSCS efficiency plots covering up to 4 years of history
  • Dino to help Fabio use Kibana via ssh tunnel
  • CSCS to send Gianfranco the two names to include in the VO
  • Derek to send CSCS a reply to the VO-Box proposal. Final implementation to be finished before the end of November
  • CSCS to reassess the work involved in fixing the Slurm reports
  • CSCS to update the task list with proposed changes
  • EVERYONE to provide a list of two lights/metrics that they would like to see in the dashboard before next meeting
Topic revision: r1 - 2016-09-28 - PabloFernandez
 