<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Fair-share Meeting on 2018-11-13

   * *Date and time*: 13 November 2018, 14:00-15:00 (UTC+01:00) Belgrade, Bratislava, Budapest, Ljubljana, Prague
   * *Place*: CSCS Meeting Room, 1st Floor (F1)
   * *External link*:
      * Web portal address: [[https://vcmeeting.ethz.ch/][https://vcmeeting.ethz.ch]]
      * SCOPIA meeting ID: 6708365
      * SCOPIA via phone: +41 43 244 89 30 | 6708365#

---++ Agenda

   * *Fair-share problem introduction (ATLAS)*
      * Share issue first flagged on Piz Daint during the LHConCray commissioning project (one year ago):
         * <u>Not enough job pressure from CMS</u>
         * <u>Relative shares between ATLAS and LHCb skewed in favour of LHCb</u>
      * Raised again at the f2f meeting on 21 June in ZH
      * In September we realised that the issue has shown up on Phoenix too, since ~May 2018
         * Hard to keep track of, since the monitoring dashboards cannot be accessed
      * Did some investigations with Dino and discussed further f2f with Pablo & Dino again
   * *What is the fair-share problem?*
      * <u>ATLAS multi-core (MC) jobs wait too long in the queue compared to single-core (SC) jobs</u>
      * ATLAS: ~80% MC, ~20% SC
         * 1 job = 1 payload
         * internal fair-share done at the factory level, passed to the sites as an ARC job option => lowers priority
         * walltime request passed to the sites as an ARC job option (tuned to the payload to be executed)
      * CMS: 100% MC
         * 8-core (configurable) pilots sent to the sites
         * internal fair-share done at the factory level; 8-core pilots pull multiple MC and SC payloads
         * walltime request configured at the factory level (arbitrary number)
      * LHCb: 100% SC
         * 1-core pilots sent to the sites
         * internal fair-share done at the factory level; 1-core pilots pull SC payloads
         * walltime request configured at CSCS. NOTE: this can be done at the factory level (arbitrary number)
   * *Why does that happen?*
      * Common problem at large shared sites: with mixed SC vs MC scheduling, node fragmentation and backfill favour SC jobs
      * SC slots are held for a long time due to the long configured walltime
      * SLURM is _not_ an HTC scheduler; under the conditions shown above it is hard to judge whether it makes the right scheduling decisions according to its target settings
      * Factors that have an impact:
         * SC vs MC imbalance (per user)
         * cputime imbalance (per user)
         * backfill (although this should favour shorter jobs, these are MC jobs)
         * ATLAS job nice-ing (currently turned off on Daint, but we _need_ it)
         * number of queued jobs (per user): is this balanced?
   * *Impact on ATLAS*
      * relative shares between experiments skewed to ATLAS's disadvantage
      * CPU delivery for ATLAS is very bumpy [1] [2]
      * jobs often wait too long and/or are cancelled by the experiment and redirected somewhere else
      * this harms several workflows, specifically those that have higher (internal) priority
      * if we host data, we should have an adequate amount of resources available at any time for processing (~40% of the total as a baseline)
      * we need to turn internal fair-share between the workloads back on
   * *Options / proposals*
      * Sites have in general invested large efforts in the past and cooked their own recipes (but I know of no shared site using SLURM)
      * Option 1:
         * Track and fix the fair share. For such an effort to be optimised, we need access to the relevant debugging dashboards
         * Might be a labour-intensive task
         * Needs changes to the current shared model; very likely compromises between job length and MC vs SC balance
         * Might not satisfy every experiment's requirements (e.g., long jobs, job nice-ing, etc.)
         * Suggestion: pack the nodes with single-core jobs first, rather than distributing them across the nodes
         * ...
      * Option 2:
         * Split resources according to the fair-share quotas and allow each experiment to submit to the other partitions on a pre-emptable basis. NOTE: <u>pre-emptable means the job is KILLED, not checkpointed</u>
         * Each experiment has its own quota, and we delegate to each of them the claiming of any resource not used by another experiment
         * Each experiment can shape their jobs as they wish
         * ...
      * Option 3:
         * ...
   * *CSCS view*
   * *Experiment views*
      * CMS
      * LHCb
   * *Next step(s)*
   * *AOB*

[1] http://dashb-atlas-job.cern.ch/dashboard/request.py/resourceutilization_individual?sites=CSCS-LCG2&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2018-01-01&end=2018-10-31&timeRange=daily&granularity=Daily&generic=0&sortBy=20&diag1=0&diag2=0&diag3=0&diag4=0&diag5=0&diag6=0&diag7=0&diag8=0&diagT=0&diag8pl=0&series=All&type=a

[2] http://dashb-atlas-job.cern.ch/dashboard/request.py/resourceutilization_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2018-01-01&end=2018-10-31&timeRange=daily&granularity=Daily&generic=0&sortBy=0&diag1=0&diag2=0&diag3=0&diag4=0&diag5=0&diag6=0&diag7=0&diag8=0&diagT=0&diag8pl=0&series=All&type=a

---++ Attendants

   * Roland
   * Christoph
   * Gianfranco
   * Thomas
   * Stefano
   * Nicholas
   * Dino
   * Gianni
   * Miguel

---++ Minutes

   * item

---++ Action items

   * item
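The Option 1 suggestion to pack single-core jobs onto as few nodes as possible (instead of letting them fragment every node) maps onto existing SLURM knobs. A minimal sketch, assuming the consumable-resources selector; this is not the CSCS production configuration:

```
# Hypothetical slurm.conf fragment (illustrative, not the CSCS config):
# keep serial (single-core) jobs packed at one end of the node range,
# so whole nodes stay drained of SC jobs and free for MC allocations.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
SchedulerType=sched/backfill
SchedulerParameters=pack_serial_at_end,bf_continue
```

Whether this helps in practice depends on the walltime distribution: a packed node still cannot be handed to an 8-core pilot until its longest SC job finishes.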
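Option 2 (hard quotas plus pre-emptable scavenging of the other experiments' idle nodes) can be expressed with SLURM's partition-based pre-emption. A sketch under assumed partition names and node ranges, again hypothetical rather than the real CSCS layout; note `PreemptMode=CANCEL` matches the agenda's warning that pre-empted jobs are killed, not checkpointed:

```
# Hypothetical slurm.conf fragment for Option 2 (illustrative names):
PreemptType=preempt/partition_prio
PreemptMode=CANCEL    # pre-empted jobs are killed, not checkpointed

# Each experiment owns a high-priority partition sized to its quota.
PartitionName=atlas PriorityTier=2 Nodes=nid[0001-0040]
PartitionName=cms   PriorityTier=2 Nodes=nid[0041-0080]
PartitionName=lhcb  PriorityTier=2 Nodes=nid[0081-0100]

# A low-priority overlay partition spans all nodes; jobs submitted
# here scavenge idle resources and are cancelled when the owning
# experiment's partition needs the nodes back.
PartitionName=scavenger PriorityTier=1 Nodes=nid[0001-0100]
```

This shifts the fair-share problem to the experiments themselves: each shapes its own jobs inside its quota, at the cost of lost work whenever a scavenger job is cancelled.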
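The fair-share discussion above revolves around how SLURM's multifactor priority plugin weighs recent usage against configured shares. As a rough illustration, the classic fair-share formula documented for SLURM is F = 2^(-usage/share): an account that has used nothing gets F = 1, one whose usage exactly matches its share gets F = 0.5, and over-users decay towards 0. The share and usage numbers below are made up for illustration and are not the real CSCS quotas:

```python
import math  # not strictly needed; exponentiation uses the ** operator

def fairshare_factor(norm_shares: float, effective_usage: float) -> float:
    """Classic SLURM fair-share factor: F = 2**(-usage/shares).

    norm_shares:     account's share of the machine, normalised to [0, 1]
    effective_usage: account's decayed recent usage, normalised to [0, 1]
    """
    return 2.0 ** (-effective_usage / norm_shares)

# Illustrative (hypothetical) shares and recent-usage fractions
# for the three experiments -- NOT the real CSCS configuration.
accounts = {
    "atlas": (0.40, 0.25),   # under its share -> factor rises above 0.5
    "cms":   (0.40, 0.45),   # over its share  -> factor falls below 0.5
    "lhcb":  (0.20, 0.30),
}

for name, (share, usage) in accounts.items():
    print(f"{name}: fair-share factor = {fairshare_factor(share, usage):.3f}")
```

This factor is only one term in the final job priority; the imbalances listed above (SC vs MC mix, walltime, backfill) act on top of it, which is why a "correct" fair-share factor alone does not guarantee balanced CPU delivery.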
---++ Attachments

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| CHIPP_Job_Analysis.pdf | r1 | 8426.7 K | 2018-11-13 - 15:08 | NickCardo | |
Topic revision: r4 - 2018-11-13 - GianfrancoSciacca