<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
---+ Action Items

---++ Action items (9)

Legend: <number.> title (added: date, done:date) %ICON{new}% / %ICON{done}%

---++++++ 1. Reduce ATLAS dips within the box: (added: 16.01.2020, done:)%ICON{new}%

<p dir="ltr">(Box = CHIPP allocated nodes at CSCS)</p>
   * <p dir="ltr">IMPLEMENT THE “INTERNAL DYNAMIC ALLOCATION” (easy to implement, but the idle cost goes back to the VOs)</p>
   * <p dir="ltr">Fair share + optimized priority with reservations</p>
   * <p dir="ltr">When a VO comes back, it takes a higher priority until it is back at its target, then returns to normal priority</p>
   * <p dir="ltr">Align the boundaries at the node level (see [1] below in item 3.)</p>
   * <p dir="ltr">IMPLEMENT ON MON 20, TEST UNTIL 29 JAN (MAINTENANCE).</p>
   * <p dir="ltr">PRELIMINARY NUMBERS to seize the shared resources:</p>
      * <p dir="ltr">CMS 50%</p>
      * <p dir="ltr">ATLAS 50%</p>
      * <p dir="ltr">LHCb 50%</p>

---++++++ 2. Reduce ATLAS dips outside the box: (added: 16.01.2020, done:)%ICON{new}%

- Discussion to be started with M.
DeLorenzi and CSCS CTO
   * <p dir="ltr">START THE DISCUSSION TO GO FOR THE “DYNAMIC ALLOCATION”</p>
     <p dir="ltr">- forced draining of nodes already in use “capped”</p>
   * <p dir="ltr">START THE DISCUSSION TO GO FOR THE “opportunistic”</p>
     <p dir="ltr">- use only idle nodes</p>
     <p dir="ltr">- issue: there are very few idle nodes</p>
     <p dir="ltr">- (jobs have to be already in the queue - it cannot be detected)</p>
   * <p dir="ltr">START THE DISCUSSION TO GO FOR THE “opportunistic”</p>
     <p dir="ltr">- use only idle nodes</p>
     <p dir="ltr">- with short jobs “backfilling”</p>
     <p dir="ltr">- (jobs have to be already in the queue - it cannot be detected)</p>

---++++++ 3. Help reduce cache occupancy (added: 16.01.2020, done:)%ICON{new}%

At the moment we run all VOs on one node, i.e. 3 software stacks per node. Go for user segregation: a portion of it (the part not in the shared resource band) can be done in the reservation test by drawing the boundaries at the node level [1] (see item 1).

---++++++ 4. Site Log (added: 16.01.2020, done:20.02.2020)%ICON{done}%

<p dir="ltr">IMPLEMENT A SAFE SITE LOG FOR CHIPP RESOURCES. Both CSCS and the experiments to compile it.</p>

Done: the twiki page <a href="HotTopics" target="_blank">Hot Topics</a> collects the list of the most relevant changes, from both the CSCS and the experiment side, that can affect operations.

---++++++ 5. ATLAS --nice (added: 16.01.2020, updated:20.02.2020)%ICON{updated}%

<p dir="ltr">TRY TO SET IT TO A LOW VALUE AND TEST → SCHEDULED AFTER THE TEST OF [1]</p>
<p dir="ltr">→ Give Gianfranco access to log in on Daint and use sprio: login given (20.02.2020)</p>

---++++++ 6. ARC metrics (added: 16.01.2020, updated:20.02.2020)%ICON{updated}%

→ Check whether the monitoring package can be plugged into elastic. On Nick's to-do list.

---++++++ 7. Queue status hammerclouds (added: 16.01.2020, updated:20.02.2020)%ICON{updated}%

<p dir="ltr">→ Input from VO reps (provide the API call) to CSCS, then put on the dashboard.
The idea is to see on one page, with a couple of boxes, the status of the systems.</p>
<p dir="ltr">LHCb provided the query.</p>
<p dir="ltr">CMS is trying to get to the information (Vinzenz/Derek).</p>
<p dir="ltr">ATLAS: there is no single query that can provide the status. Gianfranco provided some logic to Miguel in the past. Investigating how to get to a single script that produces one (or a few) binary or semaphore (R/Y/G) outputs.</p>

---++++++ 8. dCache updates (added: 16.01.2020, updated:20.02.2020)%ICON{updated}%

→ Ask Dario to present plans for dCache at the next ops meeting.

In the process of configuring a dCache Test Lab:
   * Goal is to match the production system and test the upgrade steps
   * Will be able to spot problems before applying changes to the production system
   * Also allows optimizing upgrades so that they complete more efficiently
   * Evaluating the change logs of previous versions to spot changes and adapt the configuration

---++++++ 9. ATLAS transition to federated resources (added: 16.01.2020, done:) %ICON{new}%

<p dir="ltr">ATLAS full transition timescale: 18 months. Prepare a plan for the transition, follow up in monthly ops meetings.</p>
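
Note on item 7: one possible way to reduce several per-VO checks to a single semaphore (R/Y/G) output is to report the worst individual result. The following is only a minimal sketch of that idea, not the logic Gianfranco provided. The check names, the `Status` levels and the rule that "no checks at all" counts as red are assumptions made for illustration.

```python
from enum import IntEnum
from typing import Dict, List


class Status(IntEnum):
    """Semaphore levels, ordered so that a larger value is worse."""
    GREEN = 0
    YELLOW = 1
    RED = 2


def overall_status(checks: Dict[str, Status]) -> Status:
    """Return the worst status among all checks.

    Assumption: an empty set of checks means we have no information,
    which is reported as RED rather than GREEN.
    """
    if not checks:
        return Status.RED
    return max(checks.values())


def failing_checks(checks: Dict[str, Status]) -> List[str]:
    """Names of all checks that are not GREEN, sorted for stable display."""
    return sorted(name for name, status in checks.items()
                  if status is not Status.GREEN)


if __name__ == "__main__":
    # Hypothetical check names; real inputs would come from the VO queries.
    atlas_checks = {
        "test_jobs_succeeding": Status.GREEN,
        "queue_accepting_jobs": Status.YELLOW,
    }
    print(overall_status(atlas_checks).name, failing_checks(atlas_checks))
```

Taking the maximum keeps the dashboard box conservative: a single red check turns the whole box red, and the list of non-green checks tells the operator where to look.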
Topic revision: r2 - 2020-02-21 - MauroDonega
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.