<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
---+ Action Items

---++ Action items (9)

Legend: &lt;number&gt;. title (added: date, done: date) %ICON{new}% / %ICON{processing}% / %ICON{updated}% / %ICON{done}%

*Last update 12.03.2020*

---++++++ 1. Reduce ATLAS dips within the box (added: 16.01.2020, done:) %ICON{processing}%
(Box = CHIPP-allocated nodes at CSCS)
   * Implement the “internal dynamic allocation” (easy to implement, but the cost of idle resources goes back to the VOs)
   * Fair share + optimized priority with reservations
   * When a VO comes back, it gets a higher priority until it reaches its target, then goes back to normal
   * Align the boundaries at the node level (see [1] in item 3)
   * Implement on MON20 as a test until 29 Jan (maintenance)
   * Preliminary numbers to size the shared resources:
      * CMS 50%
      * ATLAS 50%
      * LHCb 50%
   * _The final results will be clear once CMS manages to send a proper flow of production jobs_
   * [19.05] All VOs down at 1%
   * A final discussion is needed to decide whether the reservations are useful

---++++++ 2. Reduce ATLAS dips outside the box (added: 16.01.2020, done: 20.02.2020) %ICON{done}%
Discussion to be started with M. DeLorenzi and the CSCS CTO
   * Start the discussion to go for the “dynamic allocation”
      * forced draining of nodes already in use (“capped”)
   * Start the discussion to go for the “opportunistic” model
      * use only idle nodes
      * with short jobs (“backfilling”)
      * jobs have to be already in the queue (it cannot be detected)
      * "Opportunistic was never really on the agenda from the CHIPP/SNF side."

---++++++ 3. Help reduce cache occupancy (added: 16.01.2020, done:) %ICON{processing}%
At the moment all VOs run on the same node, i.e. three software stacks per node.
   * Go for user segregation: a portion of it (the part not in the shared resource band) can be done in the reservation test by drawing the boundaries on the node [1] (see item 1)
   * Check after the discussion on reservations (see item 2)

---++++++ 4. Site log (added: 16.01.2020, done: 20.02.2020) %ICON{done}%
Implement a safe site log for CHIPP resources, to be compiled by both CSCS and the experiments.

Done: the most relevant changes from both CSCS and the experiment side that can affect operations are collected in the <a href="HotTopics" target="_blank">Hot Topics</a> twiki page.

---++++++ 4.b ARC CE log access (added: 19.05.2020) %ICON{processing}%
Investigate propagating the ARC CE logs for external access.

---++++++ 4.c ARC configuration on the Wiki (added: 19.05.2020) %ICON{processing}%
Requires a Wiki login → fix the access permissions
   * Mauro to ask Derek

---++++++ 5. ATLAS --nice (added: 16.01.2020, updated: 12.03.2020) %ICON{updated}%
Try to set it to a low value and test → scheduled after the test of [1]

→ Give Gianfranco login access on Daint to use =sprio=: login given (20.02.2020)

→ _Waiting for the go-ahead from Gianfranco_

---++++++ 6. ARC metrics (added: 16.01.2020, updated: 20.02.2020) %ICON{processing}%
→ Check whether the monitoring package can be plugged into Elastic

On Nick's to-do list: capture the ARC state counts and display them in the dashboard.

---++++++ 7. Queue status HammerClouds (added: 16.01.2020, updated: 12.03.2020) %ICON{updated}%
→ Input from the VO reps (provide the API call) to CSCS, then put it on the dashboard. The idea is to see the status of the systems on one page with a couple of boxes.

LHCb provided the query.

CMS is trying to get the information (Vinzenz/Derek): _take the LHCb script as an example_

ATLAS: there is no single query that can provide the status. Gianfranco provided some logic to Miguel in the past. Investigating how to get a single script that produces one (or a few) binary or semaphore (R/Y/G) outputs.

---++++++ 8. dCache updates (added: 16.01.2020, updated: 20.02.2020, updated: 19.05.2020) %ICON{done}%
→ Ask Dario to present the plans for dCache at the next ops meeting.

In the process of configuring a dCache test lab:
   * Goal is to match the production system and test the upgrade steps
   * Will make it possible to spot problems before applying changes to the production system
   * Also allows optimizing upgrades so that they complete more efficiently
   * Evaluating the change logs of previous versions to spot changes and adapt the configuration
   * Upgrade from 3.2 to 5.2 on March 24 or 26; follow on the Hot Topics page

---++++++ 9. ATLAS transition to federated resources (added: 16.01.2020, done:) %ICON{processing}%
Timescale for the full ATLAS transition: 18 months. Prepare a plan for the transition and follow up in the monthly ops meetings.

_→ Update from Gianfranco_

---++++++ 10. ARC upgrade (added: 12.03.2020, done: 19.05.2020) %ICON{done}%
Upgrade from 5.4.4 to 6.5 in April; follow on the Hot Topics page.

---++++++ 11. ATLAS GPU tests (added: 19.05.2020) %ICON{processing}%
1769 jobs started in April; the project consumed 102 hours in April.

_→ Update from Gianfranco_
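---++++++ Note on items 1 and 5: fair share and =--nice=
Items 1 and 5 both act on the batch system's job priority. Assuming the CHIPP nodes on Daint are scheduled by Slurm's multifactor priority plugin (the =sprio= and =--nice= references in item 5 point to Slurm), the sketch below illustrates the classic fair-share factor F = 2^(-(U/S)/d) from the Slurm documentation and how a positive nice value is subtracted from a job's priority. It shows why a VO returning after an idle period is boosted until it reaches its target, then falls back. The weight of 10000 and the equal one-third shares are hypothetical; only the fair-share term is modelled (age, size, partition and QOS terms are left out), and newer Slurm releases default to the Fair Tree algorithm, which computes the factor differently. This is an illustration of the mechanism, not the configuration in use at CSCS.

```python
"""Hypothetical sketch of Slurm-style multifactor priority (fair-share term and nice only)."""


def fairshare_factor(norm_usage: float, norm_shares: float, dampening: float = 1.0) -> float:
    """Classic Slurm fair-share factor F = 2**(-(U/S)/d).

    F is 1.0 for a VO with no recent usage, 0.5 for a VO exactly at its
    target share, and tends to 0 for a VO that overuses its share.
    """
    if norm_shares <= 0.0:
        return 0.0  # no shares assigned: no fair-share contribution
    return 2.0 ** (-(norm_usage / norm_shares) / dampening)


def job_priority(norm_usage: float, norm_shares: float,
                 weight_fairshare: int = 10000, nice: int = 0) -> int:
    """Weighted fair-share factor minus the job's --nice value.

    weight_fairshare plays the role of PriorityWeightFairshare (value is hypothetical);
    the other priority factors (age, job size, partition, QOS) are omitted.
    """
    return int(weight_fairshare * fairshare_factor(norm_usage, norm_shares)) - nice


if __name__ == "__main__":
    share = 1 / 3  # hypothetical equal shares for ATLAS, CMS and LHCb
    # A VO returning after being idle gets the highest fair-share priority ...
    print("returning VO:", job_priority(0.0, share))
    # ... a VO exactly at its target sits in the middle ...
    print("at target:   ", job_priority(share, share))
    # ... and a VO using twice its share and submitting with --nice=100 drops further.
    print("over target: ", job_priority(2 * share, share, nice=100))
```

With these assumed numbers the returning VO gets 10000, the VO at its target 5000, and the over-target VO 2400, which is the "higher priority until it gets back to its target" behaviour described in item 1.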
Topic revision: r6 - 2020-06-05 - MauroDonega
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.