<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Action Items

Action items (9)

Legend: <number>. title (added: date, done: date) NEW / UPDATED / DONE

1. Reduce ATLAS dips within the box (added: 16.01.2020, done:) NEW

(Box = CHIPP allocated nodes at CSCS)

  • IMPLEMENT THE “INTERNAL DYNAMIC ALLOCATION” (easy to implement - but the idle cost goes back to the VOs)

    • Fair share + optimized priority with reservations

    • When a VO comes back, it takes a higher priority until it gets back to its target, then returns to normal priority

    • Align the boundaries at the node level (see [1] below in item 3.)

    • IMPLEMENT ON MON 20; TEST UNTIL 29 JAN (MAINTENANCE).

    • PRELIMINARY NUMBERS to seize the shared resources:

      • CMS 50%

      • ATLAS 50%

      • LHCb 50%
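The "higher priority until back at target" rule above can be sketched in a few lines. This is only an illustration of the logic, not the scheduler configuration: the `VO` class, the two priority values and the share numbers in the usage note are all hypothetical and are not taken from the CSCS setup.

```python
from dataclasses import dataclass


@dataclass
class VO:
    name: str
    target_share: float   # fraction of the box the VO should get (assumed)
    current_share: float  # fraction of the box the VO is using now (assumed)


# Hypothetical priority levels, chosen only for illustration.
BASE_PRIORITY = 1.0
BOOST_PRIORITY = 2.0


def priority(vo: VO) -> float:
    """Boost a VO that is below its target; normal priority otherwise."""
    if vo.current_share < vo.target_share:
        return BOOST_PRIORITY
    return BASE_PRIORITY


def scheduling_order(vos):
    """Highest priority first; ties broken by the largest deficit to target."""
    return sorted(
        vos,
        key=lambda v: (-priority(v), v.current_share - v.target_share),
    )
```

With made-up shares such as CMS at 0.45 of a 0.4 target, ATLAS at 0.10 of a 0.4 target and LHCb exactly at its 0.2 target, `scheduling_order` puts ATLAS first; once ATLAS reaches its target, `priority` returns the base value again.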

2. Reduce ATLAS dips outside the box (added: 16.01.2020, done:) NEW

- Discussion to be started with M. DeLorenzi and the CSCS CTO

  • START THE DISCUSSION TO GO FOR THE “DYNAMIC ALLOCATION”

- forced draining of nodes already in use “capped”

  • START THE DISCUSSION TO GO FOR THE “opportunistic”

- use only idle nodes

- issue: there are very few idle nodes

- (jobs have to be in the queue already - it cannot be detected)

  • START THE DISCUSSION TO GO FOR THE “opportunistic”

- use only idle nodes

- with short jobs “backfilling”

- (jobs have to be in the queue already - it cannot be detected)

3. Help reduce cache occupancy (added: 16.01.2020, done:) NEW

At the moment we run all VOs on one node, i.e. 3 software stacks on one node

- go for user segregation: part of it (the part outside the shared resource band) can be done in the reservation test by drawing the boundaries at the node level [1] (see item 1)

4. Site Log (added: 16.01.2020, done: 20.02.2020) DONE

IMPLEMENT A SAFE SITE LOG FOR CHIPP RESOURCES. Both CSCS and the experiments to compile it

Done: the most relevant changes from both the CSCS and the experiment side that can affect operations are collected in a TWiki Hot Topics list

5. ATLAS --nice (added: 16.01.2020, updated: 20.02.2020) UPDATED

TRY TO SET IT TO A LOW VALUE AND TEST → SCHEDULED AFTER THE TEST OF [1]

→ Give Gianfranco login access on Daint so he can use sprior: login given (20.02.2020)

6. ARC metrics (added: 16.01.2020, updated: 20.02.2020) UPDATED

→ check whether the monitoring package can be plugged into Elastic

On Nick's todo list

7. Queue status hammerclouds (added: 16.01.2020, updated: 20.02.2020) UPDATED

→ VO reps to provide input (the API call) to CSCS, to be put on the dashboard. The idea is to see the status of the systems on one page, in a couple of boxes.

LHCb provided the query

CMS trying to get to the information (Vinzenz/Derek)

ATLAS: there is no single query that can provide the status. Gianfranco provided some logic to Miguel in the past. Investigating how to get to a single script that produces one (or a few) binary or semaphore (R/Y/G) outputs
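One possible shape for the "single script with a semaphore output" idea is sketched below. It is only a sketch under assumptions: the check names and the "worst status wins" rule are hypothetical and are not the logic Gianfranco provided; the real inputs would come from the VO queries.

```python
from enum import IntEnum


class Status(IntEnum):
    """Semaphore levels, ordered so that a larger value is worse."""
    GREEN = 0
    YELLOW = 1
    RED = 2


def combine(checks):
    """Reduce several per-check statuses to one semaphore value.

    `checks` maps a (hypothetical) check name to its Status.
    The worst individual status wins; no data is reported as YELLOW
    (an assumption, so that missing input is not shown as healthy).
    """
    if not checks:
        return Status.YELLOW
    return max(checks.values())
```

For example, `combine({"queue": Status.GREEN, "hammercloud": Status.YELLOW})` gives `Status.YELLOW`, which the dashboard could show as a single yellow box for that VO.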

8. dCache updates (added: 16.01.2020, updated: 20.02.2020) UPDATED

→ Ask Dario to present plans for dCache at the next ops meeting.

In the process of configuring a dCache Test Lab:

- Goal is to match the production system and test the upgrade steps

- Will be able to spot problems before applying changes to the production system

- Also allows for optimizing upgrades to complete more efficiently

- Evaluating the change logs of previous versions to spot changes and adapt the configuration

9. ATLAS transition to federated resources (added: 16.01.2020, done:) NEW

ATLAS full transition timescale: 18 months. Prepare a plan for the transition; follow up in monthly ops meetings.

Topic revision: r2 - 2020-02-21 - MauroDonega