<!-- keep this as a security measure:
* Set ALLOWTOPICCHANGE = TWikiAdminGroup,Main.LCGAdminGroup
* Set ALLOWTOPICRENAME = TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

Action Items

Action items (10)

Legend: <number.> title (added: date, done: date) NEW / Processing / UPDATED / DONE

Last update 12.03.2020

1. Reduce ATLAS dips within the box (added: 16.01.2020, done:) Processing

(Box = CHIPP allocated nodes at CSCS)

  • IMPLEMENT THE “INTERNAL DYNAMIC ALLOCATION” (easy to implement - but the idle cost goes back to the VOs)

    • Fair share + optimized priority with reservations

    • When a VO comes back, it takes a higher priority until it is back at its target, then returns to normal priority

    • Align the boundaries at the node level (see [1] below in item 3.)

    • IMPLEMENT ON MON 20, TEST UNTIL 29 JAN (MAINTENANCE).

    • PRELIMINARY NUMBERS to seize the shared resources:

      • CMS 50%

      • ATLAS 50%

      • LHCb 50%

    • The final results will become clear once CMS manages to send a proper flow of production jobs
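
The priority rule in the list above (a returning VO gets a higher priority until it is back at its target, then returns to normal) can be illustrated with a minimal sketch. This is not the scheduler configuration used at CSCS; the class `VO`, the function `effective_priority`, and the `base` and `boost` values are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class VO:
    name: str
    target_share: float   # fraction of the box the VO is entitled to (hypothetical value)
    current_share: float  # fraction of the box the VO is using right now


def effective_priority(vo: VO, base: int = 1000, boost: int = 500) -> int:
    """Return a boosted priority while the VO is below its target share.

    `base` and `boost` are illustrative numbers, not real scheduler settings.
    """
    if vo.current_share < vo.target_share:
        return base + boost  # VO is coming back: favour it until it reaches its target
    return base              # VO is at or above target: normal priority


# Example with made-up shares: the VO below its target is served first.
vos = [VO("A", 0.40, 0.10), VO("B", 0.40, 0.45)]
ordered = sorted(vos, key=effective_priority, reverse=True)
print([vo.name for vo in ordered])
```

In this sketch the boost is a fixed step; a real fair-share setup would normally scale the priority with the distance from the target.
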
2. Reduce ATLAS dips outside the box (added: 16.01.2020, done: 20.02.2020) DONE

- Discussion to be started with M. DeLorenzi and CSCS CTO

  • START THE DISCUSSION TO GO FOR THE “DYNAMIC ALLOCATION”

- forced draining of nodes already in use “capped”

  • START THE DISCUSSION TO GO FOR THE “opportunistic”

- use only idle nodes

- issue: there are very few idle nodes

- (jobs have to be in the queue already - it cannot be detected)

  • START THE DISCUSSION TO GO FOR THE “opportunistic”

- use only idle nodes

- with short jobs “backfilling”

- (jobs have to be in the queue already - it cannot be detected)

3. Help reduce cache occupancy (added: 16.01.2020, done:) Processing

At the moment we run all VOs on one node, i.e. 3 software stacks on one node

- go for user segregation: a portion of it (the part not in the shared resource band) can be done in the reservation test by drawing the boundaries at the node level [1] (see item 1)

4. Site Log (added: 16.01.2020, done: 20.02.2020) DONE

IMPLEMENT A SAFE SITE LOG FOR CHIPP RESOURCES. Both CSCS and the experiments to compile it

Done: Collect in a TWiki Hot Topics page the list of the most relevant changes, from both the CSCS and the experiment side, that can affect operations

5. ATLAS --nice (added: 16.01.2020, updated: 12.03.2020) UPDATED

TRY TO SET IT TO A LOW VALUE AND TEST → SCHEDULED AFTER THE TEST OF [1]

→ Give Gianfranco access to log in on Daint and use sprior: login given (20.02.2020)

Waiting for the go-ahead from Gianfranco

6. ARC metrics (added: 16.01.2020, updated: 20.02.2020) Processing

→ check whether the monitoring package can be plugged into Elastic

On Nick's todo list

7. Queue status hammerclouds (added: 16.01.2020, updated: 12.03.2020) UPDATED

→ Input from the VO reps (provide the API call) to CSCS, to be put on the dashboard. The idea is to see the status of the systems on one page with a couple of boxes.

LHCb provided the query

CMS is trying to get to the information (Vinzenz/Derek): take the LHCb script as an example

ATLAS: there is no single query that can provide the status. Gianfranco provided some logic to Miguel in the past. Investigating how to get to a single script that produces one (or a few) binary or semaphore (R/Y/G) outputs
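
As an illustration of the semaphore (R/Y/G) output mentioned above, here is a minimal sketch that maps one number per VO to a colour. It assumes each VO query can be reduced to a job success rate between 0 and 1; that input, the thresholds, and the names `classify` and `dashboard` are hypothetical and not taken from any of the VO scripts.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    GREEN = "G"
    YELLOW = "Y"
    RED = "R"


def classify(success_rate: float,
             yellow_below: float = 0.9,
             red_below: float = 0.5) -> Status:
    """Map a success rate in [0, 1] to a semaphore colour.

    The thresholds are illustrative assumptions, not agreed values.
    """
    if success_rate < red_below:
        return Status.RED
    if success_rate < yellow_below:
        return Status.YELLOW
    return Status.GREEN


def dashboard(rates: Dict[str, float]) -> Dict[str, str]:
    """Return one R/Y/G letter per VO, ready to show in a dashboard box."""
    return {vo: classify(rate).value for vo, rate in rates.items()}


# Example with made-up success rates.
print(dashboard({"ATLAS": 0.95, "CMS": 0.70, "LHCb": 0.20}))
```

Keeping the per-VO logic behind a single function like `classify` would let each experiment supply its own query while the dashboard only consumes the colour.
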

8. dCache updates (added: 16.01.2020, updated: 20.02.2020) Processing

→ Ask Dario to present plans for dCache at the next ops meeting.

In the process of configuring a dCache Test Lab:

- Goal is to match the production system and test the upgrade steps

- Will be able to spot problems before applying changes to the production system

- Also allows for optimizing upgrades to complete more efficiently

- Evaluating the change logs of previous versions to spot changes and adapt the configuration

- Upgrade from 3.2 to 5.2 on March 24 or 26; follow on the Hot Topics page

9. ATLAS transition to federated resources (added: 16.01.2020, done:) Processing

Timescale for the full ATLAS transition: 18 months. Prepare a plan for the transition and follow up in the monthly ops meetings.

10. ARC upgrade (added: 12.03.2020, done:) NEW

Upgrade from 5.4.4 to 6.5 in April; follow on the Hot Topics page

Topic revision: r3 - 2020-03-12 - MauroDonega