(Box = CHIPP allocated nodes at CSCS)
IMPLEMENT THE “INTERNAL DYNAMIC ALLOCATION” (easy to implement - but the idle cost goes back to the VOs)
Fair share + optimized priority with reservations
When a VO comes back, it takes a higher priority until it is back at its target, then returns to normal priority
Align the boundaries at the node level (see [1] below in item 3.)
IMPLEMENT ON MON 20, TEST UNTIL 29 JAN (MAINTENANCE).
PRELIMINARY NUMBERS to seize the shared resources:
CMS 50%
ATLAS 50%
LHCb 50%
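The catch-up rule above (a returning VO gets a higher priority until it is back at its target) can be sketched as follows. This is only an illustration, not the scheduler configuration used at CSCS: the `VOUsage` class, the `base` and `boost` parameters and the linear boost formula are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VOUsage:
    name: str
    target_share: float  # fraction of the box the VO should get (hypothetical value)
    recent_share: float  # fraction actually used over the accounting window


def priority(vo: VOUsage, base: float = 1.0, boost: float = 10.0) -> float:
    """Return a priority that is boosted while the VO is below its target."""
    deficit = vo.target_share - vo.recent_share
    if deficit <= 0:
        return base  # at or above target: normal priority
    # Below target: the boost grows with the relative deficit (assumed linear).
    return base + boost * deficit / vo.target_share


def rank(vos: List[VOUsage]) -> List[str]:
    """Order VO names from highest to lowest priority."""
    return [v.name for v in sorted(vos, key=priority, reverse=True)]
```

With this rule a VO that has been idle ranks first and drops back to the base priority as soon as its recent share reaches the target.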
START THE DISCUSSION TO GO FOR THE “DYNAMIC ALLOCATION”
- forced draining of nodes already in use “capped”
START THE DISCUSSION TO GO FOR THE “opportunistic”
- use only idle nodes
- issue: there are very few idle nodes
- (jobs have to be already in the queue - it cannot be detected)
START THE DISCUSSION TO GO FOR THE “opportunistic”
- use only idle nodes
- with short jobs “backfilling”
- (jobs have to be already in the queue - it cannot be detected)
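As a starting point for the discussion, the backfilling variant can be sketched as a first-fit selection over jobs that are already in the queue. This is a simplified illustration and not how the batch system at CSCS implements backfill: the `Job` fields, the `window_min` parameter (time until the idle nodes are needed again) and the first-fit policy are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    job_id: str
    nodes: int         # nodes requested
    walltime_min: int  # requested wall time in minutes


def backfill(queue: List[Job], idle_nodes: int, window_min: int) -> List[Job]:
    """Pick queued jobs that fit on the idle nodes and end within the window."""
    picked = []
    free = idle_nodes
    for job in queue:  # queue order is respected (first fit)
        if job.walltime_min <= window_min and job.nodes <= free:
            picked.append(job)
            free -= job.nodes
    return picked
```

Only jobs already waiting in the queue are considered, which matches the constraint noted above.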
IMPLEMENT A SAFE SITE LOG FOR CHIPP RESOURCES. Both CSCS and Experiment to compile it
Done: Collect in a twiki Hot Topics the list of the most relevant changes from both CSCS and the Experiment side that can affect operations
TRY TO SET IT TO A LOW VALUE AND TEST → SCHEDULED AFTER THE TEST OF [1]
→ Give Gianfranco access to log in on Daint and use sprior: login given (20.02.2020)
→ Input from VO reps (provide the API call) to CSCS, then put on the dashboard. The idea is to see on one page, with a couple of boxes, the status of the systems.
LHCb: provided the query
CMS: trying to get to the information (Vinzenz/Derek)
ATLAS: there is no single query that can provide the status. Gianfranco provided some logic to Miguel in the past. Investigating how to get to a single script that produces one (or a few) binary or semaphore (R/Y/G) outputs
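One possible shape for such a script is a function that collapses several pass/fail checks into a single R/Y/G value. This is a minimal sketch under stated assumptions: the check names, the notion of "critical" checks and the colour rules are hypothetical and do not reproduce the logic Gianfranco provided.

```python
from typing import Dict, FrozenSet


def semaphore(checks: Dict[str, bool],
              critical: FrozenSet[str] = frozenset()) -> str:
    """Collapse named pass/fail checks into one of "R", "Y" or "G"."""
    failed = {name for name, ok in checks.items() if not ok}
    if not failed:
        return "G"  # every check passes
    if failed & critical or len(failed) == len(checks):
        return "R"  # a critical check failed, or all checks failed
    return "Y"      # degraded: only non-critical checks failed


# Hypothetical check names, for illustration only.
status = semaphore(
    {"jobs_running": True, "storage_reachable": False},
    critical=frozenset({"jobs_running"}),
)
```

Each VO could feed the results of its own queries into the same function, so the dashboard only has to display one value per box.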
ATLAS full transition timescale: 18 months. Prepare a plan for the transition, follow up in monthly ops meetings.