(Box = CHIPP allocated nodes at CSCS)
IMPLEMENT THE “INTERNAL DYNAMIC ALLOCATION” (easy to implement - but the idle cost goes back to the VOs)
Fair share + optimized priority with reservations
When a VO comes back, it takes a higher priority until it gets back to its target, then returns to normal priority
Align the boundaries at the node level (see [1] below in item 3.)
IMPLEMENT ON MON 20, TEST UNTIL 29 JAN (MAINTENANCE).
PRELIMINARY NUMBERS to seize the shared resources:
CMS 50%
ATLAS 50%
LHCb 50%
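
To illustrate the fair-share behaviour noted above (a VO that comes back gets a higher priority until it is back at its target), here is a minimal Python sketch. It assumes a factor of the form 2^(-usage/target), loosely modelled on Slurm's classic fair-share formula; the actual scheduler configuration at CSCS is not described in these notes. The VO names, target shares and usage fractions are hypothetical placeholders, not the preliminary numbers listed above.

```python
def fair_share_factor(usage_fraction: float, target_share: float) -> float:
    """Return a factor in (0, 1]: 1.0 for no recent usage, 0.5 when usage equals the target."""
    if target_share <= 0:
        raise ValueError("target_share must be positive")
    # Assumed form: the factor halves each time usage grows by one target share.
    return 2.0 ** (-usage_fraction / target_share)


# Hypothetical target shares and recent usage (fractions of the shared resources).
targets = {"VO_A": 0.4, "VO_B": 0.4, "VO_C": 0.2}
recent_usage = {"VO_A": 0.7, "VO_B": 0.3, "VO_C": 0.0}

# VOs furthest below their target come first in the ranking.
ranking = sorted(
    targets,
    key=lambda vo: fair_share_factor(recent_usage[vo], targets[vo]),
    reverse=True,
)
print(ranking)
```

In this sketch a VO that has been idle (VO_C) ranks first, and its factor drops back towards the others as its usage approaches the target.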
START THE DISCUSSION TO GO FOR THE “DYNAMIC ALLOCATION”
- forced draining of nodes already in use “capped”
START THE DISCUSSION TO GO FOR THE “opportunistic”
- use only idle nodes
- with short jobs “backfilling”
- (jobs have to be already in the queue - it cannot be detected)
- "Opportunistic was never really on the agenda from the CHIPP/SNF side".
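
The "opportunistic" idea above (idle nodes only, short jobs, jobs already in the queue) can be sketched as a simple backfill selection. This is a sketch under stated assumptions, not how the site scheduler works: the QueuedJob fields, the function name and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedJob:
    name: str
    nodes: int          # nodes requested
    walltime_min: int   # requested walltime in minutes


def pick_backfill_jobs(queue: List[QueuedJob], idle_nodes: int, window_min: int) -> List[QueuedJob]:
    """Pick already-queued jobs, in queue order, that fit on the idle nodes
    and finish within the idle window."""
    chosen = []
    free = idle_nodes
    for job in queue:
        if job.walltime_min <= window_min and job.nodes <= free:
            chosen.append(job)
            free -= job.nodes
    return chosen


# Hypothetical queue: only short jobs that fit on the idle nodes are picked.
queue = [
    QueuedJob("long", 2, 600),
    QueuedJob("short-a", 1, 30),
    QueuedJob("wide", 4, 20),
    QueuedJob("short-b", 2, 45),
]
picked = pick_backfill_jobs(queue, idle_nodes=3, window_min=60)
print([job.name for job in picked])
```

Only jobs that are already queued can be considered, which matches the remark that demand cannot be detected otherwise.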
IMPLEMENT A SAFE SITE LOG FOR CHIPP RESOURCES. Both CSCS and Experiment to compile it
Done: Collect in a twiki Hot Topics the list of the most relevant changes from both CSCS and the Experiment side that can affect operations
TRY TO SET IT TO A LOW VALUE AND TEST → SCHEDULED AFTER THE TEST OF [1]
→ Give Gianfranco access to login on Daint and use sprior: login given (20.02.2020)
→ Waiting Gianfranco for the go ahead
→ Input from VO reps (provide the API call) to CSCS, then put it on the dashboard. The idea is to see on one page, with a couple of boxes, the status of the systems.
LHCb provided the query
CMS is trying to get to the information (Vinzenz/Derek): take the script from LHCb as an example
ATLAS: there is no single query that can provide the status. Gianfranco provided some logic to Miguel in the past. Investigating how to get to a single script that produces one (or a few) binary or semaphore (R/Y/G) outputs
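
As an illustration of the semaphore idea, the following Python sketch reduces several binary checks to a single R/Y/G output. The split into critical and non-critical checks, the check names and the rule itself are assumptions made for the example; they are not the logic Gianfranco provided.

```python
from enum import Enum
from typing import Mapping


class Status(Enum):
    GREEN = "G"
    YELLOW = "Y"
    RED = "R"


def semaphore(critical: Mapping[str, bool], non_critical: Mapping[str, bool]) -> Status:
    """Assumed rule: any failed critical check is RED, any failed
    non-critical check is YELLOW, otherwise GREEN."""
    if not all(critical.values()):
        return Status.RED
    if not all(non_critical.values()):
        return Status.YELLOW
    return Status.GREEN


# Hypothetical check results, e.g. gathered from per-VO queries.
status = semaphore(
    critical={"jobs_running": True, "storage_reachable": True},
    non_critical={"failure_rate_ok": False},
)
print(status.value)
```

A dashboard box would then only need the single-letter value of each VO's status.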
ATLAS full transition timescale: 18 months. Prepare a plan for the transition, follow up in monthly ops meetings.
→ Update from Gianfranco