Resource sharing:
Fixing the ATLAS dips. What does “ATLAS flat” usage mean? A narrower oscillation of the number of nodes used, at most +/- 20%.
Ideas:
Fixed partitions: 40% allocated to ATLAS
Dynamic allocation:
High priority to CHIPP for a node (so high that it kills the other jobs)
Technically, this may be limited by I/O?
Memory limited? Only some nodes can be used
Proven with the T0 test
Risk of paying for idle usage → accounting
The experiment will have to tune the load so as not to continuously hit the maximum (e.g. 200 nodes on average with a maximum of 250 nodes)
Trial and error over ~1 month to see how to deal with the load tuning
Jobs during the T0 test were starting immediately. Check the draining mechanism of the nodes (there was no 5-day queue at that time)
If we use up our budget ahead of time, what do we do? If the cap is small, it should not be an issue
We can have a mixture of dynamic and fixed allocation
Overlapping partition with a cap on the number of nodes. If nodes are not used, anybody can use them (even outside CHIPP)
“Amazon example”: on-demand vs. reservation; unused reserved nodes are wasted, and to cope with this the price is increased
Node “reservation”: need to move from core-hours → node-hours for all VOs. Having only part of it is difficult
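A minimal sketch of that conversion, assuming a hypothetical 36-core node and an illustrative pledge; substitute the real figures for the CHIPP nodes at CSCS:

```python
# Hypothetical conversion of a VO pledge from core-hours to node-hours.
# CORES_PER_NODE and the pledge below are illustrative assumptions, not the
# real figures for the CHIPP nodes at CSCS.

CORES_PER_NODE = 36  # assumed cores per node; substitute the actual node type

def core_hours_to_node_hours(core_hours: float, cores_per_node: int = CORES_PER_NODE) -> float:
    """Translate a core-hour budget into the equivalent node-hour budget."""
    return core_hours / cores_per_node

pledge_core_hours = 10_000_000  # illustrative pledge
print(f"{pledge_core_hours:,} core-hours ≈ "
      f"{core_hours_to_node_hours(pledge_core_hours):,.0f} node-hours")
```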
Accounting going to the VOs?
Go for node allocation instead of core allocation (“user segregation”)
Jobs will have to take whole nodes instead of individual cores
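A minimal sketch of what such a whole-node request could look like (here generated from Python purely for illustration); the partition and account names are placeholders, and only the standard Slurm directives are assumed:

```python
# Sketch of a job request that takes whole nodes rather than individual cores.
# The Slurm directives (--nodes, --exclusive, --ntasks-per-node) are standard;
# the partition and account names are placeholders.

def whole_node_header(nodes: int, tasks_per_node: int,
                      partition: str = "chipp", account: str = "chipp") -> str:
    """Build a batch-script header requesting exclusive access to full nodes."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",                    # number of full nodes
        "#SBATCH --exclusive",                         # do not share the nodes
        f"#SBATCH --ntasks-per-node={tasks_per_node}", # fill each node
        f"#SBATCH --partition={partition}",            # placeholder partition
        f"#SBATCH --account={account}",                # placeholder account
    ])

print(whole_node_header(nodes=2, tasks_per_node=36))
```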
PLAN:
WITHIN THE BOX (Box = CHIPP allocated nodes at CSCS)
IMPLEMENT THE “INTERNAL DYNAMIC ALLOCATION” (easy to implement - but the idle cost goes back to the VOs)
Fair share + optimized priority with reservations
When a VO comes back, it will get a higher priority until it is back at its target, then return to normal (a toy sketch follows after this plan block)
Align the boundaries at the node level (see [1] below)
IMPLEMENT ON MON 20 JAN, TEST UNTIL 29 JAN (MAINTENANCE).
PRELIMINARY NUMBERS to size the shared resources:
CMS 50%
ATLAS 50%
LHCb 50%
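As referenced above, a toy sketch of the fair-share behaviour, loosely modelled on the classic Slurm fair-share factor 2^(-usage/share); the shares and delivered fractions are placeholders, not the agreed numbers:

```python
# Toy model of the internal dynamic allocation: a VO that has delivered less
# than its target share gets a boosted fair-share factor, so its jobs run at
# higher priority until it catches up, then the factor relaxes back to normal.
# Inspired by the classic Slurm fair-share formula F = 2**(-usage/share); the
# production multifactor plugin also folds in age, partition and QOS weights.

def fair_share_factor(normalized_usage: float, normalized_share: float) -> float:
    """Return a factor in (0, 1]: 1.0 for no usage, 0.5 exactly at target."""
    return 2.0 ** (-normalized_usage / normalized_share)

# Placeholder shares and delivered fractions, not the agreed numbers.
shares = {"ATLAS": 0.40, "CMS": 0.40, "LHCb": 0.20}
usage  = {"ATLAS": 0.10, "CMS": 0.45, "LHCb": 0.20}

for vo in shares:
    f = fair_share_factor(usage[vo], shares[vo])
    print(f"{vo:5s} share={shares[vo]:.2f} delivered={usage[vo]:.2f} factor={f:.2f}")
# ATLAS, being below its target, gets the highest factor and hence priority.
```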
OUTSIDE THE BOX - Discussion to be started with M. DeLorenzi and CSCS CTO
START THE DISCUSSION TO GO FOR THE “DYNAMIC ALLOCATION”
- forced draining of nodes already in use, “capped”
START THE DISCUSSION TO GO FOR THE “opportunistic”
- use only idle nodes
- issue: there are very few idle nodes
- (jobs have to be already in the queue - it cannot be detected otherwise)
START THE DISCUSSION TO GO FOR THE “opportunistic” (with backfilling)
- use only idle nodes
- with short jobs “backfilling”
- (jobs have to be already in the queue - it cannot be detected otherwise)
OVERALL UNDERUSAGE
Equalize pledges to capacity (within the box). This does not work, because of:
the CSCS site availability goal of 95%
scheduler inefficiency
When other sites show that full capacity is reached, they are using opportunistic resources (see the back-of-the-envelope sketch below)
Better situation in the last month
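A back-of-the-envelope sketch of the headroom argument; the scheduler efficiency value is an assumption, not a measurement:

```python
# Back-of-the-envelope check of why pledges cannot simply be equalized to raw
# capacity inside the box: availability and scheduler inefficiency both eat
# into the deliverable hours. The scheduler efficiency here is an assumption.

availability = 0.95          # CSCS site availability goal (from the minutes)
scheduler_efficiency = 0.90  # assumed value, to be replaced by a measurement

deliverable_fraction = availability * scheduler_efficiency
headroom = 1.0 / deliverable_fraction

print(f"Deliverable fraction of raw capacity: {deliverable_fraction:.0%}")
print(f"Capacity needed per unit of pledge:   {headroom:.2f}x")
# With these numbers only ~86% of the capacity is deliverable, i.e. ~1.17x the
# pledge would be needed in-box unless opportunistic resources fill the gap.
```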
CVMFS needs a cache: RAM cache limitation at Piz Daint - strike a compromise between running jobs on the cores or keeping them idle to use their RAM for the cache (see the sketch after this block):
CVMFS issue:
Crash when filling the cache
Found a workaround.
CSCS has a smaller cache than the Bern T2; this can uncover bugs, e.g. in CVMFS
To help reduce cache usage:
At the moment we run all VOs on one node, i.e. 3 software stacks on one node - go for user segregation: a portion of it (the part not in the shared resource band) can be done in the reservation test by drawing the boundaries at the node level [1]
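A rough sketch of the RAM-cache trade-off mentioned above, with placeholder numbers for node RAM, per-job memory and per-VO cache size:

```python
# Sketch of the RAM-cache compromise: every GB given to the CVMFS cache is a GB
# not available to payload jobs, so some cores may have to stay idle. All
# numbers are placeholders, not measured values for the Piz Daint nodes.

NODE_RAM_GB = 64        # assumed node memory
CORES_PER_NODE = 36     # assumed core count
MEM_PER_JOB_GB = 2.0    # assumed memory request of a single-core job
CACHE_PER_VO_GB = 5.0   # assumed CVMFS cache per software stack

def usable_cores(n_vo_stacks: int) -> int:
    """Cores left for jobs after reserving RAM cache for n software stacks."""
    ram_left = NODE_RAM_GB - n_vo_stacks * CACHE_PER_VO_GB
    return max(0, min(CORES_PER_NODE, int(ram_left / MEM_PER_JOB_GB)))

for n in (1, 3):
    print(f"{n} software stack(s) per node -> {usable_cores(n)} usable cores")
# Running all 3 stacks on one node triples the cache footprint; segregating
# one VO per node frees RAM for payload jobs.
```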
IMPLEMENT A SAFE SITE LOG FOR CHIPP RESOURCES. Both CSCS and the experiments are to fill it in
Miscellaneous items:
- ATLAS' own job micro-priority (--nice) => top priority now
SLURM nice parameter - the ATLAS computing model assumes that resources are available; ATLAS manages the internal priority of its jobs. The question is how to export/present this to SLURM.
Pilot pull mode: the system assigns the priority
ATLAS push mode: the priority is encoded in the job
Nice can be switched back on, but it is unclear how to monitor it:
Overall, if it is not working, it will be reflected in the <40% share
Still, it will not show the internal ranking of priorities (a toy sketch follows below)
TRY TO SET IT TO A LOW VALUE AND TEST → SCHEDULED AFTER THE TEST OF [1]
→ Give Gianfranco login access on Daint and let him use sprio
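As referenced above, a toy sketch of how a nice offset could carry the ATLAS-internal ranking; the base priority and job names are invented:

```python
# Toy illustration of carrying ATLAS's internal (push-mode) ranking via the
# Slurm nice value: a positive nice is subtracted from the computed priority,
# so a larger nice means the job runs later. The base priority and job names
# are made up; in production the priority comes from the multifactor plugin
# and can be inspected per job with sprio.

BASE_PRIORITY = 10_000  # placeholder for the site-computed priority

def effective_priority(nice: int) -> int:
    """Priority after the nice offset (higher value = scheduled earlier)."""
    return BASE_PRIORITY - nice

jobs = {"urgent-reprocessing": 0, "standard-production": 50, "low-prio-simulation": 500}

for name, nice in sorted(jobs.items(), key=lambda kv: -effective_priority(kv[1])):
    print(f"{name:22s} nice={nice:4d} -> effective priority {effective_priority(nice)}")
```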
- VO relative share (latest ticket closed, metrics not settled)
→ already covered
- ATLAS ~flat delivery (+/- 20% from the due core count) => now seldom a nucleus site
→ already covered
ARC metrics (monitoring and alarms) - needed since the decommissioning of the Ganglia monitoring, which had been available to us (for a few years)
Metric to monitor on ARC: how many jobs are in which (internal) state, to check whether the distribution of states matches expectations (see the sketch below)
Ganglia was replaced by “elastic”
→ check if it is possible to plug the monitoring package into elastic
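A minimal sketch of the state-distribution metric referenced above, assuming the states can be dumped to a text file (one per line); the thresholds and file name are placeholders:

```python
# Sketch of the ARC state-distribution metric: tally how many jobs sit in each
# internal state and raise an alarm when a state exceeds its expected count.
# It assumes a plain-text dump with one job state per line; how that dump is
# produced depends on the ARC setup. The thresholds are placeholders.

from collections import Counter
from pathlib import Path

EXPECTED_MAX = {"PREPARING": 200, "INLRMS": 2000, "FINISHING": 100}  # assumed limits

def state_distribution(dump_file: str) -> Counter:
    """Count job states in a one-state-per-line text dump."""
    return Counter(Path(dump_file).read_text().split())

def alarms(counts: Counter) -> list:
    """List the states that exceed their (assumed) thresholds."""
    return [f"{state}: {counts[state]} > {limit}"
            for state, limit in EXPECTED_MAX.items() if counts[state] > limit]

counts = state_distribution("arc_job_states.txt")  # placeholder path
print(dict(counts))
for warning in alarms(counts):
    print("ALARM:", warning)
```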
ATLAS HammerCloud status (monitoring and alarms)
To check the status of the ATLAS/CMS queues (online/blacklisted) at a glance
→ input from the VO reps (provide the API call) to CSCS, then put it on the dashboard
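A hedged sketch of such a probe, with a placeholder URL and invented response fields standing in for the API call to be provided by the VO reps:

```python
# Sketch of the dashboard probe: poll a status endpoint and report whether each
# CSCS queue is online or blacklisted. The URL and the JSON fields used below
# are placeholders, not the real API call to be provided by the VO reps.

import json
import urllib.request

STATUS_URL = "https://example.org/api/queue-status?site=CSCS-LCG2"  # placeholder

def fetch_queue_status(url: str = STATUS_URL) -> dict:
    """Download and decode the JSON status document from the assumed endpoint."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def summarize(status: dict) -> None:
    """Print one line per queue: name and online/blacklisted flag (assumed fields)."""
    for queue in status.get("queues", []):
        flag = "BLACKLISTED" if queue.get("blacklisted") else "online"
        print(f"{queue.get('name', '?'):30s} {flag}")

summarize(fetch_queue_status())
```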
General:
Timely dCache maintenance and upgrades to avoid disruptive ones. Inform the VOs about plans and progress
Keep the upgrades in line with the rest of the community, so that if an issue appears everybody is working on it at the same time
“Best practice”
Storage accounting implementation (WLCG / EGI)
Ask Dario to present plans for dCache at the next ops meeting
Long delays in replying to operational issues: is there any way to improve/help the situation?
- Nick: too many reporting avenues (JIRA tickets, Slack, calls, etc.). Use the CSCS ticket system
- If a problem is flagged on Slack, who is submitting the ticket?
Do not start the discussion on Slack; file a ticket instead
For investigations, try Slack; it might not work depending on availability (“best effort basis”)
Target: 3 hours to address general incidents
RT tickets are sometimes closed without asking for feedback from the VO representative. Getting feedback on the implemented changes can prevent misunderstandings/delays
Long-term issues not fixed via tickets will be added to the action items of the monthly ops meeting agenda
ATLAS is moving to a federated use of resources (CSCS + Bern) in Switzerland. Storage will transition first, going in the direction of reducing the pressure on the dCache storage (or reducing the size of dCache)
ATLAS full transition timescale: 18 months. Prepare a plan for the transition and follow up in monthly ops meetings.
Investigate possibilities for making the ARC logs externally accessible.