CHIPP-CSCS Face to Face Meeting on 2019-09-13

  • Date and time: Friday 13th of September at 10:15
  • Place: Zurich ETHZ (LEE E 126 map)
  • External link / EVO: probably not possible

Agenda

  • 10:15 - Welcome and agenda
  • 10:30 - VO Status report (~last 6 months)
    • LHCb (20' - Roland)
    • ATLAS (20' - Gianfranco)
    • CMS (20' - Vinzenz)
  • 11:30 - Tier-2 status, plans & pledges
    • CSCS (45' - Various people)
    • Long-term resource provisioning overview (15' - Pablo)
    • Discussion (30')
  • 13:15 - Lunch
  • 14:30 - Tier-2 status, plans & pledges
    • UNIBE-LHEP (30' - Gianfranco)
  • 15:00 - Tier-3 status and plans
    • PSI (15' - Nina)
    • UNIBE-ID (15' - Gianfranco)
    • UNIGE (15' - Gianfranco)
  • 15:45 - Coffee break
  • 16:00 - NGI_CH (20' - Gianfranco)
    • Open tickets:
  • 16:30 - End of meeting

Attendees

  • CSCS:
  • CMS:
  • ATLAS:
  • LHCb:

Minutes

Please check ALSO the action items, and the attachments with the individual reports.


# LHCb
- Check the apparent limit of 2500 jobs
- Higher failure rate compared to other sites (still low, not important)
- SAM tests are VERY bad, but nobody seems to care either
- Re-calculate HS06 with Singularity. Don't publish results in the middle of the month.

# CMS
- CSCS is low in the performance rank (job success rate)
- CMS VO box should be updated.
- CMS should check if Phedex will be needed after they adopt Rucio
- Some charts might be wrong: Vinzenz will recheck them soon

# ATLAS
- Generally CSCS is doing well (below the pledge mark, but very close)
- If a service is down for more than 12 hours, a downtime should be declared (a single service can be declared down) and a message sent on the chat
- Discussion regarding the 40:40:20 shares - will be addressed in the context of next year's grant application and pledges
- discussion on involvement of the VOs in monthly meetings
- Migration to ARC 6 is needed, still not clear how to proceed
- Dashboard: add "cumulative CPU utilization per VO" (the pie chart Nick gave Gianfranco on the chat) and "hammercloud state (green/red box) per VO"
- Bern cluster re-deployment delayed because the decommissioned Phoenix hardware, and details about it, became available late
- Bern could not plan for infrastructure despite having asked for inventory details for over 12 months
- Bern under pledge (largely as a consequence of the above), aiming at levelling up until the end of Q4
- Pablo points out Bern should not plan pledged resources counting on "opportunistic resources"
- Gianfranco points out Phoenix resources are not opportunistic, but part of the "LHConCray" agreement, upon which ATLAS has planned for pledges until 2021. These have been attached to the LHConCray report and later presented at at least TWO face2face meetings without meeting objections.
- "Not available" Phoenix nodes information requested (27k HS06)


# Long-term provisioning:
- Two goals for VO-reps to inquire with the VOs:
a) increase usage of accelerators (GPUs at CSCS), and
b) inquire about reducing storage (remove dCache?)
- CSCS to inquire internally how to deploy FPGAs
- Meet at the end of October/Nov and discuss what we found out from others
- LHCb does not use CSCS as cache, but as real Storage (with 1 replica somewhere else)

Action items

  • Follow up in the monthly meetings on the actions listed in the F2F action items (e.g. on the request for details on the decommissioned HW)

  • Roland - LHCb
    • check the y-axis (jobs submitted). It looks inconsistent
  • Vinzenz - CMS
    • the monitoring moved to grafana. Re-check the plots shown; compare grafana with CMS-dashboard.
  • General:
    • modify/optimize slurm for proper job distributions (to avoid the skewing which happened)
      (remember CMS was low at 33%, ATLAS high at 41%, LHCb up at 26%)
    • reminder: all VO contacts and CSCS MUST check the dashboards daily for ALL three experiments (and report monthly);
      in case of serious problems (one VO missing, not submitting jobs etc.), report immediately to the corresponding VO and open tickets
    • Monitor running vs. installed cores;
    • monitor VO-shares
    • Batch vs. Dashboard metrics unification
    • Understand the differences in quoted numbers for "delivered resources", which are available on 1) GRAFANA, 2) accounting-next.egi.eu/wlcg/report/tier2/ and 3) the VO-specific dashboards;
    • VO-status box: get one per VO ?

  • all VOs: discuss internally possible consequences if storage in general was reduced or even abandoned; strategies with respect to accelerators.
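
As an illustration of the share skew noted under "General", the deviation from the target shares can be computed with a few lines of Python. This is only a sketch: it assumes the 40:40:20 split maps to ATLAS:CMS:LHCb (the minutes do not state the order explicitly), and the names `TARGET`, `OBSERVED` and `share_deviation` are hypothetical.

```python
# Target shares in percent. ASSUMPTION: 40:40:20 is ATLAS:CMS:LHCb;
# the minutes quote the ratio without naming the order.
TARGET = {"ATLAS": 40.0, "CMS": 40.0, "LHCb": 20.0}

# Observed shares in percent, as quoted in the action items.
OBSERVED = {"ATLAS": 41.0, "CMS": 33.0, "LHCb": 26.0}


def share_deviation(observed, target):
    """Return observed minus target share, in percentage points, per VO."""
    return {vo: observed[vo] - target[vo] for vo in target}


deviation = share_deviation(OBSERVED, TARGET)
for vo in sorted(deviation):
    print(f"{vo}: {deviation[vo]:+.1f} percentage points")
```

Under that assumption, CMS sits 7 points below its target while LHCb sits 6 points above, which is the skew the Slurm tuning is meant to correct.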

Attachments

Topic attachments
Attachment | History | Size | Date | Who | Comment
20190913_CHIPP-CSCS_resource_provisioning_overview.pdf | r1 | 351.6 K | 2019-09-16 - 06:26 | PabloFernandez |
20190913_F2F_CSCS.pdf | r1 | 7281.3 K | 2019-09-13 - 08:37 | StefanoGorini |
ATLAS-report-20190913.pdf | r1 | 3405.8 K | 2019-09-13 - 08:35 | GianfrancoSciacca | ATLAS VO report (T2)
EGI-20190913.pdf | r1 | 477.8 K | 2019-09-13 - 09:03 | GianfrancoSciacca | EGI-topics
LHCb_CHIPP_Zurich_Sep2019.pdf | r1 | 951.1 K | 2019-09-13 - 09:25 | RolandBernet |
UNIBE-LHEP-20190913.pdf | r1 | 2667.2 K | 2019-09-13 - 08:51 | GianfrancoSciacca | UNIBE-LHEP T2 report
UNIGE-DPNC-20190913.pdf | r1 | 114.5 K | 2019-09-13 - 08:53 | GianfrancoSciacca | UNIGE-DPNC Tier3 report
f2f-19-zurich-cms-t3-status.pdf | r1 | 800.4 K | 2019-09-13 - 08:43 | NinaLoktionova | CMS T3 status report
f2f_13Sep19_v1.pdf | r2 r1 | 9547.7 K | 2019-10-10 - 08:41 | VinzenzStampf | CMS - Vinzenz
Topic revision: r17 - 2020-01-12 - GianfrancoSciacca
 