CHIPP + CSCS Face to Face Meeting on 2016-09-01

  • Date and time: Thursday, 1 September 2016, at 10:00
  • Place: CERN (40-R-B10)
  • External link / EVO: No

Agenda

Attendees

  • Christoph, Derek, Gianfranco, Fabio, Luis, Dino, Dario, Gianni, Miguel, Stefano, Pablo

Minutes and action items

  • The attached documents contain the information presented during the meeting. In addition, the following was agreed:
  • ATLAS currently sees significant duplication of effort: Gianfranco (GF) has to explain each issue, and CSCS then has to understand it. It would be more efficient if the same person did both.
  • (action on Gianfranco) ATLAS should communicate the average efficiency of all ATLAS sites during the whole period, so as to understand where some of those 'dips' come from
  • (action on CSCS) CSCS should try to explain where some of those efficiency dips come from during the last year, so as to understand if there is a good explanation (such as known incidents) or not.
  • Gianfranco's time dedicated exclusively to ATLAS has averaged 0.4 FTE over the last 4 years.
  • In the "options to move forward" page, for case A (continue as we are, with improvements), the manpower at the site needs a good dashboard and a good connection with the VO in order to understand the logs, compare the site with others, and navigate the VO internals more easily.
  • (action on ALL) it was agreed that the VO Reps and CSCS would sit down together in a 2-day monitoring hackathon to improve the discovery of and response to issues, by developing a dashboard that shows the important metrics as seen by CSCS and by all VOs. The availability window does not open until the second half of October.
  • (action on ALL) we need to add the main sysadmins at CSCS to all VOs
  • (action on Fabio and CSCS) it would be interesting for CSCS to read the minutes from the CMS weekly operations call (Fabio will attend and send the minutes)
  • (action on Fabio) CSCS mailing list grid-list@cscs.ch needs to receive the CMS Nagios warnings. DONE
  • Fabio is doing the first-level support for CMS, but having CSCS look into CMS issues as well should benefit not only CMS but also the rest of the VOs
  • CMS does not see the efficiency problems that ATLAS reports.
  • (action on Fabio and CSCS) we need to match the CMS and ATLAS efficiencies to see where they match (probably a site-wide problem) and where they differ (probably a VO-specific problem)
  • (action on ALL) The official Availability/Reliability (A/R) metrics from WLCG do not reflect reality. For next time, we need to agree on an A/R figure that does.
  • (action on CSCS) per-VO CPU accounting data should be added to the plot
  • VO-Reps can now log in to servers/nodes from login.lcg.cscs.ch (reached via ela.cscs.ch) as their own user and perform certain operations with sudo (for example, cat /var/log/*). If they see a need for more operations, they should open a ticket
  • (action on Derek) we should expect an answer for the VO-Box proposal before end of September
  • (action on CSCS) it was agreed that we should have a bi-weekly call to review new and old tickets. All issues have to be on the table and shared with everyone (ticket or not).
  • (action on Gianfranco and CSCS) we need to check whether the ARC configuration (and the VO queue config) is correct, since usage of the scratch file system in phoenix has increased more than 10x since April 2016
  • (action on CSCS) we need to understand the impact of not imposing memory limits (e.g. do nodes swap?)
  • (action on CSCS) authentication on Kibana (for graphs/logs sharing) should be done with Grid certificates
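
The action to match the CMS and ATLAS efficiencies could start from something like the following minimal sketch. It assumes each VO can export its job efficiency per day as a number between 0 and 1; the function name classify_dips, the dictionary layout, and the 0.7 threshold are hypothetical choices for illustration and were not agreed at the meeting. Days on which both VOs dip are flagged as probably site-wide, and days on which only one VO dips as probably VO-specific.

```python
from typing import Dict, List


def classify_dips(
    atlas: Dict[str, float],
    cms: Dict[str, float],
    threshold: float = 0.7,  # hypothetical cut-off for what counts as a "dip"
) -> Dict[str, List[str]]:
    """Classify days by which VO(s) saw an efficiency dip.

    `atlas` and `cms` map a day label (e.g. "2016-08-15") to that VO's
    efficiency for the day. Only days present in both inputs are compared.
    """
    result: Dict[str, List[str]] = {
        "site_wide": [],   # both VOs below threshold
        "atlas_only": [],  # only ATLAS below threshold
        "cms_only": [],    # only CMS below threshold
    }
    for day in sorted(atlas.keys() & cms.keys()):
        atlas_dip = atlas[day] < threshold
        cms_dip = cms[day] < threshold
        if atlas_dip and cms_dip:
            result["site_wide"].append(day)
        elif atlas_dip:
            result["atlas_only"].append(day)
        elif cms_dip:
            result["cms_only"].append(day)
    return result
```

The days listed under "site_wide" would then be the first candidates to compare against known CSCS incidents, while the single-VO lists point at the VO-specific configuration.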

Other

Topic attachments
  • 20160901_F2F_Stefano.pptx (r1, 2303.8 K, 2016-09-01 10:26, StefanoGorini)
  • CSCS-ATLASreport-20160901.pdf (r1, 1879.4 K, 2016-09-01 07:42, GianfrancoSciacca): CSCS-ATLASreport
  • Grid_FTF_2016_Sept_1.pdf (r1, 3887.4 K, 2016-09-01 07:56, LuisMarch): UNIGE Tier-3 ATLAS Cluster
  • UNIBE-LHEP-20160901.pdf (r1, 1859.9 K, 2016-09-01 10:14, GianfrancoSciacca): UNIBE-LHEP T2 site report
Topic revision: r17 - 2016-09-06 - FabioMartinelli