
Swiss Grid Operations Meeting on 2013-03-07

Agenda

Status

  • CSCS (reports Pablo):
    • Site entered Phase G:
      • Replacement of all Phase C storage (the Thors and their temporary replacement lent by CSCS) and further extension with six IBM DCS3700 controllers full of 3 TB disks, increasing the available permanent storage from 1.1 to 1.6 Petabytes.
      • Installation of 20 new SandyBridge @ 2.6 GHz compute nodes, increasing the number of available job slots from 1792 to 2432 and the computing capacity from 17500 to 24200 HepSpec06 (as a rough rule of thumb, 1 HepSpec06 ≈ 1 GigaFlop).
      • Installation of 2 virtualization hosts, with 1 TB of space on SSD drives and 96 GB RAM each, that will host all production virtual machines that currently reside on Phase C service nodes.
    • Installation of an authenticated xrootd door for dCache and, in addition, two special services for the CMS XROOTD Federation:
      • xrootd+cmsd on cmsvobox, acting as the CSCS redirector, which publishes files to the regional redirector.
      • dCache xrootd door, chrooted to /pnfs/lcg.cscs.ch/cms/trivcat, on storage01 on a dedicated port. It will be authenticated in a couple of months, when we upgrade to dCache 2.2.
    • UMD-2 upgrade status:
      • ARC-CE: upgrade done, on SL5.
      • Site-BDII: upgrade done, on SL6.
      • UI: ongoing and ready soon, but not urgent, since it is only used internally.
      • APEL: ongoing. We are setting up a parallel instance and trying to reproduce the normal behavior.
      • WNs: waiting for ATLAS validation of SL6. A mixed SL5/SL6 setup is only possible by splitting the cluster, with a separate queue for ATLAS+SL6.
      • CREAM CE: installing on cream03 next week, and on the other two in April with the new hardware.
      • dCache: installing in April with the new hardware.
  • PSI (reports Fabio):
    • Constantly hunting for old /pnfs user files, which push our dCache above 90% usage every day.
    • Preparing the dCache 2.2 migration, foreseen for March 28th; the setup so far is:
      • BDII: VM SL6, UMD2, bdii-5.2.12-1; you can query it by running: ldapsearch -x -H ldap://t3bdii01.psi.ch:2170 -b o=grid | less
      • SE: VM SL6, dcache-2.2.8-1, bdii-5.2.12-1; dCache services: dcap gridftp gsidcap srm spacemanager transfermanagers httpd billing srm-loginbroker pinmanager dir info poolmanager broadcast loginbroker topo
      • DB: VM SL6, dcache-2.2.8-1, PostgreSQL 9.2.3; dCache services: gplazma pnfsmanager cleaner acl admin nfsv3.
      • Evaluating XRootD and/or WebDAV as a new local service for users.
      • Once the migration is done, I can provide the configuration to CSCS: not great science, but it took me a while to write.
  • UNIBE (reports Gianfranco):
    • Xxx
  • UNIGE (reports Szymon):
    • Xxx
  • UZH (reports Sergio):
    • Xxx
  • Switch (reports Alessandro):
    • UMD1 will be deprecated after April 2013: tickets have been opened for sites with UMD1 services.
    • EMI Nagios tests are to replace the SAM ones (forwarded to Sigve and CSCS): is anyone interested in this?
    • Problems with the WN tarball distribution and the Nagios version test: NGI_DE is affected, NGI_CH is not (we do not use the tarball distribution).
    • After the end of EMI, the UMD repository will be overhauled, so the repository details will need to be changed (ongoing).
    • How about cross-checking the values in https://accounting.egi.eu/repcountry.php? Gianfranco suggested selecting a time slice, running the accounting script to get the un-normalized numbers, normalizing them, and comparing the result with the portal.
    • GGUS/Sympa problem: one GGUS test ticket was overlooked, but CCing operations@swing-grid.ch somehow creates spurious emails.
    • UNIBE problem with SRM/GridFTP, traced to a bug: the grid manager stops when a transfer fails (weird). It now seems solved by disabling the (deprecated) GridFTP test (UNIBE-ID was not affected).
    • EMI 3 (Monte Bianco) will be announced on March 11th.
    • EGI asked for feedback on whether support for Debian is needed: anyone?
    • The OMB discussed the ARGUS use case: it is not compulsory; do we want it? Note that ARC uses a text file instead.
    • New EGI data retention policy: the retention period goes from 12 to 18 months (proposed enforcement date: July 1st); this requires a change on the SGAS server at SWITCH (which serves SWING/SMSCG).
    • Sigve circulated the agenda for the INSPIRE meeting in March in Lugano (CSCS).
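The Phase G figures in the CSCS report can be sanity-checked with a few lines of arithmetic. This sketch assumes that the whole increase in job slots and HepSpec06 comes from the 20 new nodes (the minutes do not say whether any older compute nodes were retired at the same time); the helper names are hypothetical.

```python
# Phase G figures for CSCS, copied from the minutes.
OLD_SLOTS, NEW_SLOTS = 1792, 2432
OLD_HS06, NEW_HS06 = 17_500, 24_200
OLD_PB, NEW_PB = 1.1, 1.6
NEW_NODES = 20


def per_node(old: float, new: float, nodes: int) -> float:
    """Average increase per new node, assuming all of it comes from the new nodes."""
    return (new - old) / nodes


def growth_pct(old: float, new: float) -> float:
    """Relative growth in percent."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    print(f"job slots per new node: {per_node(OLD_SLOTS, NEW_SLOTS, NEW_NODES):.0f}")
    print(f"HS06 per new node:      {per_node(OLD_HS06, NEW_HS06, NEW_NODES):.0f}")
    print(f"HS06 per new job slot:  {(NEW_HS06 - OLD_HS06) / (NEW_SLOTS - OLD_SLOTS):.2f}")
    print(f"slot growth:            {growth_pct(OLD_SLOTS, NEW_SLOTS):.1f} %")
    print(f"storage growth:         {growth_pct(OLD_PB, NEW_PB):.1f} %")
```

Under that assumption each new node contributes 32 job slots and 335 HepSpec06.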
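To make the chroot of the CSCS dCache xrootd door concrete: clients of that door see /pnfs/lcg.cscs.ch/cms/trivcat as their root, so a request for a path such as /store/... resolves to a file below that directory. The sketch models only this path mapping. `to_pnfs` is a hypothetical helper, not dCache code, and it assumes the door simply prefixes every client path with the chroot directory.

```python
import posixpath

# Directory the dCache xrootd door is chrooted to (from the CSCS report).
CHROOT = "/pnfs/lcg.cscs.ch/cms/trivcat"


def to_pnfs(client_path: str, root: str = CHROOT) -> str:
    """Map a path seen by an xrootd client to the dCache namespace path.

    Hypothetical helper: it assumes the door prefixes every client path
    with `root`, so the client sees `root` as "/".
    """
    if not client_path.startswith("/"):
        raise ValueError(f"expected an absolute path, got {client_path!r}")
    # On an absolute path, normpath drops '..' components that would climb
    # above '/', so the mapped path cannot escape `root`.
    confined = posixpath.normpath(client_path)
    return posixpath.normpath(root + "/" + confined.lstrip("/"))


if __name__ == "__main__":
    for p in ("/store/user/file.root", "/store/../../etc/passwd", "/"):
        print(p, "->", to_pnfs(p))
```

The second example path shows why the normalization matters: a chroot is only useful if `..` components cannot lead outside it.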
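For the PSI item on hunting old /pnfs user files, a scan by modification time can be sketched as follows. It assumes the dCache namespace is mounted (e.g. via NFS), so that ordinary `os.walk`/`os.stat` calls work and `st_mtime` reflects the last write. The function name, the age threshold, and the path in the usage comment are hypothetical. On a large namespace such a walk may be slow, so pointing it at one user area at a time is probably wiser.

```python
import os
import stat
import sys
import time
from typing import Iterator, Optional, Tuple


def old_files(top: str, max_age_days: float,
              now: Optional[float] = None) -> Iterator[Tuple[str, int, float]]:
    """Yield (path, size_bytes, age_days) for regular files under `top`
    whose modification time is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, etc.
            if st.st_mtime < cutoff:
                yield path, st.st_size, (now - st.st_mtime) / 86400


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder path): python old_files.py /path/to/pnfs/user/area 180
    hits = sorted(old_files(sys.argv[1], float(sys.argv[2])), key=lambda t: -t[1])
    for path, size, age in hits:
        print(f"{size / 1e9:8.2f} GB  {age:6.0f} d  {path}")
    print(f"total: {sum(s for _, s, _ in hits) / 1e12:.2f} TB in {len(hits)} files")
```

Sorting by size puts the files that free the most space at the top of the list.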
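The ldapsearch query in the PSI report prints LDIF, in which long values are folded over several lines. A small parser makes it easier to pull specific published values out of the BDII. This is a sketch using only the Python standard library plus the OpenLDAP `ldapsearch` client; the function names and the sample values in the test data are hypothetical.

```python
import base64
import subprocess
from typing import Dict, List, Sequence

Entry = Dict[str, List[str]]


def unfold(ldif: str) -> List[str]:
    """Join LDIF continuation lines (a line starting with a single space
    continues the previous line)."""
    lines: List[str] = []
    for raw in ldif.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_ldif(ldif: str) -> List[Entry]:
    """Parse ldapsearch LDIF output into entries (attribute -> values).

    Comment lines are skipped, base64 values ('attr:: ...') are decoded,
    and blocks without a 'dn' (such as ldapsearch's trailing
    'search:'/'result:' summary) are dropped.
    """
    entries: List[Entry] = []
    current: Entry = {}
    for line in unfold(ldif) + [""]:  # trailing "" flushes the last entry
        if not line.strip():
            if "dn" in current:
                entries.append(current)
            current = {}
            continue
        if line.startswith("#"):
            continue
        attr, sep, value = line.partition(":")
        if not sep:
            continue  # not an attribute line
        if value.startswith(":"):
            value = base64.b64decode(value[1:].strip()).decode("utf-8", "replace")
        else:
            value = value.lstrip(" ")
        current.setdefault(attr, []).append(value)
    return entries


def query_bdii(uri: str = "ldap://t3bdii01.psi.ch:2170", base: str = "o=grid",
               ldap_filter: str = "(objectClass=*)",
               attrs: Sequence[str] = ()) -> List[Entry]:
    """Run an anonymous ldapsearch like the one in the minutes and parse it."""
    cmd = ["ldapsearch", "-x", "-LLL", "-H", uri, "-b", base, ldap_filter, *attrs]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_ldif(result.stdout)
```

For example, `query_bdii(ldap_filter="(objectClass=GlueSE)", attrs=["GlueSEImplementationVersion"])` would list the storage element entries with their published implementation version, assuming the BDII publishes GLUE 1.3 objects.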
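The accounting cross-check Gianfranco suggested (select a time slice, take the un-normalized numbers from the accounting script, normalize them, compare with the portal) can be sketched as below. The record layout, the use of the job end time to place a job in the time slice, and the per-core benchmark factor are all assumptions; the factor and its units must match whatever the EGI accounting portal uses for the comparison to be meaningful.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class JobRecord:
    """One job as reported by the local accounting script (hypothetical fields)."""
    end_time: datetime
    cpu_seconds: float          # raw, un-normalized CPU time
    benchmark_per_core: float   # normalization factor of the node that ran the job


def normalized_cpu_hours(records: Iterable[JobRecord],
                         start: datetime, stop: datetime) -> float:
    """Sum of CPU hours x benchmark factor for jobs that ended in [start, stop)."""
    return sum(r.cpu_seconds / 3600.0 * r.benchmark_per_core
               for r in records if start <= r.end_time < stop)


def relative_difference(local: float, portal: float) -> float:
    """Signed relative difference of the local figure with respect to the portal one."""
    if portal == 0:
        raise ValueError("portal value is zero")
    return (local - portal) / portal
```

Using a half-open interval `[start, stop)` avoids counting a job twice when consecutive time slices are compared.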
Other topics
  • HammerCloud Jobs and Site Readiness
    • The success rate of CMS HC jobs is generally low (between 80% and 90%); comparable sites usually have >99%.
    • CMS jobs have been below 80% for the last 2 days (so the site is currently in status 'Not Ready').
    • Is the problem understood (is it really an ARGUS 1.4.1 problem that will be solved by upgrading to UMD2)?
    • Does ATLAS see similar problems with HC jobs at CSCS?
  • Next face-to-face meeting in two weeks, in two parts:
    • CHIPP, from 10.15h to 14.15h
    • EGI, from 14.30h to 16.30h
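The HammerCloud notes tie the 'Not Ready' status to CMS HC success rates below 80% over the last two days. A simplified readiness check along those lines might look like this. The real CMS Site Readiness computation combines further metrics and is not reproduced here; the two-consecutive-day rule, the function names, and the example counts are assumptions for illustration.

```python
from typing import Sequence, Tuple

THRESHOLD = 0.80  # daily HC success rate below which a day counts as bad
BAD_DAYS = 2      # hypothetical: consecutive bad days that mean 'Not Ready'


def success_rate(succeeded: int, total: int) -> float:
    """Fraction of successful HammerCloud jobs."""
    if total <= 0:
        raise ValueError("no jobs in this period")
    return succeeded / total


def readiness(daily: Sequence[Tuple[int, int]],
              threshold: float = THRESHOLD, bad_days: int = BAD_DAYS) -> str:
    """Return 'Not Ready' if each of the last `bad_days` days (given as
    (succeeded, total) pairs, oldest first) is below `threshold`."""
    recent = daily[-bad_days:]
    if len(recent) == bad_days and all(success_rate(s, t) < threshold for s, t in recent):
        return "Not Ready"
    return "Ready"


if __name__ == "__main__":
    # Made-up daily counts, for illustration only.
    history = [(88, 100), (79, 100), (75, 100)]
    print([f"{success_rate(s, t):.0%}" for s, t in history], readiness(history))
```

Note that a site running at 80-90% passes this check every day yet stays far below the >99% seen at comparable sites, which is why the low baseline is worth investigating on its own.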
Next meeting date:

AOB

Attendees

  • CSCS: George, Pablo
  • CMS: Daniel, Fabio
  • ATLAS:
  • LHCb: Roland
  • EGI:

Action items

  • Item1
Topic revision: r11 - 2013-04-04 - PabloFernandez