Swiss Grid Operations Meeting on 2013-03-07

Date and time: First Thursday of the month, at 14:00
Place: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 9227296)
External link: http://vidyoportal.cern.ch/flex.html?roomdirect.html&key=Nrq24qRR4V1u
Phone gate: From Switzerland: 0225330322 (portal) + 9227296 (extension) + # (pound sign)
IRC chat: irc:gridchat.cscs.ch:994#lcg (pw: fisica)

Agenda

Status

CSCS (reports Pablo):
- Site entered Phase G:
  - Substitution of all Phase C storage (Thors, and their temporary replacement lent by CSCS) and further extension by six IBM DCS3700 controllers full of 3 TB disks, increasing the available permanent storage from 1.1 to 1.6 Petabytes.
  - Installation of 20 new SandyBridge @ 2.6 GHz compute nodes, increasing the number of available job slots from 1792 to 2432, and the computing capacity from 17500 to 24200 HepSpec06 (1 HepSpec06 = 1 GigaFlop).
  - Installation of 2 virtualization hosts, with 1 TB of space on SSD drives and 96 GB RAM each, that will host all production virtual machines that currently reside on Phase C service nodes.
- Installation of an authenticated xRootd door for dCache, plus two special services for the CMS XROOTD Federation:
  - xrootd+cmsd in cmsvobox, acting as the CSCS redirector, which publishes files to the regional one.
  - dCache xrootd door, chrooted to /pnfs/lcg.cscs.ch/cms/trivcat, on storage01 on a special port. To be authenticated in a couple of months, when we upgrade to dCache 2.2.
- UMD-2 upgrade status:
  - ARC-CE. Upgrade done, on SL5.
  - Site-BDII. Upgrade done, on SL6.
  - UI. Ongoing, ready soon, but not urgent, since it is only used internally.
  - APEL. Ongoing. We're setting up a parallel instance and trying to reproduce normal behavior.
  - WNs. Waiting for ATLAS validation for SL6. Mixed SL5/SL6 is only possible by splitting the cluster, with a different queue for atlas+sl6.
  - CreamCE. Installing on cream03 next week, the other two in April with new hardware.
  - dCache. Installing in April with new hardware.
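
The Phase G figures above can be cross-checked with a little arithmetic. This is only a sketch: the input numbers are the ones quoted in the CSCS report, while the per-node and per-slot averages are derived here and are not stated in the minutes.

```python
# Figures quoted in the CSCS Phase G report.
slots_before, slots_after = 1792, 2432
hs06_before, hs06_after = 17500, 24200
storage_before_pb, storage_after_pb = 1.1, 1.6
new_nodes = 20


def growth(before, after):
    """Return (absolute increase, relative increase in percent)."""
    delta = after - before
    return delta, 100.0 * delta / before


slot_delta, slot_pct = growth(slots_before, slots_after)
hs06_delta, hs06_pct = growth(hs06_before, hs06_after)
storage_delta_pb, storage_pct = growth(storage_before_pb, storage_after_pb)

# Derived averages (assumption: all added slots and HS06 come from the new nodes).
slots_per_new_node = slot_delta / new_nodes
hs06_per_new_slot = hs06_delta / slot_delta

print(f"Job slots: +{slot_delta} ({slot_pct:.1f}%), {slots_per_new_node:.0f} per new node")
print(f"HepSpec06: +{hs06_delta} ({hs06_pct:.1f}%), {hs06_per_new_slot:.2f} per new slot")
print(f"Storage:   +{storage_delta_pb:.1f} PB ({storage_pct:.1f}%)")
```
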
PSI (reports Fabio):
- Constantly hunting for old /pnfs user files that push our dCache usage above 90% daily.
- Preparing the dCache 2.2 migration foreseen on March 28th; so far the setup is:
  - BDII : VM SL6, UMD2, bdii-5.2.12-1, you can observe it by running: ldapsearch -x -H ldap://t3bdii01.psi.ch:2170 -b o=grid | less
  - SE : VM SL6, dcache-2.2.8-1, bdii-5.2.12-1, dCache services: dcap gridftp gsidcap srm spacemanager transfermanagers httpd billing srm-loginbroker pinmanager dir info poolmanager broadcast loginbroker topo
  - DB : VM SL6, dcache-2.2.8-1, Postgresql 9.2.3, dCache services: gplazma pnfsmanager cleaner acl admin nfsv3.
  - Evaluating Xrootd and/or WebDav as a new local service for users.
  - Once the migration is done I can provide the conf to CSCS; not great science there, but it took me a while to write it.
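
The ldapsearch check mentioned for the new BDII can also be scripted, for example to count the published entries after the migration. A minimal sketch, assuming the `ldapsearch` client is installed and the host is reachable; the helper names (`build_ldapsearch_cmd`, `count_entries`, `query_bdii`) are hypothetical and only the flags from the command quoted above are used.

```python
import subprocess


def build_ldapsearch_cmd(host="t3bdii01.psi.ch", port=2170, base="o=grid"):
    """Build the same ldapsearch command line as quoted in the minutes."""
    return ["ldapsearch", "-x", "-H", f"ldap://{host}:{port}", "-b", base]


def count_entries(ldif_text):
    """Count LDIF entries by counting lines that start a new 'dn:' record."""
    return sum(1 for line in ldif_text.splitlines() if line.startswith("dn:"))


def query_bdii(host="t3bdii01.psi.ch", port=2170, base="o=grid", timeout=60):
    """Run ldapsearch and return the number of entries the BDII publishes."""
    result = subprocess.run(
        build_ldapsearch_cmd(host, port, base),
        capture_output=True,
        text=True,
        check=True,       # raise CalledProcessError if ldapsearch fails
        timeout=timeout,  # avoid hanging on an unreachable host
    )
    return count_entries(result.stdout)


# Usage (needs network access to the BDII, so it is not run here):
#   print(query_bdii())
```
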
UNIBE (reports Gianfranco):
- Xxx
UNIGE (reports Szymon):
- Xxx
UZH (reports Sergio):
- Xxx
SWITCH (reports Alessandro):
- UMD1 deprecated after April 2013: tickets have been opened to sites with UMD1 services
- EMI Nagios tests to replace SAM ones (forwarded to Sigve and CSCS): anyone interested in this?
- Problems with the WN tar ball distro and the Nagios version test: NGI_DE affected, NGI_CH not (we do not use the tar ball distro)
- After end of EMI the UMD repository will be overhauled -> necessary to change the repo details, ongoing
- How about checking the values in https://accounting.egi.eu/repcountry.php? Gianfranco suggested selecting a time slice, running the accounting script -> un-normalized numbers -> normalizing them and comparing with the portal
- GGUS/sympa problem: one GGUS test ticket was overlooked, but CCing operations@swing-grid.ch somehow creates spurious emails
- UNIBE problem with srm/gridftp -> bug: the grid manager stops when the transfer fails (weird). It seems to be solved now by disabling the (deprecated) gridftp test (UNIBE-ID was not affected)
- On March 11th EMI3 will be announced (Monte Bianco)
- EGI asked for feedback on the (needed?) support for Debian: anyone?
- OMB discussed the ARGUS use case: not compulsory; do we want it? Notice that for ARC a text file is used instead.
- New EGI data retention policy: retention period of 12 months -> 18 months (proposed date of enforcement July 1st) -> this requires a change on the SGAS server at SWITCH (which serves SWING/SMSCG)
- Sigve circulated the agenda for the INSPIRE meeting in March in Lugano (CSCS)
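
The accounting cross-check suggested by Gianfranco could look roughly like the sketch below. It assumes that normalization is a simple multiplication of raw CPU hours by a per-slot HepSpec06 factor; that factor, the 5% tolerance and all numbers in the usage lines are hypothetical placeholders, not values from the accounting script or the EGI portal.

```python
def normalize_cpu_hours(raw_cpu_hours, hs06_per_slot):
    """Convert un-normalized CPU hours into HS06-hours (assumed linear scaling)."""
    return raw_cpu_hours * hs06_per_slot


def relative_difference(local_value, portal_value):
    """Relative deviation of the locally computed value from the portal value."""
    if portal_value == 0:
        raise ValueError("portal value must be non-zero")
    return abs(local_value - portal_value) / portal_value


def agrees_with_portal(local_value, portal_value, tolerance=0.05):
    """True if the local value is within the given relative tolerance."""
    return relative_difference(local_value, portal_value) <= tolerance


# Hypothetical numbers for one time slice, for illustration only.
raw_hours = 1000.0        # output of the local accounting script
factor = 10.0             # assumed HS06 per job slot
portal_hs06_hours = 10200.0

local_hs06_hours = normalize_cpu_hours(raw_hours, factor)
print(local_hs06_hours, agrees_with_portal(local_hs06_hours, portal_hs06_hours))
```
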

Attendants

CSCS: George, Pablo
CMS: Daniel, Fabio
ATLAS:
LHCb: Roland
EGI:

Action items

Item1

Topic revision: r9 - 2013-03-07 - PabloFernandez