Swiss Grid Operations Meeting on 2013-02-07

Agenda

Status

  • CSCS (reports Pablo):
    • Smooth operations in January. Problem this morning with SRM: /pnfs got unmounted at 21:30, with no trace of why.
    • Deployed 4 x DCS3700 controllers (2 blocks x 279 TB), still 1 block missing. Pledges are already met.
    • Work in progress:
      • 8 new compute nodes (to meet the pledges)
      • 2 virtualization servers
    • PF had a meeting with some people from EGI and NG for the ARC Nagios probes. Minutes here
  • PSI (reports Fabio):
    • Installed 9 SL5.7 UIs with the latest UMD 2 middleware. Used 3 disks with mdadm (RAID 1 + spare partitions) + RAID 0.
    • Installed the SL5.7 WN tarball and opened a minor GGUS ticket against that version. Tests ongoing.
    • The latest NoMachine Player on Mac OS X 10.8, run against our FreeNX 3.4 server, incorrectly transfers user-provided commands such as /usr/bin/konsole; found a simple workaround that starts konsole and avoids buying the FreeNX 3.5 licenses.
    • Were you aware that recent uberftp releases offer recursive options? A Grid user can now run: chgrp [-r] group, -chmod [-r] perms, -dir [-r], -ls [-r] and the dangerous -rm [-r]. This user interaction is close to the NFS dCache interaction, but over the WAN. I tried uberftp against our T3 and CSCS, but both suffer from the same gPlazma bug, so the uberftp commands fail.
  • UNIBE (reports Gianfranco):
    • LAN problems on the production cluster, causing Lustre hang-ups and downtimes (4 times already in January, was fine until new year)
    • ARC CE on production cluster upgraded from nordugrid-arc-compute-element.noarch 1.0.1-1.el5 (nordugrid 1.1.0) to 2.0.1-1.el5 (EMI-2)
    • ARC CE fronting the new cluster scrapped (ARC installation/upgrade teething problems on SLC5.8) and re-installed on SLC6.3 with ARC 2.0.1
    • Immediate next step: re-install WNs, MDS, OSSs, add one UI/interactive node (gLite/ARC) on SLC6.3
  • UNIGE (reports Szymon):
    • Upgraded the head of our DPM from gLite to EMI-2
      • The new head node (but not its MySQL DB) runs in a VM
    • Virtualization of services (VirtualBox)
      • Site BDII runs in another VM
      • New ARC will also run in a VM (when I have time for it)
    • More urgent for the Grid jobs is CERNVMFS, still to do here
      • Not needed for local users because we have /afs/cern.ch/...
    • Hardware/OS problems with IBM x3755 M3 (32 cores, 96 GB RAM) (7 batch workers, 1 login machine)
      • The 'validation' unit still unstable
      • The other 7 are stable, but lost network a few times. Driver patch applied.
  • UZH (reports Sergio):
    • Xxx
  • Switch (reports Alessandro):
    • Nagios update 19 installed (some problems with prod instance, should be solved now)
    • Retirement calendar for EMI2 circulated, EMI3 announced.
    • Sigve's report from Amsterdam (see his email)
    • OMB discussion on service sharing: we already do it in NGI_CH, can we do more?
    • Working group to review the Nagios probes: CSCS will participate in the preliminary meeting
    • WN tarball testing: is PSI actively involved now? Do we only use it for the UI? Fabio: for UIs we use RPMs, for WNs the WN tarball; no plans to use the UI tarball. Gianfranco: we use the UI tarball.
Other topics
  • ATLAS LOCALGROUPDISK space token at the CSCS (Szymon)
    • little used
    • 10 TB is too small to be useful
    • 50 TB would be useful
    • not urgent, but maybe later in 2013?
    • ATLAS DDM moving to "federated storage using xrootd" (FAX).
      • We would try reading data at CSCS from jobs running in Geneva
  • Topic2
Next meeting date: 7th of March

AOB

Attendants

  • CSCS: Miguel, George, Pablo
  • CMS: Fabio, Daniel, Derek
  • ATLAS: Gianfranco, Szymon
  • LHCb:
  • EGI: Alessandro

Action items

  • Item1

Uberftp examples

$ uberftp t3se01.psi.ch "ls"
220 GSI FTP door ready
200 PASS command successful
Could not list (null): 451 Local error in processing

$ uberftp storage01.lcg.cscs.ch "ls"
220 GSI FTP door ready
200 PASS command successful
Could not list (null): 451 Local error in processing

$ uberftp grid-se.physik.uni-wuppertal.de "ls"
220 GSI FTP door ready
200 PASS command successful
dr-x------  1  dteam001  dteam001  512 May 22  2009  admin
dr-x------  1  dteam001  dteam001  512 May 22  2009  usr
dr-x------  1  dteam001  dteam001  512 Jan 13 13:13  pnfs
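
The three sessions above differ only in whether the listing succeeds or ends in the 451 error. A minimal Python sketch for sorting saved output the same way; it assumes the uberftp output has already been captured as text (running uberftp itself is left out), and the function name listing_failed is hypothetical.

```python
def listing_failed(output):
    """Return True if captured uberftp output contains the 451 listing error.

    Only the failure pattern shown in the sessions above is recognised:
    a line starting with "Could not list" that also mentions code 451.
    """
    return any(
        line.startswith("Could not list") and "451" in line
        for line in output.splitlines()
    )
```

Any other failure mode (authentication errors, timeouts) is not detected by this check and would need its own pattern.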

--- Gerd's e-mail ---
I can confirm the problem and the problem also affects newer releases of dCache. It does not depend on the version of uberftp used, 
although it may require different commands in different versions of uberftp to trigger the bug.
The problem only affects the Chimera root. E.g. if gPlazma is configured to expose a different directory as the name space root, the problem disappears.
I will write a patch. The patch will be merged into all supported versions of dCache (which at the moment still includes 1.9.12 even though it is getting closer to end-of-life).

--- Gerd's trick to make uberftp work today ---
Correct the rows of /etc/grid-security/storage-authzdb 
from:
authorize martinelli_f         read-write   2980  500  / / /
to:
authorize martinelli_f         read-write   2980  500  / /pnfs /
but I still need to understand the impact of this change. 
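
The same edit can be applied to every row at once. A minimal Python sketch, assuming each authorize row has the eight whitespace-separated fields seen in the example above and that the second of the three trailing path fields is the one to change from / to /pnfs. The function names expose_pnfs_root and rewrite_authzdb are hypothetical, column alignment is not preserved, and the impact of the change is as open here as it is above.

```python
def expose_pnfs_root(line, new_root="/pnfs"):
    """Return the row with its second path field changed from "/" to new_root.

    Assumed layout, taken from the example row above:
    authorize <user> <access> <uid> <gid> <path1> <path2> <path3>
    Lines that do not match this layout are returned unchanged.
    """
    fields = line.split()
    if len(fields) != 8 or fields[0] != "authorize":
        return line  # comments, blank lines, other record types
    if fields[6] == "/":
        fields[6] = new_root
    return " ".join(fields)


def rewrite_authzdb(text, new_root="/pnfs"):
    """Apply expose_pnfs_root to every line of a storage-authzdb file body."""
    return "\n".join(expose_pnfs_root(l, new_root) for l in text.splitlines())
```

Rows that already expose a directory other than / are left as they are, so running the sketch twice gives the same result as running it once.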
Topic revision: r15 - 2015-08-06 - FabioMartinelli