Swiss Grid Operations Meeting on 2013-02-07
Agenda
Status
- CSCS (reports Pablo):
- Smooth operations in January. Problem this morning with SRM, /pnfs got unmounted at 21:30, no traces of why.
- Deployed 4 x DCS3700 controllers (2 blocks x 279 TB), still 1 block missing. Pledges are already met.
- Work in progress:
- 8 new compute nodes (to meet the pledges)
- 2 virtualization servers
- PF had a meeting with some people from EGI and NG for the ARC Nagios probes. Minutes here
- PSI (reports Fabio):
- Installed 9 SL5.7 UIs with the latest UMD 2 middleware. Used 3 disks with mdadm (RAID 1 + spare partitions) + RAID 0.
- Installed SL5.7 WN Tarball + opened a minor GGUS Ticket vs that version. Tests ongoing.
- The latest NoMachine Player on Mac OS X 10.8, run against our FreeNX 3.4 server, badly transfers user-provided commands like /usr/bin/konsole; found a simple workaround that starts konsole and avoids buying the FreeNX 3.5 licenses.
- Were you aware that the recent uberftp offers recursive options? A Grid user can now run: -chgrp [-r] group, -chmod [-r] perms, -dir [-r], -ls [-r] and the dangerous -rm [-r]. This user interaction is close to the NFS dCache interaction, but over the WAN. I tried uberftp against our T3 and CSCS, but both suffer from the same gPlazma bug, so the uberftp commands fail.
- UNIBE (reports Gianfranco):
- LAN problems on the production cluster, causing Lustre hang-ups and downtimes (4 times already in January; was fine until the new year)
- ARC CE on production cluster upgraded from nordugrid-arc-compute-element.noarch 1.0.1-1.el5 (nordugrid 1.1.0) to 2.0.1-1.el5 (EMI-2)
- ARC CE to front new cluster scrapped (ARC installation/upgrade teething problems, on SLC5.8), re-installed on SLC6.3 and ARC 2.0.1
- Immediate next step: re-install WN's, MDS, OSS's, add one UI/interactive node (gLite/ARC) on SLC6.3
- UNIGE (reports Szymon):
- Upgraded the head of our DPM from gLite to EMI-2
- The new head node (but not its MySQL DB) runs in a VM
- Virtualization of services (VirtualBox)
- Site BDII runs in another VM
- New ARC will also run in a VM (when I have time for it)
- More urgent for the Grid jobs is CERNVMFS, still to do here
- Not needed for local users because we have /afs/cern.ch/...
- Hardware/OS problems with IBM x3755 M3 (32 cores, 96 GB RAM) (7 batch workers, 1 login machine)
- The 'validation' unit still unstable
- The other 7 are stable, but lost network a few times. Driver patch applied.
- UZH (reports Sergio):
- Switch (reports Alessandro):
- Nagios update 19 installed (some problems with prod instance, should be solved now)
- Retirement calendar for EMI2 circulated, EMI3 announced.
- Sigve's report from Amsterdam (see his email)
- OMB discussion on service sharing: we already do it in NGI_CH, can we do more?
- Nagios probes reviewing effort working group: CSCS will participate in the preliminary meeting
- WN tarball testing: PSI actively involved now? Do we only use it for the UI? Fabio: for UIs we use RPMs, for WNs the WN tarball; no plans to use the UI tarball. Gianfranco: we use the UI tarball.
Other topics
- ATLAS LOCALGROUPDISK space token at the CSCS (Szymon)
- little used
- 10 TB is too small to be useful
- 50 TB would be useful
- not urgent, but maybe later in 2013?
- ATLAS DDM moving to "federated storage using xrootd" (FAX).
- We would try reading data at CSCS by jobs running in Geneva
Next meeting date: 7th of March
AOB
Attendants
- CSCS: Miguel, George, Pablo
- CMS: Fabio, Daniel, Derek
- ATLAS: Gianfranco, Szymon
- LHCb:
- EGI: Alessandro
Action items
Uberftp examples
$ uberftp t3se01.psi.ch "ls"
220 GSI FTP door ready
200 PASS command successful
Could not list (null): 451 Local error in processing
$ uberftp storage01.lcg.cscs.ch "ls"
220 GSI FTP door ready
200 PASS command successful
Could not list (null): 451 Local error in processing
$ uberftp grid-se.physik.uni-wuppertal.de "ls"
220 GSI FTP door ready
200 PASS command successful
dr-x------ 1 dteam001 dteam001 512 May 22 2009 admin
dr-x------ 1 dteam001 dteam001 512 May 22 2009 usr
dr-x------ 1 dteam001 dteam001 512 Jan 13 13:13 pnfs
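The three transcripts above differ only in whether the listing ends with `451 Local error in processing`. A minimal sketch of a check that classifies a saved transcript, assuming the error string appears exactly as shown above. The function name `classify_listing` is hypothetical, and it reads the transcript from standard input so it can be tried without a grid proxy:

```shell
#!/bin/sh
# Hypothetical helper: classify an `uberftp <host> "ls"` transcript read from stdin.
# Prints "affected" if the 451 error seen in the transcripts above appears, "ok" otherwise.
classify_listing() {
    if grep -q '451 Local error in processing'; then
        echo "affected"
    else
        echo "ok"
    fi
}

# Sample transcript copied from the t3se01.psi.ch run above.
printf '%s\n' \
    '220 GSI FTP door ready' \
    '200 PASS command successful' \
    'Could not list (null): 451 Local error in processing' \
    | classify_listing
```

With a valid proxy, the live form would presumably be `uberftp t3se01.psi.ch "ls" 2>&1 | classify_listing`; the `2>&1` is an assumption in case the error is written to standard error.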
--- Gerd's e-mail ---
I can confirm the problem and the problem also affects newer releases of dCache. It does not depend on the version of uberftp used,
although it may require different commands in different versions of uberftp to trigger the bug.
The problem only affects the Chimera root. E.g. if gPlazma is configured to expose a different directory as the name space root, the problem disappears.
I will write a patch. The patch will be merged into all supported versions of dCache (which at the moment still includes 1.9.12 even though it is getting closer to end-of-life).
--- Gerd's trick to make uberftp work today ---
Correct the rows of /etc/grid-security/storage-authzdb
from:
authorize martinelli_f read-write 2980 500 / / /
to:
authorize martinelli_f read-write 2980 500 / /pnfs /
but I still need to understand the impact of this change.
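
The edit above could be rehearsed on a copy before touching the real file. A minimal sketch, assuming each entry looks exactly like the `from:` line above, with single-space separators and `/ / /` as the last three fields. The function name `patch_authzdb_root` and the temporary file are illustrative only, and the open question about the impact of the change still applies:

```shell
#!/bin/sh
# Hypothetical helper: print the file given as $1 with every "authorize" line
# ending in "/ / /" rewritten to end in "/ /pnfs /". The input file is not modified.
patch_authzdb_root() {
    sed 's|^\(authorize .*\) / / /$|\1 / /pnfs /|' "$1"
}

# Rehearse on a scratch copy holding the entry quoted above.
tmp=$(mktemp)
printf '%s\n' 'authorize martinelli_f read-write 2980 500 / / /' > "$tmp"
patch_authzdb_root "$tmp"
rm -f "$tmp"
```

Because the function writes to standard output only, the result can be diffed against /etc/grid-security/storage-authzdb before anything is replaced.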