Swiss Grid Operations Meeting on 2013-02-07
Agenda
Status
- CSCS (reports Pablo):
- Site entered Phase G:
- Substitution of all Phase C storage (Thors, and their temporary replacement lent by CSCS) and further extension by six IBM DCS3700 controllers full of 3 TB disks, increasing the available permanent storage from 1.1 to 1.6 Petabytes.
- Installation of 20 new SandyBridge @ 2.6 GHz compute nodes, increasing the number of available job slots from 1792 to 2432, and the computing capacity from 17500 to 24200 HepSpec06 (taking 1 HepSpec06 as roughly 1 GigaFlop).
- Installation of 2 virtualization hosts, with 1 TB of space on SSD drives and 96 GB RAM each, that will host all production virtual machines that currently reside on Phase C service nodes.
- Installation of an authenticated xRootd door for dCache, plus two special services for the CMS XROOTD Federation:
- xrootd+cmsd on cmsvobox, acting as the CSCS redirector, which publishes files to the regional redirector.
- dCache xrootd door, chrooted to /pnfs/lcg.cscs.ch/cms/trivcat, on storage01 on a special port. It will be authenticated in a couple of months, when we upgrade to 2.2.
- UMD-2 upgrade status:
- ARC-CE. Upgrade done, on SL5.
- Site-BDII. Upgrade done, on SL6.
- UI. Ongoing, ready soon, but not urgent, for it is only internal.
- APEL. Ongoing. We're setting up a parallel instance and trying to reproduce normal behavior.
- WNs. Waiting for ATLAS validation of SL6. A mixed SL5/SL6 setup is only possible by splitting the cluster, with a separate queue for atlas+sl6.
- CreamCE. Installing on cream03 next week; the other two in April with new hardware.
- dCache. Installing in April with new hardware
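As a quick sanity check of the Phase G figures in the CSCS report above, the per-node numbers can be derived from the quoted totals. This is only a sketch: it assumes that the whole increase in job slots and HepSpec06 comes from the 20 new nodes, which the minutes do not state explicitly.

```python
# Figures quoted in the CSCS Phase G report.
new_nodes = 20
slots_before, slots_after = 1792, 2432
hs06_before, hs06_after = 17500, 24200
storage_before_pb, storage_after_pb = 1.1, 1.6

# Assumption: the full increase is attributable to the 20 new nodes.
added_slots = slots_after - slots_before
added_hs06 = hs06_after - hs06_before

slots_per_new_node = added_slots / new_nodes
hs06_per_new_node = added_hs06 / new_nodes
hs06_per_new_slot = added_hs06 / added_slots
hs06_per_old_slot = hs06_before / slots_before

slot_growth_pct = 100 * added_slots / slots_before
hs06_growth_pct = 100 * added_hs06 / hs06_before
storage_growth_pct = 100 * (storage_after_pb - storage_before_pb) / storage_before_pb

print(f"slots per new node:  {slots_per_new_node:.0f}")
print(f"HS06 per new node:   {hs06_per_new_node:.0f}")
print(f"HS06 per new slot:   {hs06_per_new_slot:.2f}")
print(f"HS06 per old slot:   {hs06_per_old_slot:.2f}")
print(f"slot growth:         {slot_growth_pct:.1f}%")
print(f"HS06 growth:         {hs06_growth_pct:.1f}%")
print(f"storage growth:      {storage_growth_pct:.1f}%")
```

Under that assumption each new node contributes 32 job slots and 335 HepSpec06.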
- PSI (reports Fabio):
- Constantly hunting for old /pnfs user files that push our dCache usage above 90% on a daily basis.
- Preparing the dCache 2.2 migration foreseen for March 28th; so far the setup is:
- BDII : VM SL6, UMD2, bdii-5.2.12-1, you can observe it by running:
ldapsearch -x -H ldap://t3bdii01.psi.ch:2170 -b o=grid | less
- SE : VM SL6, dcache-2.2.8-1, bdii-5.2.12-1, dCache services:
dcap gridftp gsidcap srm spacemanager transfermanagers httpd billing srm-loginbroker pinmanager dir info poolmanager broadcast loginbroker topo
- DB : VM SL6, dcache-2.2.8-1, Postgresql 9.2.3, dCache services:
gplazma pnfsmanager cleaner acl admin nfsv3
- Evaluating Xrootd and/or WebDav as a new local service for users.
- Once the migration is done I can provide the configuration to CSCS; not great science there, but it took me a while to write it.
- UNIBE (reports Gianfranco):
- UNIGE (reports Szymon):
- UZH (reports Sergio):
- Switch (reports Alessandro):
- UMD1 deprecated after April 2013: tickets have been opened to sites with UMD1 services
- EMI Nagios tests to replace SAM ones (forwarded to Sigve and CSCS): anyone interested in this?
- Problems with the WN tar ball distro and the Nagios version test: NGI_DE affected, NGI_CH not (we do not use the tar ball distro)
- After end of EMI the UMD repository will be overhauled -> necessary to change the repo details, ongoing
- How about checking the values in https://accounting.egi.eu/repcountry.php? Gianfranco suggested selecting a time slice, running the accounting script to get un-normalized numbers, normalizing them, and comparing the result with the portal.
- ggus/sympa problem: one ggus test ticket was overlooked, but CCing operations@swing-grid.ch somehow creates spurious emails
- UNIBE problem with srm/gridftp->bug: the grid manager stops when the transfer fails (weird). It seems solved now by disabling the (deprecated) gridftp test (UNIBE-ID was not affected)
- On March 11th EMI3 will be announced (Monte Bianco)
- EGI asked for feedback on the (needed?) support for debian: anyone?
- OMB discussed the ARGUS use case: not compulsory; do we want it? Notice that for ARC a text file is used instead.
- New EGI data retention policy: retention period of 12 months -> 18 months (proposed date of enforcement July 1st) -> this requires a change on the SGAS server at SWITCH (which serves SWING/SMSCG)
- Sigve circulated the agenda for the INSPIRE meeting in March/Lugano(CSCS)
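The accounting cross-check suggested by Gianfranco above (take a time slice, get un-normalized numbers, normalize, compare with the portal) could look roughly like the sketch below. All numbers, the per-core HS06 factor, the 2% tolerance, and the function names are hypothetical; the minutes do not specify which normalization the accounting script or the portal uses.

```python
def normalize_cpu_hours(raw_cpu_hours, hs06_per_core):
    """Scale raw CPU hours to HS06-hours with a per-core benchmark factor."""
    return raw_cpu_hours * hs06_per_core


def relative_difference(local_value, portal_value):
    """Relative deviation of the local number from the portal number."""
    if portal_value == 0:
        raise ValueError("portal value is zero, cannot compare")
    return abs(local_value - portal_value) / portal_value


# Hypothetical inputs for one time slice (not real accounting data).
raw_cpu_hours = 120_000.0        # un-normalized output of the accounting script
hs06_per_core = 10.0             # assumed benchmark factor of the cluster
portal_hs06_hours = 1_212_000.0  # value read off the accounting portal
tolerance = 0.02                 # assumed acceptable deviation (2%)

local_hs06_hours = normalize_cpu_hours(raw_cpu_hours, hs06_per_core)
deviation = relative_difference(local_hs06_hours, portal_hs06_hours)
agrees = deviation <= tolerance

print(f"local:     {local_hs06_hours:.0f} HS06-hours")
print(f"portal:    {portal_hs06_hours:.0f} HS06-hours")
print(f"deviation: {deviation:.2%} ->", "OK" if agrees else "MISMATCH")
```

A mismatch beyond the tolerance would point either to a wrong benchmark factor or to records missing on one side for the chosen time slice.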
Other topics
- HammerCloud Jobs and Site Readiness
- % of successful jobs generally low (between 80% and 90%) for CMS HC jobs; comparable sites usually have >99%
- CMS jobs < 80% for the last 2 days (so site in status 'Not Ready' at the moment)
- Is the problem understood (is it really an ARGUS 1.4.1 problem that will be solved by upgrading to UMD2)?
- Does ATLAS see similar problems with HC jobs at CSCS?
- Next Face to Face meeting in two weeks, in two parts:
- CHIPP, from 10.15h to 14.15h
- EGI, from 14.30h to 16.30h
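To make the HammerCloud discussion above concrete, here is a minimal sketch of how a daily success rate and a readiness flag could be computed. The 80% threshold and the two-day window are taken from the remark in these minutes and are used only for illustration; the actual CMS Site Readiness rules are not described here, and the job counts are hypothetical.

```python
def success_rate(succeeded, total):
    """Percentage of successful jobs for one day."""
    if total <= 0:
        raise ValueError("total number of jobs must be positive")
    return 100.0 * succeeded / total


def is_ready(daily_rates, threshold=80.0, days=2):
    """Flag the site as not ready if each of the last `days` days is below threshold.

    Illustrative rule only, based on the wording of the minutes.
    """
    recent = daily_rates[-days:]
    if len(recent) < days:
        return True  # not enough history to declare the site not ready
    return not all(rate < threshold for rate in recent)


# Hypothetical (succeeded, total) HC job counts for four consecutive days.
daily_jobs = [(172, 200), (178, 200), (150, 200), (156, 200)]
rates = [success_rate(ok, total) for ok, total in daily_jobs]

print("daily success rates:", rates)
print("site ready:", is_ready(rates))
```

With these example counts the last two days fall below 80%, so the sketch reports the site as not ready.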
Next meeting date:
AOB
Attendants
- CSCS: George, Pablo
- CMS: Daniel, Fabio
- ATLAS:
- LHCb: Roland
- EGI:
Action items
Topic revision: r9 - 2013-03-07 - PabloFernandez