Swiss Grid Operations Meeting on 2013-10-10

Agenda

Status

  • CSCS (reports Miguel):
    • SLURM migration status: all going according to plan for CREAM-CE (cream02). Need to finish tuning ARC-CE (arc02) so it publishes correct accounting data. APEL seems to be OK as well, but we are not publishing accounting with the new APEL (apel02) until the APEL development team gives us the green light (GGUS #97623).
    • Need to do further tuning on the information system of CREAM-CE (especially with regard to GLUE2).
    • Plan to open the firewall today for cream02 and tomorrow for arc02. Submission to cream04 has been disabled, as we need to move the gridmapdir to the NAS (same location as the other SLURM CREAM-CEs).
    • The queues have changed names, as already mentioned in Digest #1.
    • New storage arrived, installed in the racks, RAIDs tested, but SEs not ready yet. Will be added to dCache once we are done with SLURM.
    • New IB and ETH switches purchased as planned (IB arrived, ETH not yet).
    • According to the last EGI Operations meeting, we need to upgrade dCache to version 2.6 in order to be compatible with SHA-2. The deadline is the end of November. Suggesting to do this on November 06, during the CSCS standard maintenance day. TBC.
  • PSI (reports Fabio):
  • UNIBE (reports Gianfranco):
    • ce01.lhep cluster gained stability (SunBlades+thumpers). The only remaining issue is the cvmfs partition filling up from time to time; need to follow up and/or move to the NFS shared cache. Still running production only (plus local users); commissioning for analysis is still pending.
    • ce.lhep cluster (older hardware) has been shut down and the ce.lhep service has been decommissioned in GOCDB
      • cluster now expanded by 55 nodes inherited from CERN (we should have ~850 cores on this)
      • completely re-cabled (power+network)
      • could only use 2 Force10 switches for now (problem re-configuring more of them)
      • ce.lhep now rebuilt as ce02.lhep: ROCKS 6.1 with SLC6.4
      • working right now on the images for WN with ATLAS customisation, Lustre MDS and OSS
      • expect to have part (or all?) of it online by the end of next week
  • UNIGE (reports Szymon):
    • A script to check whether "everything" is OK on a batch or login machine
      • checks: all NFS file systems, /cvmfs and AFS; /, /var and /tmp <90% full; writing to /tmp; pbs_mom
      • running in an hourly cron
      • if one of the checks fails, or the script blocks (e.g. on df): send an email and 'offline' the host
      • automatic elimination of "black holes", for the first time
    • Our own Nagios, initial setup, pings all machines
      • thanks to Fabio for a few useful hints
    • Work still going on in the machine room
    • New disk servers (4 x IBM x3630 M4, 43 TB each for data) physically mounted, in racks
      • New infrastructure to install OS via network boot
      • Yann is working on how to install the OS and get a console
    • Free CPUs inherited from ATLAS Trigger (35 x DELL, 8 cores each) will have to wait longer
  • UZH (reports Sergio):
    • Xxx
  • Switch (reports Alessandro):
    • Xxx
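
The node health check reported by UNIGE above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual UNIGE script: every function name is hypothetical, and the 90% limit is the only value taken from the minutes. Commands are run with a timeout so that a hang (e.g. on df) counts as a failed check rather than blocking the script; a process stuck in uninterruptible I/O may still resist being killed.

```python
import shutil
import subprocess
import tempfile

MAX_USED_FRACTION = 0.90  # from the minutes: "<90% full"


def used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the file system holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # some pseudo file systems report zero size
        return 0.0
    return usage.used / usage.total


def check_not_full(path, limit=MAX_USED_FRACTION):
    """True if the file system holding path is below the usage limit."""
    return used_fraction(path) < limit


def check_writable(directory):
    """True if a temporary file can be created and written in directory."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"ok")
            handle.flush()
        return True
    except OSError:
        return False


def check_command(argv, timeout_s=30):
    """True if the command exits with status 0 within timeout_s seconds.

    A missing command or a timeout is treated as a failed check.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def run_checks(checks):
    """Run each named zero-argument check; return the names that failed.

    A check that raises an exception is counted as failed.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


def default_checks():
    """Hypothetical check set loosely mirroring the list in the minutes."""
    return {
        "root not full": lambda: check_not_full("/"),
        "/var not full": lambda: check_not_full("/var"),
        "/tmp not full": lambda: check_not_full("/tmp"),
        "/tmp writable": lambda: check_writable("/tmp"),
        "df responds": lambda: check_command(["df", "-P", "/"]),
        "pbs_mom running": lambda: check_command(["pgrep", "-x", "pbs_mom"]),
    }
```

An hourly cron job could call `run_checks(default_checks())` and, if the returned list is non-empty, send an email and mark the host offline in the batch system. Those two actions are site-specific and are left out of the sketch.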
Other topics
  • Sysadmin training to be done at the beginning of November. Exact dates to be defined. Candidates are days 4-5 OR (ideally) 11-12. Other suggestions?
  • Fabio: I would like to connect by SSH and read the German Nagios logs. If you want the same access, please send me the following (silence means you don't need it):
    • your desired account name
    • the IPs from which you are going to connect
    • your SSH public key
    • 16 Oct: so far Miguel and I have sent our keys to A. Usai, and in turn to KIT; waiting for their feedback.
Next meeting date: Suggesting October 31.

AOB

Attendees

  • CSCS: Miguel Gila, Gianni Ricciardi
  • CMS: Fabio, Daniel
  • ATLAS:
  • LHCb: Roland
  • EGI:

Action items

  • Item1
Topic revision: r10 - 2013-10-16 - FabioMartinelli
 