Swiss Grid Operations Meeting on 2015-12-10

Site status

CSCS

  • Storage
    • dCache: stable, but the cleaner still has to be run manually. The upgrade to 2.10 will be performed on Wed 13th Jan 2016
    • ATLAS: working on the monthly dumps
    • GPFS (scratch): nothing to report
    • New hardware: 4 servers for dCache and ~1 PB of storage. Working on moving the GPFS metadata disks to flash-based storage.
  • Compute
    • Added some check functions to nodehealthcheck:
      • swap cleaner
      • automatic resolution of some black-hole scenarios, e.g. automatically remounting file systems
      • after 60 + a random number of days, the node is put in drain for cleaning and reboot
    • Started some tests with the new Slurm version, to migrate sltop.
    • Today we will order 40 new compute nodes with E5-2680v4
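
The "60 + random number of days" drain rule above can be illustrated with a minimal Python sketch. It is not the CSCS health-check code: the function names, the 14-day jitter bound, and seeding the jitter from the node name are assumptions made for illustration only.

```python
import random

BASE_DAYS = 60  # from the minutes: drain after 60 + a random number of days


def drain_threshold_days(node_name, max_jitter_days=14):
    """Return the uptime in days after which a node should be drained.

    Assumption: the jitter is seeded from the node name, so each node keeps
    the same threshold between health-check runs while different nodes are
    spread out in time. max_jitter_days is a hypothetical bound.
    """
    rng = random.Random(node_name)
    return BASE_DAYS + rng.randint(0, max_jitter_days)


def should_drain(node_name, uptime_days, max_jitter_days=14):
    """True once the node's uptime has reached its drain threshold."""
    return uptime_days >= drain_threshold_days(node_name, max_jitter_days)
```

The point of the random offset is that nodes installed on the same day do not all reach the drain threshold at once.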

PSI

  • Xxx

UNIBE-LHEP

  • Operations
    • ce01 cluster re-installation virtually completed (about 900 worker cores running, 120 still to be installed, 256 awaiting delivery)
    • Started with a simple slurm setup (slurm-15.08.1) in order to cut down on commissioning time: one partition with
      SelectType=select/cons_res
      SelectTypeParameters=CR_CPU_Memory
      MemLimitEnforce=no
    • We don't over-subscribe memory anymore: nodes don't starve and crash
    • Memory usage is properly accounted for in 15.08 (PSS): no jobs killed on (artificial) over-limit of "vmem" (now the full address space reserved by a process, not what's allocated or used)
    • Comparing job fail rates between ce01 and ce02 (still on old SGE) has convinced me to rush the re-installation of ce02 (started earlier today)
  • ATLAS specific operations
    • Stable workflows by ATLAS (very large improvement since beginning of Run II)
    • Stuck with the implementation of monthly dumps of the namespace on the DPM SE:
      • headnode on SLC5: the dump script does not work and also generating a valid proxy is problematic
      • decided to push the re-deployment of the head node on SLC6
      • legacy config tool (YAIM) no longer supported
      • Puppet-based configuration: got the right docs at the DPM workshop earlier this week at CERN
      • tests ongoing on a pps VM
      • also complicated by the fact that my site-bdii is still co-located with the DPM head node
      • this will likely be the first task for 2016
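
The difference between "vmem" and PSS mentioned in the operations notes above can be illustrated with a small Python sketch that sums fields from text in the Linux /proc/<pid>/smaps format. This is not how Slurm gathers its accounting data; the helper name and the sample mapping values are hypothetical.

```python
def sum_field_kb(proc_text, field):
    """Sum all '<field>: <n> kB' lines in smaps-style text and return kB."""
    total = 0
    for line in proc_text.splitlines():
        if line.startswith(field + ":"):
            total += int(line.split()[1])
    return total


# Hypothetical two-mapping excerpt in /proc/<pid>/smaps format.
SAMPLE_SMAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
Size:                328 kB
Rss:                 200 kB
Pss:                 100 kB
7f0000000000-7f0000100000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                  16 kB
Pss:                  16 kB
"""

# Reserved address space: what a "vmem" limit is compared against.
address_space_kb = sum_field_kb(SAMPLE_SMAPS, "Size")
# Proportional set size: resident pages, with shared pages split among users.
pss_kb = sum_field_kb(SAMPLE_SMAPS, "Pss")
```

In this sample the second mapping reserves 1024 kB but has only 16 kB resident, which is the kind of gap that makes a limit on address space kill jobs that are not actually using the memory. On a Linux node the same helper can be applied to the contents of /proc/self/smaps.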

UNIBE-ID

  • Xxx

UNIGE

  • Operations
    • atlasfs29.unige.ch: new certificate
    • Another file server has already been installed, but it is for the DAMPE experiment (no host certificate needed)
    • We have new hardware to be installed in the cluster: file servers and a couple of PCs for services
    • We will install Puppet for DPM, and probably also for cluster configuration and setup. The plan is to use a testbed consisting of atlasfs29 plus one of the two service PCs mentioned above
  • Network - Outlook
    • We intend to get a new 10 Gb/s network switch, but this is still under negotiation
    • Most likely, this will happen at the beginning of next year
  • Storage
    • There was a DPM SE workshop at CERN on December 7th-8th: https://indico.cern.ch/event/432642/
    • Checking the data stored at the DPM SE for cleaning purposes, since ATLAS requested it
    • Checking data in order to identify files which are registered in the catalogue (rucio), but not physically at the DPM SE and vice versa
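
The catalogue-versus-storage check described above amounts to two set differences. A minimal Python sketch, assuming both sides can be exported as plain lists of file paths; the function name and the "lost"/"dark" labels are hypothetical and not taken from Rucio or DPM tooling:

```python
def compare_catalogue_and_storage(catalogue_paths, storage_paths):
    """Compare two file listings and report the inconsistencies.

    "lost": registered in the catalogue but not physically on the SE.
    "dark": physically on the SE but not registered in the catalogue.
    """
    catalogue = set(catalogue_paths)
    storage = set(storage_paths)
    return {
        "lost": sorted(catalogue - storage),
        "dark": sorted(storage - catalogue),
    }
```

In practice the two inputs would come from a catalogue dump and a namespace dump of the SE taken as close together in time as possible, since files written between the two dumps would otherwise show up as false inconsistencies.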

NGI_CH

  • Nothing to report

Other topics

  • Proposal to add to this meeting: T2 monthly pledge review (CSCS, UNIBE); GGUS open ticket review
  • Coverage over the holiday season

Next meeting date:

A.O.B.

Attendants

  • CSCS:
  • CMS:
  • ATLAS: Gianfranco, Luis March
  • LHCb:
  • EGI: Gianfranco

Action items

  • Item1
Topic revision: r6 - 2015-12-10 - LuisMarch