Swiss Grid Operations Meeting on 2016-07-07 at 14:00

Site status

CSCS

  • Xxx
  • Accounting numbers (from scheduler) from last month

PSI

  • Upgraded my two HP CentOS 7 NFSv4 NAS servers to ZFS on Linux (ZoL) v0.6.5.7
    • The first is the primary NAS, with 24 × 600 GB 15k rpm SAS disks
    • The second is the secondary NAS (cold backup), with 12 × 3000 GB 7.2k rpm SATA disks
    • Both have a dual-port 10 Gb/s card configured in LACP bonding mode
    • On the secondary NAS I'm going to create a ZFS filesystem for dCache and provide ~5 TB to the PSI T3; it's a shame to use this hardware (5-year warranty) only for backups
  • Accounting numbers (from scheduler) from last month

UNIBE-LHEP

  • Xxx
  • Accounting numbers (from scheduler) from last month

UNIBE-ID

  • Mostly smooth operation
  • Procurement:
    • 80 new servers (76 × 20 + 4 × 16 = 1584 new cores); 144 cores (oldest nodes) discontinued
      • installed and provisioned
  • Migration from OGSGE to Slurm planned for Q4
  • Problems with NAMD jobs (using ibverbs directly): low-level InfiniBand errors from the mlx4 driver regarding queue pairs (QPs)
    • No errors with MPI jobs using Open MPI or similar
    • No errors with storage (GPFS over RDMA)
  • ATLAS-specific: a large number of random A-REX crashes within the last 2 weeks
    • Reason unknown; it happened 24 times between 2016-06-15 and last Monday; no crash in the last 3 days

UNIGE

  • Operations
    • 10 machines added to the batch system (80 cores) and 3 machines replaced:
      • Dell, Intel Xeon @ 2.4 GHz, 8 cores and 48 GB of memory
    • RAID controller problems common to our DPM and NFS file servers (it happened 3 or 4 times in recent months)
    • Increased batch-system usage by DPNC users (other groups in addition to ATLAS)
    • Still not in ATLAS production; memory-related problems (hints provided by Gianfranco)
  • Data management: UniGe user datasets on ATLASLOCALGROUPDISK at CSCS deleted (the space can be moved to ATLASSCRATCHDISK)
  • Accounting numbers (from scheduler) from last month

NGI_CH

  • Xxx
  • NGI-CH Open Tickets review

Other topics

  • Topic1
  • Topic2

Next meeting date:

A.O.B.

Attendants

  • CSCS:
  • CMS:
  • ATLAS: Michael Rolli (UNIBE-ID), absent due to illness but provided some of the notes above
  • LHCb:
  • EGI:

Action items

  • Item1

Topic attachments

  • g07.2016.06.log (r1, 1.1 K, 2016-07-07 11:05, LuisMarch): Accounting UniGe June 2016
Topic revision: r4 - 2016-07-07 - MichaelRolli