MeetingSwissGridOperations20160707 < LCGTier2

<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
   #uncomment this if you want the page only be viewable by the internal people
   #* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Swiss Grid Operations Meeting on 2016-07-07 at 14:00
   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask pw via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204
%TOC%

---++ Site status
---+++ CSCS
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ PSI
   * Upgraded my 2 HP CentOS7 NFSv4 NAS servers to [[http://zfsonlinux.org/][ZoL v0.6.5.7]]
      * The 1st is the primary NAS, with 24 SAS disks (15k rpm, 600GB)
      * The 2nd is the secondary NAS, with 12 SATA disks (7.2k rpm, 3000GB), used as cold backup
      * Both have a dual 10Gb/s card in LACP bonding mode
      * On the secondary NAS I'm going to create a ZFS filesystem for dCache and provide ~5TB to the PSI T3; it's a shame to use this HW only for backups (5y warranty)
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]

---+++ UNIBE-LHEP
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ UNIBE-ID
   * Mostly smooth operation
   * Procurement:
      * 80 new servers (76*20 + 4*16 => 1584 new cores); discontinued 144 cores (oldest nodes)
         * installed and provisioned
   * Migration from OGSGE to Slurm planned for Q4
   * Problems with NAMD jobs (using ibverbs directly): low-level IB errors from mlx4 regarding QPs (queue pairs)
      * no errors with MPI jobs using ompi or the like
      * no errors with storage (GPFS over RDMA)
   * ATLAS specific: large number of random a-rex crashes within the last 2 weeks
      * Reason unknown; happened 24 times between 2016-06-15 and last Monday; no crashes in the last 3 days
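
The procurement figures above can be cross-checked with a short calculation. This is only a sketch: the node split (76 nodes with 20 cores, 4 nodes with 16 cores) is read off the expression in the procurement note, and the variable names are illustrative, not taken from any site tooling.

```python
# Cores per node -> number of nodes, as read from "76*20 + 4*16" (assumed split).
NEW_NODES = {20: 76, 16: 4}
# Cores removed with the oldest nodes, as stated in the procurement note.
DISCONTINUED_CORES = 144


def total_cores(nodes_by_core_count):
    """Sum cores over all node types."""
    return sum(cores * count for cores, count in nodes_by_core_count.items())


new_servers = sum(NEW_NODES.values())          # number of new servers
new_cores = total_cores(NEW_NODES)             # cores added by the procurement
net_change = new_cores - DISCONTINUED_CORES    # cores added minus cores retired

print(new_servers, new_cores, net_change)
```

The first two values reproduce the 80 servers and 1584 cores quoted in the note; the third is the net core gain once the discontinued nodes are subtracted.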

---+++ UNIGE
   * Operations 
      * 10 machines added to the batch system (80 cores) + 3 machines replaced:
         * DELL - Intel Xeon @ 2.4 GHz - with 8 cores and 48 GB of memory
      * RAID controller: recurring problem on our DPM and NFS file servers (it has happened 3 or 4 times in recent months)
      * Increased activity from DPNC users running in the batch system (other groups, in addition to ATLAS)
      * Still not in ATLAS production; problems related to memory (hints provided by Gianfranco)
   * Data Management: user datasets from UniGe for ATLASLOCALGROUPDISK at CSCS deleted (space can be moved to ATLASSCRATCHDISK)
   * [[%ATTACHURL%/g07.2016.06.log][Accounting numbers (from scheduler) from last month]]

---+++ NGI_CH
   * Xxx
   * NGI-CH Open Tickets review

---++ Other topics
   * Topic1
   * Topic2
Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS: Michael Rolli (UNIBE-ID), absent due to illness, but provided some notes above
   * LHCb:
   * EGI:

---++ Action items
   * Item1
---++ Attachments
| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| g07.2016.06.log | r1 | 1.1 K | 2016-07-07 - 11:05 | LuisMarch | Accounting UniGe June 2016 |
Topic revision: r4 - 2016-07-07 - MichaelRolli