<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Swiss Grid Operations Meeting on 2016-07-07 at 14:00

   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask pw via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204

%TOC%

---++ Site status

---+++ CSCS
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ PSI
   * Upgraded my 2 HP CentOS7 NFSv4 NAS to [[http://zfsonlinux.org/][ZoL v0.6.5.7]]
      * The 1st is the primary NAS, with 24 SAS disks (15k rpm, 600 GB each)
      * The 2nd is the secondary NAS, with 12 SATA disks (7.2k rpm, 3000 GB each), used as cold backup
      * Both have a dual 10 Gb/s card in LACP bonding mode
      * On the secondary NAS I'm going to create a ZFS filesystem for dCache and provide ~5 TB to the PSI T3; it's a shame to use this HW only for backups (5y warranty)
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]

---+++ UNIBE-LHEP
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ UNIBE-ID
   * Mostly smooth operation
   * Procurement:
      * 80 new servers (76*20 + 4*16 => 1584 new cores); discontinued 144 cores (oldest nodes)
      * Installed and provisioned
   * Migration from OGSGE => Slurm planned for Q4
   * Problems with NAMD jobs (using ibverbs directly) => low-level IB errors from mlx4 regarding qp
      * No errors with MPI jobs using ompi or the like
      * No errors with storage (GPFS over RDMA)
   * ATLAS specific: large number of random a-rex crashes within the last 2 weeks
      * Reason unknown; happened 24x between 2016-06-15 and last Monday; no crash for the last 3 days

---+++ UNIGE
   * Operations
      * 10 machines added into the batch system (80 cores) + 3 machines replaced:
         * DELL - Intel Xeon @ 2.4 GHz - with 8 cores and 48 GB of memory
      * RAID controller: common problem for our DPM and NFS file servers (it happened 3 or 4 times during the last months)
      * Increased activity from DPNC users running in the batch system (other groups, in addition to ATLAS)
      * Still not in ATLAS production; problems related to memory (hints provided by Gianfranco)
   * Data Management: user datasets from UniGe for ATLASLOCALGROUPDISK at CSCS deleted (space can be moved to ATLASSCRATCHDISK)
   * [[%ATTACHURL%/g07.2016.06.log][Accounting numbers (from scheduler) from last month]]

---+++ NGI_CH
   * Xxx
   * NGI-CH Open Tickets review

---++ Other topics
   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS: Michael Rolli (UNIBE-ID) => absent due to illness, but provided some text above
   * LHCb:
   * EGI:

---++ Action items
   * Item1
Topic attachments

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| g07.2016.06.log | r1 | 1.1 K | 2016-07-07 - 11:05 | LuisMarch | Accounting UniGe June 2016 |
Topic revision: r4 - 2016-07-07 - MichaelRolli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.