<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Swiss Grid Operations Meeting on 2016-08-04 at 14:00

   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask pw via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204

%TOC%

---++ Site status

---+++ CSCS
   * Xxx
   * Accounting numbers (from scheduler) from last month
   * Worked mainly on the GPFS slowness and the lcg-cp problem
      * The GPFS slowness is caused by I/O-intensive jobs running simultaneously
      * The deprecated lcg-cp command was replaced by gfal-copy; CMS and ATLAS changed this in their site configuration
      * Is LHCb facing the same issue?
   * Perfsonar01/02 died from disk failures; both machines were reinstalled with Puppet
   * cream[01-03] were removed from BDII and GOCDB yesterday, so they are officially decommissioned. Cream01 and cream03 were powered off today
   * Reinstalling BDII with Puppet

Accounting numbers July:

| *VO* | *Cpu Hours* |
| cms | 1'793'900.165 |
| atlas | 1'118'498.575 |
| lhcb | 811'097.677 |
| ops | 19.319 |
| TOTAL | 3'723'519.013 |

---+++ PSI
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]
   * *New HW*
      * 3 Dalco UI
         * each featuring [ 128GB RAM, 2 * E5-2697v4 CPUs, 6*1.8TB 10k disks, 2*10GbE ]
      * 1 Storage, type [[http://www.netapp.com/us/products/storage-systems/e2700/e2700-tech-specs.aspx][NetApp E2760]]
         * [ 52*6TB disks + 8*400GB SSD ], 2 SAS-based RAID controllers
         * final net capacity ~200TB
         * [[https://www.netapp.com/us/media/ds-3395.pdf][NetApp SANtricity SSD Cache]]
         * [[https://www.netapp.com/us/media/ds-3309.pdf][NetApp SANtricity Dynamic Disk Pools]]
   * *GGUS Tickets vs CSCS*
      * Following [[https://xgus.ggus.eu/ngi_ch/?mode=ticket_info&ticket_id=485][Failures at T2_CH_CSCS]]
         * [[https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Storage/Plugins/GFAL2Impl.py#L55][CMS Job gfal-copy call]] activated because of my recent change [[https://gitlab.cern.ch/SITECONF/T2_CH_CSCS/commit/5675c5963d5bde486abe3d02734e14bb23166308][command value="gfal2"]]
         * <pre>
$ find /cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/PhEDEx
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/PhEDEx/storage.xml
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/cmsset_local.sh
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/cmsset_local.csh
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/site-local-config.xml <----
</pre>
   * *Holidays*
      * Previous week I was on leave, next week I'll be on leave too

---+++ UNIBE-LHEP
   * *Operations*
      * Nothing specific to report
   * *ATLAS specific operations*
      * Nothing specific to report
   * *HammerCloud report [1]*
      * UNIBE-LHEP online 74% (was 79% last month).
      * UNIBE-ID 97% (this doesn't run the high I/O workloads, but it runs analysis)
      * UNIBE-LHEP_CLOUD* 95%

[1] <a target="_top" href="http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics#time=720&start_date=&end_date=&use_downtimes=false&merge_colors=false&sites=multiple&clouds=ND&site=UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_CLOUD,UNIBE-LHEP_CLOUD_MCORE,UNIBE-LHEP_MCORE?columnid=562&view=Shifter%20view">http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562&view=Shifter%20view#time=720&start_date=&end_date=&use_downtimes=false&merge_colors=false&sites=multiple&clouds=ND&site=UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_CLOUD,UNIBE-LHEP_CLOUD_MCORE,UNIBE-LHEP_MCORE</a>

   * *ATLAS resource delivery UNIBE-LHEP vs CSCS-LCG2 [2]*
      * All jobs: 47% of ATLAS/CH (!WallTime), 78% of ATLAS/CH (CPUtime)
      * Good jobs: 68% of ATLAS/CH (!WallTime), 84% of ATLAS/CH (CPUtime)

[2] <a target="_top" href="http://dashb-atlas-job-prototype.cern.ch/dashboard/request.py/dailysummary#button=cpuconsumption&sites%5B%5D=CSCS-LCG2&sites%5B%5D=UNIBE-LHEP&sitesCat%5B%5D=All+Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2016-07-01&end=2016-07-31&timerange=daily&granularity=Monthly&generic=0&sortby=0&series=All">http://dashb-atlas-job-prototype.cern.ch/dashboard/request.py/dailysummary#button=cpuconsumption&sites%5B%5D=CSCS-LCG2&sites%5B%5D=UNIBE-LHEP&sitesCat%5B%5D=All+Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2016-06-01&end=2016-06-30&timerange=daily&granularity=Monthly&generic=0&sortby=0&series=All</a>

   * *Accounting numbers (from scheduler) for last month (Jul 2016)* (includes ce03/CLOUD)
      * WC h: 780748 (ATLAS) - 35044 (t2k.org) - 3289 (uboone) - 12 (ops)

---+++ UNIBE-ID
   * *Change of Resource Manager:*
      * ATLAS (ARC-CE) now served by new Slurm server
      * Transition was easy enough; minor quirks in the first couple of hours due to a forgotten change to the singlenode environment
      * Since then stable operation
      * Rest of the cluster will be moved to Slurm in the next maintenance downtime (2nd Thursday of December) => more cores again for ATLAS
         * after OG-SGE is dumped
   * *Operations*
      * Very stable operations lately

---+++ UNIGE
   * *Operations*
      * Back into ATLAS production mode since July 25th:
         * Memory hacked at PBS batch scheduler for running ATLAS production jobs
         * Debugging Multi-Core jobs: Not running successfully yet
      * Running smoothly: Lower user activity due to the holiday period
   * *Network*
      * Upgrade of network switch (10 Gb/s) for File Systems soon
   * *Holidays*
      * Next 2 weeks
   * [[https://wiki.chipp.ch/twiki/pub/LCGTier2/MeetingSwissGridOperations20160804/g07.201607.log][Accounting numbers (from scheduler) from last month]]

---+++ NGI_CH
   * *EGI central monitoring instance (ARGO)*
      * Since July 1st, the EGI infrastructure has been monitored by two monitoring instances, available at these addresses:
         * https://argo-mon.egi.eu/nagios
         * https://argo-mon2.egi.eu/nagios
      * Both instances run the same set of tests and provide equivalent results.
      * Since the same date, the central ARGO Web UI ( http://argo.egi.eu/lavoisier ) provides information from these two instances, and the Operations Portal was reconfigured to raise alarms based on information from the ARGO central instances.
   * NGI-CH Open Tickets review
      * CSCS
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=122679][122679]] (CMS) timeout in file copy to SE (switch to gfal-copy broke some Nagios tests?)
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=122486][122486]] (ATLAS) expose the full PFN through their xrootd doors => just closed it
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=122155][122155]] (ATLAS) file transfers failing (inconsistent file size & checksum): 14 new files to check (updated today)
      * UNIBE-LHEP
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117899][117899]] (ATLAS) Storage dumps (on-hold)

---++ Other topics
   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS: Dino
   * CMS: Fabio
   * ATLAS: Luis, Gianfranco
   * LHCb:
   * EGI: Gianfranco

---++ Action items
   * Item1
Topic attachments

| *Attachment* | *History* | *Size* | *Date* | *Who* | *Comment* |
| g07.201607.log | r1 | 1.1 K | 2016-08-04 - 11:25 | LuisMarch | UniGe - July 2016 stats |

Topic revision: r9 - 2016-11-11 - MichaelRolli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.