<!--
keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
---+ Swiss Grid Operations Meeting on 2016-08-04 at 14:00

   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: from Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask for the password via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204

%TOC%

---++ Site status

---+++ CSCS

   * Accounting numbers (from scheduler) from last month (July, table below)
   * Worked mainly on the GPFS slowness and the lcg-cp problem:
      * The GPFS slowness is caused by I/O-intensive jobs running simultaneously.
      * The deprecated lcg-cp command was replaced by gfal-copy; CMS and ATLAS changed this in their site configuration.
      * Is LHCb facing the same issue?
   * perfSONAR01/02 died because of disk failures; both machines were reinstalled with Puppet.
   * cream[01-03] were removed from the BDII and the GOCDB yesterday, so they are officially decommissioned. cream01 and cream03 were powered off today.
   * Reinstalling the BDII with Puppet.

Accounting numbers for July:

| *VO* | *CPU hours* |
| cms | 1'793'900.165 |
| atlas | 1'118'498.575 |
| lhcb | 811'097.677 |
| ops | 19.319 |
| TOTAL | 3'723'519.013 |

---+++ PSI

   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]
   * *New HW*
      * 3 Dalco UIs, each with 128 GB RAM, 2x E5-2697 v4 CPUs, 6x 1.8 TB 10k disks, 2x 10 GbE
      * 1 storage system, type [[http://www.netapp.com/us/products/storage-systems/e2700/e2700-tech-specs.aspx][NetApp E2760]]
         * 52x 6 TB disks + 8x 400 GB SSDs, 2 SAS-based RAID controllers
         * final net capacity ~200 TB
         * [[https://www.netapp.com/us/media/ds-3395.pdf][NetApp SANtricity SSD Cache]]
         * [[https://www.netapp.com/us/media/ds-3309.pdf][NetApp SANtricity Dynamic Disk Pools]]
   * *GGUS tickets vs CSCS*
      * Following up on [[https://xgus.ggus.eu/ngi_ch/?mode=ticket_info&ticket_id=485][Failures at T2_CH_CSCS]]
         * The [[https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Storage/Plugins/GFAL2Impl.py#L55][CMS job gfal-copy call]] was activated by my recent SITECONF change [[https://gitlab.cern.ch/SITECONF/T2_CH_CSCS/commit/5675c5963d5bde486abe3d02734e14bb23166308][command value="gfal2"]]
         * <pre>
$ find /cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/PhEDEx
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/PhEDEx/storage.xml
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/cmsset_local.sh
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/cmsset_local.csh
/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/site-local-config.xml   <----
</pre>
   * *Holidays*
      * I was on leave last week and will be on leave again next week.

---+++ UNIBE-LHEP

   * *Operations*
      * Nothing specific to report
   * *ATLAS specific operations*
      * Nothing specific to report
   * *HammerCloud report [1]*
      * UNIBE-LHEP online 74% (was 79% last month)
      * UNIBE-ID 97% (this queue does not run the high-I/O workloads, but it does run analysis)
      * UNIBE-LHEP_CLOUD* 95%
      * [1] [[http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562&view=Shifter%20view#time=720&start_date=&end_date=&use_downtimes=false&merge_colors=false&sites=multiple&clouds=ND&site=UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_CLOUD,UNIBE-LHEP_CLOUD_MCORE,UNIBE-LHEP_MCORE][ATLAS SSB site history (HammerCloud column) for the UNIBE-LHEP queues]]
   * *ATLAS resource delivery UNIBE-LHEP vs CSCS-LCG2 [2]*
      * All jobs: 47% of ATLAS/CH (wall time), 78% of ATLAS/CH (CPU time)
      * Good jobs: 68% of ATLAS/CH (wall time), 84% of ATLAS/CH (CPU time)
      * [2] [[http://dashb-atlas-job-prototype.cern.ch/dashboard/request.py/dailysummary#button=cpuconsumption&sites%5B%5D=CSCS-LCG2&sites%5B%5D=UNIBE-LHEP&sitesCat%5B%5D=All+Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2016-07-01&end=2016-07-31&timerange=daily&granularity=Monthly&generic=0&sortby=0&series=All][ATLAS job dashboard, CPU consumption CSCS-LCG2 vs UNIBE-LHEP, July 2016]]
   * *Accounting numbers (from scheduler) for last month (Jul 2016)* (includes ce03/CLOUD)
      * Wall-clock hours: 780748 (ATLAS), 35044 (t2k.org), 3289 (uboone), 12 (ops)

---+++ UNIBE-ID

   * *Change of resource manager:*
      * ATLAS (ARC-CE) is now served by the new Slurm server.
      * The transition was easy enough; there were minor quirks in the first couple of hours due to a forgotten change to the single-node environment.
      * Stable operation since then.
      * The rest of the cluster will be moved to Slurm in the next maintenance downtime (2nd Thursday of December) => more cores again for ATLAS
         * once OG-SGE has been dropped
   * *Operations*
      * Very stable operations lately

---+++ UNIGE

   * *Operations*
      * Back in ATLAS production mode since July 25th:
         * Memory settings hacked in the PBS batch scheduler to run ATLAS production jobs
         * Debugging multi-core jobs: not yet running successfully
      * Running smoothly, with lower user activity during the holiday period
   * *Network*
      * Upgrade of the network switch (10 Gb/s) for the file systems coming soon
   * *Holidays*
      * On leave for the next 2 weeks
   * [[https://wiki.chipp.ch/twiki/pub/LCGTier2/MeetingSwissGridOperations20160804/g07.201607.log][Accounting numbers (from scheduler) from last month]]

---+++ NGI_CH

   * *EGI central monitoring instance (ARGO)*
      * Since July 1st, the EGI infrastructure has been monitored by two monitoring instances, reachable at:
         * https://argo-mon.egi.eu/nagios
         * https://argo-mon2.egi.eu/nagios
      * Both instances run the same set of tests, and the results they provide are equivalent.
      * Since the same date, the central [[http://argo.egi.eu/lavoisier][ARGO Web UI]] provides information from these two instances, and the Operations Portal was reconfigured to raise alarms based on information from the ARGO central instances.
   * NGI-CH open tickets review
      * CSCS
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=122679][122679]] (CMS) timeout in file copy to SE (did the switch to gfal-copy break some Nagios tests?)
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=122486][122486]] (ATLAS) expose the full PFN through their xrootd doors: just closed
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=122155][122155]] (ATLAS) file transfers failing (inconsistent file size and checksum): 14 new files to check (updated today)
      * UNIBE-LHEP
         * [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=117899][117899]] (ATLAS) storage dumps (on hold)

---++ Other topics

   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendees

   * CSCS: Dino
   * CMS: Fabio
   * ATLAS: Luis, Gianfranco
   * LHCb:
   * EGI: Gianfranco

---++ Action items

   * Item1
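---++ Cross-check: CSCS July accounting totals

The CSCS July accounting table in the Site status section lists per-VO CPU hours and a TOTAL row. The short Python sketch below parses those figures (stripping the Swiss thousands separator ') and compares the sum of the listed rows with the reported TOTAL. The dictionary and the helper =parse_swiss_number= are illustrative names, not part of any CSCS tooling.

```python
# Cross-check of the CSCS July 2016 accounting table: do the per-VO rows
# add up to the TOTAL row? Values are copied verbatim from the table.
cscs_july_cpu_hours = {
    "cms": "1'793'900.165",
    "atlas": "1'118'498.575",
    "lhcb": "811'097.677",
    "ops": "19.319",
}
reported_total_text = "3'723'519.013"


def parse_swiss_number(text: str) -> float:
    """Convert a number written like 1'793'900.165 into a float."""
    return float(text.replace("'", ""))


listed_sum = sum(parse_swiss_number(v) for v in cscs_july_cpu_hours.values())
reported_total = parse_swiss_number(reported_total_text)
residual = reported_total - listed_sum  # hours in TOTAL not covered by the rows

print(f"sum of listed VOs: {listed_sum:,.3f} h")
print(f"reported TOTAL:    {reported_total:,.3f} h")
print(f"residual:          {residual:.3f} h")
```

The listed rows sum to 3'723'515.736 h, so the script reports a residual of about 3.277 h. The table itself does not say where this small remainder comes from; VOs not shown in the table are one possible explanation.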
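---++ Cross-check: PSI NetApp E2760 capacity

The PSI notes quote 52x 6 TB disks plus 8x 400 GB SSDs and a final net capacity of ~200 TB. The sketch below works through the raw numbers. Two assumptions are *not* from the notes and are labeled in the code: that the SSDs serve only as SANtricity SSD Cache (adding no capacity), and that Dynamic Disk Pools use 8+2 RAID 6 stripes.

```python
# Back-of-the-envelope capacity check for the new PSI NetApp E2760, using
# the figures quoted in the notes. ASSUMPTIONS (not from the notes):
#   - the 8 SSDs act purely as SANtricity SSD Cache and add no capacity;
#   - Dynamic Disk Pools store data in 8+2 RAID 6 stripes (20 percent parity).
HDD_COUNT, HDD_TB = 52, 6.0      # 52x 6 TB disks (decimal TB)
SSD_COUNT, SSD_TB = 8, 0.4       # 8x 400 GB SSDs
QUOTED_NET_TB = 200.0            # "final net capacity ~200 TB"
TIB_PER_TB = 1e12 / 2**40        # decimal TB -> binary TiB

raw_hdd_tb = HDD_COUNT * HDD_TB
ssd_cache_tb = SSD_COUNT * SSD_TB
after_parity_tb = raw_hdd_tb * 8 / 10   # assumed 8+2 stripe overhead
net_over_raw = QUOTED_NET_TB / raw_hdd_tb

print(f"raw HDD capacity : {raw_hdd_tb:6.1f} TB = {raw_hdd_tb * TIB_PER_TB:6.1f} TiB")
print(f"after 8+2 parity : {after_parity_tb:6.1f} TB = {after_parity_tb * TIB_PER_TB:6.1f} TiB")
print(f"SSD cache        : {ssd_cache_tb:6.1f} TB")
print(f"quoted net / raw : {net_over_raw:.0%}")
```

Raw disk capacity is 312 TB, so the quoted ~200 TB net is roughly 64% of raw. Under the stated assumptions, the remaining gap between ~250 TB after parity and ~200 TB presumably comes from DDP reserved capacity, TB vs TiB reporting and file-system overhead; the notes do not break this down.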
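---++ Sketch: checking the CMS stage-out command

The PSI report traces the CMS gfal-copy call to the SITECONF change setting =command value="gfal2"= in =site-local-config.xml=. A quick way to confirm which stage-out command a site configuration declares is to read that element with Python's standard =xml.etree.ElementTree=. The XML below is a hypothetical, heavily trimmed example (assumed element layout =site-local-config/site/local-stage-out/command=); the real file lives under =/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/=. The helper name =stage_out_command= is illustrative.

```python
# Minimal sketch: read the local stage-out command from a CMS
# site-local-config.xml. SAMPLE_SITE_LOCAL_CONFIG is a hypothetical,
# trimmed example, not the actual T2_CH_CSCS file.
import xml.etree.ElementTree as ET
from typing import Optional

SAMPLE_SITE_LOCAL_CONFIG = """\
<site-local-config>
  <site name="T2_CH_CSCS">
    <local-stage-out>
      <command value="gfal2"/>
    </local-stage-out>
  </site>
</site-local-config>
"""


def stage_out_command(xml_text: str) -> Optional[str]:
    """Return the value attribute of <local-stage-out>/<command>, or None."""
    root = ET.fromstring(xml_text)
    node = root.find("./site/local-stage-out/command")
    return None if node is None else node.get("value")


if __name__ == "__main__":
    print(stage_out_command(SAMPLE_SITE_LOCAL_CONFIG))
    # On a node with CVMFS mounted, the deployed file could be checked with:
    # path = "/cvmfs/cms.cern.ch/SITECONF/T2_CH_CSCS/JobConfig/site-local-config.xml"
    # with open(path) as fh:
    #     print(stage_out_command(fh.read()))
```

Returning =None= instead of raising keeps the helper usable on configurations that have no local stage-out section.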
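---++ Cross-check: UNIBE-LHEP July wall-clock shares

The UNIBE-LHEP scheduler accounting for July 2016 (including ce03/CLOUD) lists wall-clock hours per VO. The sketch below simply totals them and computes each VO's share; the variable names are illustrative.

```python
# Per-VO share of UNIBE-LHEP wall-clock hours, July 2016, as listed in the
# notes (scheduler accounting, includes ce03/CLOUD).
wallclock_hours = {
    "ATLAS": 780748,
    "t2k.org": 35044,
    "uboone": 3289,
    "ops": 12,
}

total = sum(wallclock_hours.values())
shares = {vo: hours / total for vo, hours in wallclock_hours.items()}

print(f"total: {total} WC h")
for vo, share in shares.items():
    print(f"{vo:8s} {wallclock_hours[vo]:>7d} h  {share:6.2%}")
```

The listed VOs add up to 819'093 wall-clock hours, of which ATLAS accounts for about 95.3%.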
Topic attachments:

| *Attachment* | *Size* | *Date* | *Who* | *Comment* |
| g07.201607.log | 1.1 K | 2016-08-04 - 11:25 | LuisMarch | UniGe - July 2016 stats |
Topic revision: r9 - 2016-11-11 - MichaelRolli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.