<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Swiss Grid Operations Meeting on 2016-06-02 at 14:00

   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask pw via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204

%TOC%

---++ Site status

---+++ CSCS
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ PSI
   * Xxx
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]

---+++ UNIBE-LHEP

*Operations*
   * stable, no incidents to report

*ATLAS specific operations*
   * 40% of ATLAS/CH WT, but 67% CPUtime in May (all jobs) - CSCS shows >60% FAILED WT [1] (most of them are "SIGTERM from the batch system" and "error in copying the file from job workdir to local SE" - will open an RT ticket to follow up on this)
   * DPM head node migration to SLC6 and ATLAS storage dumps still on hold

*HammerCloud report [2]*
   * UNIBE-LHEP online >92% (last month). Better than previous month. Still room for improvement, but the impact is limited since interruptions are not long enough to cause the site to drain.
   * UNIBE-ID >99%
   * UNIBE-LHEP_CLOUD* <90% (lost heartbeat from pilot: some intermittent network instabilities)

[1] http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptionsxml?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=CH-CHIPP-CSCS&resourcetype=All&activities=all&sitesSort=2&sitesCatSort=2&start=2016-05-01&end=2016-05-31&timeRange=daily&granularity=Monthly&generic=0&sortBy=0&series=All&type=gstb

[2] http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistorywithstatistics?columnid=562&view=Shifter%20view#time=720&start_date=&end_date=&use_downtimes=false&merge_colors=false&sites=multiple&clouds=ND&site=UNIBE-LHEP,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_CLOUD,UNIBE-LHEP_CLOUD_MCORE,UNIBE-LHEP_MCORE

   * *Accounting numbers (from scheduler) from last month (May 2016)* (includes ce03/CLOUD)
      * WC h: 1211030 (ATLAS) - 23599 (t2k.org) - 282 (uboone) - 7 (ops)
   * *Accounting numbers (from ATLAS dashboard) from last month* (May 2016)
      * CPU h: 1194137
      * WC h: 1358408

---+++ UNIBE-ID
   * Smooth operation in general; no outages
   * A mitigation has been set up for the high failure rate of ATLAS jobs (SIGKILL due to h_vmem violation) by increasing the multiplier in submit-job-sge => lower failure rate, but more wasted resources.
   * Medium-term goal: Move from OG-SGE to Slurm (essentially a matter of user acceptance, not a technical issue)
   * As previously announced, 2-day downtime next week: IB recabling (8 => 16 spine switches); provisioning of 2160 cores (Broadwell)
   * Accounting numbers (from scheduler) from last month for ATLAS:
      * CPU h: 135'276
      * WC h: 108'001

---+++ UNIGE
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ NGI_CH
   * WLCG plans to retire the requirement for sites to run a site-bdii. EGI sees it differently. Long ongoing discussion, including a WLCG Task Force assigned to this. Stay tuned, but don't hold your breath :-)
   * Heads up: current funding for the minimal NGI_CH operation layer (10% FTE) will end by the end of the year. We will need to identify a solution. Also open from the end of the year are the EGI fee (hopefully it will go on Swing) and the certificates (~30kCHF including ~10% FTE for operation). Note that certificates are now used not only by CHIPP.
   * *NGI-CH Open Tickets review*
      1 <a target="_blank" href="https://ggus.eu/index.php?mode=ticket_info&ticket_id=120405">120405</a> for CSCS (LHCb) Red: "very urgent", last update on 2016-05-11. Reply awaited from site.
      1 <a target="_blank" href="https://ggus.eu/index.php?mode=ticket_info&ticket_id=117899">117899</a> for UNIBE-LHEP (ATLAS) On hold (ATLAS request - storage dumps)

---++ Other topics
   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS: apologies: Gianfranco (at NorduGrid 2016 conference), Nico Färber (UNIBE-ID)
   * LHCb:
   * EGI: apologies: Gianfranco (at NorduGrid 2016 conference)

---++ Action items
   * Item1
Topic revision: r4 - 2016-06-02 - MichaelRolli
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.