<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#   * Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
---+ Swiss Grid Operations Meeting on 2015-12-10

   * *Date and time*: 14:00
   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 109305236)
   * *External link*: http://vidyoportal.cern.ch/flex.html?roomdirect.html&key=gDf6l4RlIAGN
   * *Phone gate*: from Switzerland: 0227671400 (portal) + 109305236 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask for the password via email)

%TOC%

---++ Site status

---+++ CSCS
   * *Storage*
      * dCache: stable, but we still have to run the cleaner manually. The upgrade to 2.10 will be performed on Wed 13th Jan 2016.
      * ATLAS: working on the monthly dumps.
      * GPFS (scratch): nothing to report.
      * New hardware: 4 servers for dCache and ~1 PB of storage. Working on moving the GPFS metadata disks to flash-based storage.
   * *Compute*
      * Added some check functions to the node health check:
         * swap cleaner
         * automatic resolution of some blackhole scenarios, e.g. auto-remount of file systems
         * after 60 + a random number of days, the node is put in drain for cleanup and reboot
      * Started some tests with a new Slurm version, to migrate sltop.
      * Today we will order 40 new compute nodes with E5-2680v4 CPUs.

---+++ PSI
   * Xxx

---+++ UNIBE-LHEP
   * *Operations*
      * ce01 cluster re-installation virtually completed (about 900 worker cores running, 120 still to be installed, 256 awaiting delivery)
      * Started with a simple Slurm setup (slurm-15.08.1) in order to cut down on commissioning time: one partition with
        <verbatim>
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
MemLimitEnforce=no
        </verbatim>
      * We don't over-subscribe memory any more: nodes don't starve and crash
      * Memory usage is properly accounted for in 15.08 (PSS): no jobs killed on an (artificial) over-limit of "vmem" (which is the full address space reserved by a process, not what is actually allocated or used)
      * Comparing job failure rates between ce01 and ce02 (still on the old SGE) has convinced me to rush the re-installation of ce02 (started earlier today)
   * *ATLAS specific operations*
      * Stable workflows by ATLAS (very large improvement since the beginning of Run II)
      * Stuck with the implementation of monthly dumps of the namespace on the DPM SE:
         * head node on SLC5: the dump script does not work, and generating a valid proxy is also problematic
         * decided to push the re-deployment of the head node on SLC6
            * the legacy configuration tool (YAIM) is no longer supported
            * Puppet-based configuration; got the right docs at the DPM workshop earlier this week at CERN
            * tests ongoing on a pps VM
            * also complicated by the fact that my site BDII is still co-located with the DPM head node
            * this will likely be the first task for 2016

---+++ UNIBE-ID
   * Xxx

---+++ UNIGE
   * *Operations*
      * atlasfs29.unige.ch: new certificate
      * Another file server has already been installed, but this one is for the DAMPE experiment (no host certificate needed)
      * We have new hardware to be installed in the cluster: file servers and a couple of PCs for services
      * We will use Puppet for DPM and probably for cluster configuration and setup: we will start with a testbed of atlasfs29 + 1 service PC (1 out of the 2 mentioned just above)
   * *Network - Outlook*
      * We plan to buy a new 10 Gb/s network switch, but this is still under negotiation
      * Most likely it will arrive at the beginning of next year
   * *Storage*
      * There was a DPM SE workshop at CERN on December 7th-8th: https://indico.cern.ch/event/432642/
      * Checking the data stored at the DPM SE for cleaning purposes, since ATLAS requested it
      * Checking data in order to identify files which are registered in the catalogue (rucio) but not physically present at the DPM SE, and vice versa

---+++ NGI_CH
   * Nothing to report

---++ Other topics
   * Proposal to add to this meeting: T2 monthly pledge review (CSCS, UNIBE); GGUS open ticket review
   * Coverage over the holiday season
   * Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS: Gianfranco, Luis March
   * LHCb:
   * EGI: Gianfranco

---++ Action items
   * Item1
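The catalogue-vs-storage check UNIGE describes above amounts to a set comparison between two namespace dumps: files present on disk but not registered in rucio ("dark data"), and files registered but not physically present ("lost files"). A minimal sketch, assuming each dump has already been flattened to one normalised path per line (the function and sample paths are illustrative, not the actual dump format):

```python
# Compare a catalogue dump (e.g. from rucio) against a storage-element
# namespace dump (e.g. from a DPM dump script). Inputs are iterables of
# path strings; real dumps would need path normalisation first.

def compare_dumps(catalogue_paths, storage_paths):
    catalogue = set(catalogue_paths)
    storage = set(storage_paths)
    dark = sorted(storage - catalogue)   # physically present, not registered
    lost = sorted(catalogue - storage)   # registered, not physically present
    return dark, lost

if __name__ == "__main__":
    cat = ["/atlas/data/file1", "/atlas/data/file2"]
    sto = ["/atlas/data/file2", "/atlas/data/file3"]
    dark, lost = compare_dumps(cat, sto)
    print(dark)  # ['/atlas/data/file3']
    print(lost)  # ['/atlas/data/file1']
```

In practice the two dumps are taken at different times, so files created or deleted between the dumps appear as false positives; a real check would only act on entries that show up in consecutive dumps.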
Topic revision: r6 - 2015-12-10 - LuisMarch