<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable by internal people only
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Swiss Grid Operations Meeting on 2013-07-04

   * *Date and time*: first Thursday of the month, at 14:00
   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 9227296)
   * *External link*: http://vidyoportal.cern.ch/flex.html?roomdirect.html&key=Nrq24qRR4V1u
   * *Phone gate*: from Switzerland: 0225330322 (portal) + 9227296 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask for the password via email)

---++ Agenda

Status

   * CSCS (reports Miguel):
      * All worker nodes updated to SL6 / UMD 2
      * No longer using fakeraid; one disk for the OS, one for CVMFS
      * Problems with gridftp transfers from certain sites; the cause was our IB/Ethernet bridge not negotiating MTUs correctly (see the MTU workaround sketch after the action items)
      * atlasvobox decommissioned
      * cmsvobox was found hung and would no longer boot under Xen; migrated it to KVM and brought the machine back up
      * lrms02 moved to KVM; no more Xen machines left
      * CREAM machines updated to the latest release
   * PSI (reports Fabio/Daniel):
      * Generally quiet (holiday period), but some new users need support and/or additional software packages installed
      * Cluster was "offline" for about 2 hours on Wednesday June 26th (a SWITCHlan network issue that left PSI without any network connection; see the [[http://www.switch.ch/de/network/operation/tts/index.html?action=show&id=726][ticket]])
      * The virtual infrastructure at PSI seems to have stabilized (at least we did not see any other problems with our crucial dCache Chimera VM)
      * The usual fileserver/HDD problems continue; luckily everything so far was recoverable by reboots alone (i.e. no data migrations necessary)
      * Our Chimera DB constantly has >250 connections open (out of the 300 we have configured as the maximum); a query sketch for inspecting this follows this report
         * The number seems to be almost constant, independent of actual usage
         * Maybe it is just trying to improve performance as much as possible within the defined limit, or maybe there is something wrong with our configuration/installation
         * The situation is not yet fully understood (however, as it constantly stays below ~90% of the limit, this is not a high-priority issue for us)
      * Constantly "fighting" over-usage and clean-up laziness by certain users; our SE is now ~95% full
      * Started doing some tests with OpenMPI (so far single node only); a minimal smoke-test sketch also follows this report
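For the Chimera connection-count observation above, a minimal sketch of how the open connections could be inspected on the PostgreSQL instance backing Chimera. The database name =chimera= and the =postgres= admin user are assumptions (common dCache defaults), not taken from the actual PSI configuration.

<verbatim>
# Open connections vs. the configured limit (assumed DB name "chimera")
psql -U postgres -d chimera -c "SELECT count(*) FROM pg_stat_activity;"
psql -U postgres -c "SHOW max_connections;"

# Break the connections down by user and client to see who is holding them
psql -U postgres -d chimera -c "SELECT usename, client_addr, count(*)
    FROM pg_stat_activity GROUP BY usename, client_addr
    ORDER BY count(*) DESC;"
</verbatim>

If the count tracks a connection pool rather than actual load, the per-client breakdown should show a stable set of dCache hosts regardless of user activity.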
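For the OpenMPI tests, a hypothetical single-node smoke test is sketched below; it assumes only that =mpicc= and =mpirun= from the OpenMPI installation are on the =PATH= (this is not the actual PSI test code).

<verbatim>
# Hypothetical single-node OpenMPI smoke test
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("rank %d of %d on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -o hello_mpi hello_mpi.c
mpirun -np 4 ./hello_mpi   # 4 ranks, local node only
</verbatim>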
   * UNIBE (reports Gianfranco):
      * Xxx
   * UNIGE (reports Szymon):
      * Xxx
   * UZH (reports Sergio):
      * Xxx
   * Switch (reports Alessandro):
      * Xxx

Other topics

   * CMS
      * The CMS site configuration was migrated to git on Wednesday June 26th. A trivial error in the migration script led to a lot of killed jobs everywhere within a few hours that day, so if you saw something for CMS on that day, it was most probably not a site issue.
      * From the CMS side, the main issue of the last month really was the network problems that started mid-June (see e.g. on this [[https://cmsweb.cern.ch/phedex/graphs/quality_all?link=src&no_mss=true&to_node=CSCS&from_node=.%2A&conn=Debug%2FWebSite&starttime=1370649600&span=86400&endtime=1371859200][transfer quality plot]] that 3 links to T1 sites degraded around June 13th)
         * After some painstaking investigation we found out it was an MTU/MSS issue that is now temporarily solved by setting =ifconfig ib0 mtu 2044= on all dCache head nodes (more details can/will be discussed in the CSCS part; a sketch of applying and verifying the workaround follows the action items)
         * Do we also *need to apply this fix on the WNs*? Otherwise stage-out from the WNs to e.g. FNAL could also fail
         * Transfers resumed Monday evening, and from our side things look OK now
      * Did *something change in the Scheduled Downtime workflow* on our side? For about two months now, CMS seems unable to detect those downtimes correctly; also, we no longer receive the automated start/finish e-mails from GOCDB.
      * Successfully [[https://savannah.cern.ch/support/?138461][registered the CSCS queues as SL6 queues]] within the CMS submission infrastructure; waiting for first results
      * =cmsvobox= was down over the last weekend and was then migrated to a KVM machine; we will try to migrate the last service running there within the next days
   * Topic2

Next meeting date:

AOB

---++ Attendants

   * CSCS:
   * CMS:
   * ATLAS:
   * LHCb:
   * EGI:

---++ Action items

   * Item1
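A sketch of the MTU workaround mentioned in the CSCS and CMS items above, as it might be applied and verified on an SL6 node. The interface name =ib0= and the value 2044 come from the report; the ifcfg file path is the standard SL6 location, and =<remote-host>= is a placeholder.

<verbatim>
# Apply the temporary workaround (as reported, on the dCache head nodes)
ifconfig ib0 mtu 2044

# Verify the interface picked up the new MTU
ip link show ib0 | grep -o 'mtu [0-9]*'

# Check that packets of that size pass unfragmented end-to-end:
# ICMP payload = MTU - 28 bytes (20 IP + 8 ICMP headers)
ping -M do -s 2016 -c 3 <remote-host>

# Make the setting survive reboots (standard SL6 ifcfg location)
echo 'MTU=2044' >> /etc/sysconfig/network-scripts/ifcfg-ib0
</verbatim>

The same ping check run from a WN against e.g. an FNAL endpoint would help answer the open question of whether the fix is also needed on the WNs.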