<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->
---+ Swiss Grid Operations Meeting on 2016-02-04 at 14:00

   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 109305236)
   * *External link*: http://vidyoportal.cern.ch/flex.html?roomdirect.html&key=gDf6l4RlIAGN
   * *Phone gate*: From Switzerland: 0227671400 (portal) + 109305236 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask pw via email)

%TOC%

---++ Site status

---+++ CSCS

*Storage*
   * Hardware / physical install
      * 8 Feb: new dCache servers (4x)
      * 8 Feb: MPO in order to connect Phoenix to the CSCS SAN
      * 9 Feb: NETAPP E5660 (~0.5PB)
   * dCache
      * The 'cleaner problem' (mainly affecting CMS) is no longer present. Space is freed automatically, as expected
      * ATLAS dumps are in place; 'atlasgroupdisk/perf-egamma' and 'atlasscratchdisk' still need adjusting ( https://xgus.ggus.eu/ngi_ch/index.php?mode=ticket_info&ticket_id=428 )
   * GPFS
      * Unplanned maintenance was needed on Wed 3 Feb to recreate the filesystem because of a metadata inconsistency problem

*Systems*
   * Preparing and consolidating racks for new arrivals at the end of this month
   * Checking published HEPspec values
   * Tuned Slurm config to improve cluster performance
   * Fixed two HP nodes, one of them with IB failures and the other with a problem on the 1G man network card
   * Testing complete Puppet installation for worker nodes. It is working fine; I just have to check some cvmfs parameters and the cream wrapper script

*Accounting*
   * Accounting numbers (from scheduler) from last month
      * http://ganglia.lcg.cscs.ch/ganglia/SLURM_REPORTS/phoenix_slurm_report_201601.txt

---+++ PSI
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ UNIBE-LHEP

*Operations*
   * Nothing significant to report; stable operation on both systems
   * 256 new cores delivered yesterday; we hope to deploy them before the weekend

*ATLAS specific operations*
   * No progress on the storage dumps requested by ATLAS (due to no progress in the re-deployment of the DPM head node on SLC6)
   * ANALY_UNIBE-LHEP blacklisted in HC: no time to debug, but low impact since there are not many ANALY jobs right now
   * A couple of stable weeks of operation for UNIBE-LHEP_CLOUD_MCORE, then we lost the cluster and have not been able to fix it yet

*Accounting*
   * Accounting numbers (from scheduler) from last month (Jan 2016)
      * CPU h: 792492 (ATLAS) - 12671 (t2k.org) - 1879 (uboone) - 25 (ops)
   * Accounting numbers (from ATLAS dashboard) from last month (Jan 2016)
      * CPU h: 662466 (774848 with cloud)
      * WC h: 679368 (796292 with cloud)

---+++ UNIBE-ID
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ UNIGE

*Operations*
   * Running smoothly: higher user activity since last meeting
   * Grid (ATLAS) jobs: UNIGE-DPNC in "Test" status and ~1/3 of jobs failed, apparently because they "ran out of memory". This needs checking
   * We plan a scheduled downtime at some point, needed for system and security upgrades (also related to getting involved in ATLAS production)

*Storage*
   * Dump of DPM SE for the ATLAS experiment finally submitted (this dump should be provided once a month)
   * In addition to these ATLAS checks, we should clean our DPM: old user data and other projects (to be done)

*Outlook*
   * Request for a network switch upgrade to 10 Gb/s + acquisition of 3 GPUs already submitted (waiting for a decision in ~March 2016)
      * GPU info (nvidia): http://www.microspot.ch/msp/fr/pc-komponenten/grafikkarten/gainward-geforce-gtx-980-grafikkarten-gf-gtx-9-0000948922
   * Install Puppet for DPM SE (and probably also for cluster configuration and setup, replacing yaim)

*Accounting*
   * Accounting numbers (from scheduler) from last month

---+++ NGI_CH
   * Nothing to report
   * NGI-CH Open Tickets review https://ggus.eu/index.php?mode=ticket_search&supportunit=NGI_CH&status=open&timeframe=any&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO
      * CSCS-LCG2
         * <a target="_blank" href="https://ggus.eu/index.php?mode=ticket_info&ticket_id=117786">117786</a> (ATLAS: storage dumps) almost done - should fix two paths
         * <a target="_blank" href="https://ggus.eu/index.php?mode=ticket_info&ticket_id=119021">119021</a> (LHCb team: jobs failed) no information provided - changed to "waiting for reply"
         * <a target="_blank" href="https://ggus.eu/index.php?mode=ticket_info&ticket_id=119171">119171</a> (CMS: Workflow failures) in progress
      * UNIBE-LHEP
         * <a target="_blank" href="https://ggus.eu/index.php?mode=ticket_info&ticket_id=117899">117899</a> (ATLAS: storage dumps) on hold
      * NGI_CH
         * <a target="_blank" href="https://ggus.eu/index.php?mode=ticket_info&ticket_id=118922">118922</a> (affects CSCS-LCG2 and UNIBE-LHEP): GlueSubClusterPhysicalCPUs, GlueSubClusterLogicalCPUs in the bdii - added explicit notification to CSCS-LCG2

---++ Other topics
   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS: Luis March
   * LHCb:
   * EGI: Luis March

---++ Action items
   * Item1
Topic revision: r11 - 2016-02-04 - LuisMarch
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.