<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Swiss Grid Operations Meeting on 2016-07-07 at 14:00
   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask pw via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204

%TOC%

---++ Site status

---+++ CSCS
   * Some accounting numbers:

| *account* | *% num jobs* | *% of wall* | *count(*)* | *walltime sec* | *sum(round(max_vsize/1024))* | *sum_tres_req_mem* | *mem_diff* | *vsize / req_mem (%)* |
| total: |  |  |  |  |  |  |  |  |
| atlas |  100.00% |  100.00% |  288913 |  2694614271 |  1,617,843,841 |  1,389,126,500 |  228,717,341 |  116.46% |
| cms |  100.00% |  100.00% |  50840 |  1535630187 |  230,934,497 |  356,035,968 |  -125,101,471 |  64.86% |
| lhcb |  100.00% |  100.00% |  57574 |  3211019505 |  255,594,384 |  115,148,000 |  140,446,384 |  221.97% |
| req<=2000: |  |  |  |  |  |  |  |  |
| atlas |  68.50% |  43.09% |  197903 |  1160991397 |  547,848,230 |  386,762,000 |  161,086,230 |  141.65% |
| cms |  74.38% |  0.28% |  37816 |  4244836 |  30,376,806 |  75,632,000 |  -45,255,194 |  40.16% |
| lhcb |  100.00% |  100.00% |  57572 |  3210873808 |  255,585,171 |  115,144,000 |  140,441,171 |  221.97% |
| req>2000: |  |  |  |  |  |  |  |  |
| atlas |  31.50% |  56.91% |  91007 |  1533609961 |  1,069,984,255 |  1,002,358,500 |  67,625,755 |  106.75% |
| cms |  25.62% |  99.72% |  13024 |  1531385351 |  200,557,691 |  280,403,968 |  -79,846,277 |  71.52% |
|  |  0.00% |  0.00% |  |  |  |  |  |  |

*Query used* (shown with the => 2000= filter on requested memory):
<pre>
SELECT account, count(*), sum(phoenix_job_table.time_end - phoenix_job_table.time_start) as walltime, sum(round(max_vsize/1024)),
sum(substring_index(substring_index(phoenix_job_table.tres_req,',',2),'2=',-1)) as sum_tres_req_mem,
sum(round(max_vsize/1024)) - sum(substring_index(substring_index(phoenix_job_table.tres_req,',',2),'2=',-1)) as mem_diff
FROM slurm_acct_db.phoenix_step_table,slurm_acct_db.phoenix_job_table
WHERE phoenix_job_table.job_db_inx = phoenix_step_table.job_db_inx
and substring_index(substring_index(phoenix_job_table.tres_req,',',2),'2=',-1) > 2000
and account in ('atlas', 'cms', 'lhcb')
and phoenix_step_table.state = 3
group by account
</pre>

---+++ PSI
   * Upgraded my two HP CentOS7 NFSv4 NAS servers to [[http://zfsonlinux.org/][ZoL v0.6.5.7]]
      * The 1st is the primary NAS, featuring 24 SAS disks (15k rpm, 600 GB)
      * The 2nd is the secondary NAS, featuring 12 SATA disks (7.2k rpm, 3000 GB), used as cold backup
      * Both have a dual 10 Gb/s card in LACP bonding mode
   * dCache on ZoL
      * Again on the secondary NAS, I created ZFS filesystems for dCache:
<pre>
[root@t3nfs02 ~]# zfs list -d1
NAME                    USED  AVAIL  REFER  MOUNTPOINT
data01                 1.33T  9.15T  32.0K  /zfs/data01
data01/dcache           100G  9.15T  32.0K  %BLUE%/zfs/data01/dcache%ENDCOLOR%
data01/t3nfs01_data01  1.23T  9.15T  32.0K  /zfs/data01/t3nfs01_data01
data02                 4.33T  6.15T  32.0K  /zfs/data02
data02/dcache           100G  6.15T  32.0K  %BLUE%/zfs/data02/dcache%ENDCOLOR%
data02/t3nfs01_data01  4.23T  6.15T  32.0K  /zfs/data02/t3nfs01_data01
</pre>
   * dCache tuning
<pre>
[root@t3se01 layouts]# grep max /etc/dcache/layouts/t3se01.conf
srm.request.max-requests=400
srm.request.put.max-requests=100
srm.request.get.max-inprogress=100
srm.request.copy.max-inprogress=100
srm.request.max-transfers=100
</pre>
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]

---+++ UNIBE-LHEP
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ UNIBE-ID
   * Mostly smooth operation
   * Procurement:
      * 80 new servers (76*20 + 4*16 => 1584 new cores); discontinued 144 cores (oldest nodes)
      * installed and provisioned
   * Migration from OGSGE => Slurm planned for Q4
   * Problems with NAMD jobs (using ibverbs directly) => low-level IB errors from mlx4 regarding QPs
      * no errors with MPI jobs using ompi or the like
      * no errors with storage (GPFS over RDMA)
   * ATLAS specific: large number of random a-rex crashes within the last 2 weeks
      * reason unknown; happened 24 times between 2016-06-15 and last Monday; no crash in the last 3 days

---+++ UNIGE
   * Operations
      * 10 machines added to the batch system (80 cores) + 3 machines replaced:
         * DELL - Intel Xeon @ 2.4 GHz - with 8 cores and 48 GB of memory
      * RAID controller: recurring problem on our DPM and NFS file servers (it happened 3 or 4 times during the last months)
      * Increased activity from DPNC users running in the batch system (other groups, in addition to ATLAS)
      * Still not in ATLAS production; problems related to memory (hints provided by Gianfranco)
   * Data Management:
      * User datasets from UniGe for ATLASLOCALGROUPDISK at CSCS deleted (space can be moved to ATLASSCRATCHDISK)
      * Some problems with central deletion (fixed), permissions related: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122024
   * [[%ATTACHURL%/g07.2016.06.log][Accounting numbers (from scheduler) from last month]]

---+++ NGI_CH
   * Xxx
   * NGI-CH Open Tickets review

---++ Other topics
   * Topic1
   * Topic2
Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS: Michael Rolli (UNIBE-ID) => absent due to illness, but provided some text above
   * LHCb: Roland Bernet
   * EGI:

---++ Action items
   * Item1
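
Note on the CSCS accounting table: the last two columns can be recomputed from the two memory sums, and the requested memory comes from the nested =substring_index= call in the query. The following is only a minimal Python sketch of that arithmetic, not the script used at CSCS. The function names and the sample =tres_req= string are hypothetical, and reading TRES id 2 as the memory request is an assumption taken from the =2== pattern in the query.

```python
from typing import Tuple


def substring_index(text: str, delim: str, count: int) -> str:
    """Rough Python equivalent of MySQL SUBSTRING_INDEX()."""
    parts = text.split(delim)
    if count >= 0:
        # Everything left of the count-th delimiter.
        return delim.join(parts[:count])
    # Everything right of the |count|-th delimiter from the end.
    return delim.join(parts[count:])


def requested_mem(tres_req: str) -> int:
    """Mirror the query: keep the first two TRES fields, then take what follows '2='."""
    return int(substring_index(substring_index(tres_req, ",", 2), "2=", -1))


def mem_stats(used: int, requested: int) -> Tuple[int, float]:
    """Return (mem_diff, used/requested in percent), as in the last two table columns."""
    return used - requested, round(100.0 * used / requested, 2)


def share(part: int, total: int) -> float:
    """Percentage of a per-class value relative to the account total."""
    return round(100.0 * part / total, 2)


if __name__ == "__main__":
    # Hypothetical tres_req value, for illustration only.
    print(requested_mem("1=1,2=2000,4=1"))
    # atlas "total" row from the table above.
    print(mem_stats(1_617_843_841, 1_389_126_500))
    # atlas req<=2000 job count relative to the atlas total.
    print(share(197_903, 288_913))
```

With the atlas totals from the table this reproduces mem_diff = 228,717,341 and 116.46%, and the job share of 68.50% for the req<=2000 class.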
Attachments:
   * g07.2016.06.log (r1, 1.1 K, 2016-07-07 - 11:05, LuisMarch): Accounting UniGe June 2016
Topic revision: r14 - 2016-07-07 - RolandBernet
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.