<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.ChippComputingBoardGroup
-->

---+ Swiss Grid Operations Meeting on 2016-07-07 at 14:00
   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 10537598)
   * *External link*: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=FAEn4zjAba7BqoQ11TGZu66VSDE
   * *Phone gate*: From Switzerland: 0227671400 (portal) + 10537598 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask for the password via email)
   * *Switch Vidyo SIP IP*: 137.138.248.204

%TOC%

---++ Site status

---+++ CSCS
   * Some accounting numbers<br />
<table cellspacing="0" border="0">
<colgroup width="70"></colgroup> <colgroup width="112" span="2"></colgroup> <colgroup width="58"></colgroup> <colgroup width="84"></colgroup> <colgroup width="185"></colgroup> <colgroup width="130"></colgroup> <colgroup width="138"></colgroup> <colgroup width="85"></colgroup>
<tbody>
<tr> <td align="left" height="17"> *account* </td> <td align="left"> *% num jobs* </td> <td align="left"> *% of wall* </td> <td align="left"> *count(*)*</td> <td align="left"> *walltime sec* </td> <td align="left"> *sum(round(max_vsize/1024))* </td> <td align="left"> *sum_tres_req_mem* </td> <td align="left"> *mem_diff* </td> <td align="left"> *%* </td> </tr>
<tr> <td align="left" height="17">total:</td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
<tr> <td align="left" height="17">atlas</td> <td align="right">100.00%</td> <td align="right">100.00%</td> <td align="right">288913</td> <td align="right">2694614271</td> <td align="right">1,617,843,841</td> <td align="right">1,389,126,500</td> <td align="right">228,717,341</td> <td align="right">116.46%</td> </tr>
<tr> <td align="left" height="17">cms</td> <td align="right">100.00%</td> <td align="right">100.00%</td> <td align="right">50840</td> <td align="right">1535630187</td> <td align="right">230,934,497</td> <td align="right">356,035,968</td> <td align="right">-125,101,471</td> <td align="right">64.86%</td> </tr>
<tr> <td align="left" height="17">lhcb</td> <td align="right">100.00%</td> <td align="right">100.00%</td> <td align="right">57574</td> <td align="right">3211019505</td> <td align="right">255,594,384</td> <td align="right">115,148,000</td> <td align="right">140,446,384</td> <td align="right">221.97%</td> </tr>
<tr> <td align="left" height="17"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
<tr> <td align="left" height="17"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
<tr> <td align="left" height="17">req<=2000:</td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
<tr> <td align="left" height="17">atlas</td> <td align="right">68.50%</td> <td align="right">43.09%</td> <td align="right">197903</td> <td align="right">1160991397</td> <td align="right">547,848,230</td> <td align="right">386,762,000</td> <td align="right">161,086,230</td> <td align="right">141.65%</td> </tr>
<tr> <td align="left" height="17">cms</td> <td align="right">74.38%</td> <td align="right">0.28%</td> <td align="right">37816</td> <td align="right">4244836</td> <td align="right">30,376,806</td> <td align="right">75,632,000</td> <td align="right">-45,255,194</td> <td align="right">40.16%</td> </tr>
<tr> <td align="left" height="17">lhcb</td> <td align="right">100.00%</td> <td align="right">100.00%</td> <td align="right">57572</td> <td align="right">3210873808</td> <td align="right">255,585,171</td> <td align="right">115,144,000</td> <td align="right">140,441,171</td> <td align="right">221.97%</td> </tr>
<tr> <td align="left" height="17"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
<tr> <td align="left" height="17"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
<tr> <td align="left" height="17">req>2000:</td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
<tr> <td align="left" height="17">atlas</td> <td align="right">31.50%</td> <td align="right">56.91%</td> <td align="right">91007</td> <td align="right">1533609961</td> <td align="right">1,069,984,255</td> <td align="right">1,002,358,500</td> <td align="right">67,625,755</td> <td align="right">106.75%</td> </tr>
<tr> <td align="left" height="17">cms</td> <td align="right">25.62%</td> <td align="right">99.72%</td> <td align="right">13024</td> <td align="right">1531385351</td> <td align="right">200,557,691</td> <td align="right">280,403,968</td> <td align="right">-79,846,277</td> <td align="right">71.52%</td> </tr>
<tr> <td align="left" height="17"><br /></td> <td align="right">0.00%</td> <td align="right">0.00%</td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> <td align="left"><br /></td> </tr>
</tbody>
</table>

*Query used:*
   * <pre>SELECT account, count(*), sum(phoenix_job_table.time_end - phoenix_job_table.time_start) as walltime, sum(round(max_vsize/1024)),
sum(substring_index(substring_index(phoenix_job_table.tres_req,',',2),'2=',-1)) as sum_tres_req_mem,
sum(round(max_vsize/1024)) - sum(substring_index(substring_index(phoenix_job_table.tres_req,',',2),'2=',-1)) as mem_diff
FROM slurm_acct_db.phoenix_step_table,slurm_acct_db.phoenix_job_table
WHERE phoenix_job_table.job_db_inx = phoenix_step_table.job_db_inx
and substring_index(substring_index(phoenix_job_table.tres_req,',',2),'2=',-1) > 2000
and account in ('atlas', 'cms', 'lhcb')
and phoenix_step_table.state = 3
group by account</pre>

---+++ PSI
   * Upgraded my 2 HP CentOS7 NFSv4 NAS servers to [[http://zfsonlinux.org/][ZoL v0.6.5.7]]
      * The 1st is the primary NAS, with 24 SAS disks (15k rpm, 600 GB)
      * The 2nd is the secondary NAS, with 12 SATA disks (7.2k rpm, 3000 GB), used as cold backup
      * Both have a dual 10 Gb/s card in LACP bonding mode
   * dCache on ZoL
      * Again on the secondary NAS, I created ZFS file systems for dCache:
      * <pre>[root@t3nfs02 ~]# zfs list -d1
NAME                    USED  AVAIL  REFER  MOUNTPOINT
data01                 1.33T  9.15T  32.0K  /zfs/data01
data01/dcache           100G  9.15T  32.0K  %BLUE%/zfs/data01/dcache%ENDCOLOR%
data01/t3nfs01_data01  1.23T  9.15T  32.0K  /zfs/data01/t3nfs01_data01
data02                 4.33T  6.15T  32.0K  /zfs/data02
data02/dcache           100G  6.15T  32.0K  %BLUE%/zfs/data02/dcache%ENDCOLOR%
data02/t3nfs01_data01  4.23T  6.15T  32.0K  /zfs/data02/t3nfs01_data01
</pre>
   * dCache tuning
      * <pre>[root@t3se01 layouts]# grep max /etc/dcache/layouts/t3se01.conf
srm.request.max-requests=400
srm.request.put.max-requests=100
srm.request.get.max-inprogress=100
srm.request.copy.max-inprogress=100
srm.request.max-transfers=100
</pre>
   * [[http://t3mon.psi.ch/ganglia/PSIT3-custom/accounting.txt][Accounting numbers (from scheduler) from last month]]

---+++ UNIBE-LHEP
   * Xxx
   * Accounting numbers (from scheduler) from last month

---+++ UNIBE-ID
   * Mostly smooth operation
   * Procurement:
      * 80 new servers (76*20 + 4*16 => 1584 new cores); discontinued 144 cores (oldest nodes)
      * installed and provisioned
   * Migration from OGSGE => Slurm planned for Q4
   * Problems with NAMD jobs (using ibverbs directly) => low-level IB errors from mlx4 regarding qp
      * no errors with MPI jobs using ompi or the like
      * no errors with storage (GPFS over RDMA)
   * ATLAS specific: large number of random a-rex crashes within the last 2 weeks
      * reason unknown; happened 24x between 2016-06-15 and last Monday; no crash for the last 3 days

---+++ UNIGE
   * Operations
      * 10 machines added to the batch system (80 cores) + 3 machines replaced:
         * DELL - Intel Xeon @ 2.4 GHz - with 8 cores and 48 GB of memory
      * RAID controller: common problem for our DPM and NFS file servers (it happened 3-4 times during the last months)
      * Increased activity from DPNC users running in the batch system (other groups, in addition to ATLAS)
      * Still not in ATLAS production; memory-related problems (hints provided by Gianfranco)
   * Data Management:
      * User datasets from UniGe for ATLASLOCALGROUPDISK at CSCS deleted (space can be moved to ATLASSCRATCHDISK)
      * Some problems with central deletion (fixed), permissions related: https://ggus.eu/index.php?mode=ticket_info&ticket_id=122024
   * [[%ATTACHURL%/g07.2016.06.log][Accounting numbers (from scheduler) from last month]]

---+++ NGI_CH
   * Xxx
   * NGI-CH Open Tickets review

---++ Other topics
   * Topic1
   * Topic2

Next meeting date:

---++ A.O.B.

---++ Attendants
   * CSCS:
   * CMS:
   * ATLAS: Michael Rolli (UNIBE-ID) => absent due to illness; nevertheless contributed some text above
   * LHCb: Roland Bernet
   * EGI:

---++ Action items
   * Item1

---++ Attachments
   * g07.2016.06.log (r1, 1.1 K, 2016-07-07 11:05, LuisMarch): Accounting UniGe June 2016
Topic revision: r14 - 2016-07-07 - RolandBernet