6th T3 Steering board pre-meeting ( Virtual )

Followup

SteerBoardMeeting07

Meeting Facts

Possible HW investments by Nov 2015

We foresee three possible HW investments; every Institute representative will have to state and justify their choice:

  1. Replacing the ~200TB with a [ 4TB*60 disks ] storage that can be scaled once, for ~60k CHF ; or with a [ 4TB*60 + 4TB*60 ] storage that is already scaled, for ~110k CHF ; 5y warranty in both cases
  2. Replacing the ~200TB with a [ 4TB*60 disks ] unscalable storage for ~40k CHF ( price to be confirmed ) AND buying 192 CPU cores for ~25k CHF
  3. NOT replacing the ~200TB at all AND buying 192 + 192 CPU cores for ~50k CHF
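
To compare the options on unit price, the sketch below divides each quoted price by the raw disk capacity or by the number of cores. It is only an illustration: the dictionary layout and the helper name `chf_per_unit` are assumptions, raw capacity ignores any RAID overhead, and the prices are the approximate figures above (one of them still to be confirmed).

```python
# Rough unit prices for the three HW options (illustrative sketch only).
RAW_TB_PER_ENCLOSURE = 4 * 60  # 60 disks of 4TB = 240TB raw, before RAID overhead


def chf_per_unit(cost_kchf, units):
    """Convert a price in kCHF into CHF per unit (raw TB or CPU core)."""
    return 1000 * cost_kchf / units


# name -> (approximate price in kCHF, raw TB)
storage = {
    "option 1, scalable once": (60, RAW_TB_PER_ENCLOSURE),
    "option 1, already scaled": (110, 2 * RAW_TB_PER_ENCLOSURE),
    "option 2, unscalable": (40, RAW_TB_PER_ENCLOSURE),  # price to be confirmed
}
# name -> (approximate price in kCHF, CPU cores)
cpu = {
    "option 2, 192 cores": (25, 192),
    "option 3, 192 + 192 cores": (50, 192 + 192),
}

for name, (cost, tb) in storage.items():
    print(f"{name:26s} ~{chf_per_unit(cost, tb):.0f} CHF per raw TB")
for name, (cost, cores) in cpu.items():
    print(f"{name:26s} ~{chf_per_unit(cost, cores):.0f} CHF per core")
```

With these figures the CPU price per core is the same in options 2 and 3, so the choice mainly turns on whether the ~200TB gets replaced and whether the new storage needs to be scalable.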

Meeting Minutes

We discussed the short-term Tier-3 upgrade options, and concluded:
  1. There is no point in immediately spending the 45 kCHF (remaining from the 2015 SNF grant) on HW acquisition. However, we need to spend it early enough to be able to absorb possible procurement delays.
  2. We will soon (in Nov/beginning of Dec) hold a CMS Tier-3 steering board meeting in order to fix the long-term strategy for the Tier-3, i.e. for the next 2-3 years. Once we have established and agreed on the requirements for those years, we'll continue the process of securing the funding from UZH, and possibly also from PSI and ETHZ.
  3. We expect funding contributions at the level of 75 kCHF from UZH by end of 2015/beginning of 2016, and at the level of 100 kCHF from SNF by spring 2016.
  4. The available combined funding (left over from SNF 2015 plus the UZH contribution) will be invested in the Tier-3 upgrade by Feb at the latest, according to the agreed strategy.
  5. Until then, we continue to operate the present storage and CPU as is.
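
The funding figures in the conclusions can be put next to the prices of the HW options listed under "Possible HW investments by Nov 2015". The sketch below is only that arithmetic; the variable names and the helper `affordable` are assumptions, and all amounts are the approximate or expected ones quoted in this page.

```python
# Funding milestones from the minutes (kCHF); expected amounts, not commitments.
LEFT_OVER_SNF_2015 = 45
EXPECTED_UZH = 75        # expected by end of 2015 / beginning of 2016
EXPECTED_SNF_2016 = 100  # expected by spring 2016

available_now = LEFT_OVER_SNF_2015
available_by_feb = LEFT_OVER_SNF_2015 + EXPECTED_UZH
available_by_spring = available_by_feb + EXPECTED_SNF_2016

# Approximate prices of the HW options (kCHF), as listed earlier in the page.
option_cost = {
    "1 (scalable once)": 60,
    "1 (already scaled)": 110,
    "2 (storage + 192 cores)": 40 + 25,
    "3 (192 + 192 cores)": 50,
}


def affordable(budget_kchf):
    """Return the options whose quoted price fits within the given budget."""
    return [name for name, cost in option_cost.items() if cost <= budget_kchf]


for label, budget in [("now", available_now),
                      ("by Feb", available_by_feb),
                      ("by spring 2016", available_by_spring)]:
    print(f"{label:15s} {budget:4d} kCHF -> {affordable(budget)}")
```

On these numbers the 45 kCHF left over covers none of the options on its own, while the combined 120 kCHF expected by Feb covers any single one of them.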

UZH slide

Attendees

  • Christoph Graph (CHIPP Computing Board chair, ETHZ), Daniel Meister (ETHZ, T3 admin)
  • Urs Langenegger (PSI), Danek Kotlinski (PSI)
  • Clemens Lange (UZH), Lea Caminada (UZH)
  • Fabio Martinelli (T3 admin)
  • Derek Feichtinger (chair, T3 admin): absent

T3 status ( to be read offline by each meeting attendee )

CPUs

HW

The 10 AMD servers were recycled from CSCS

| No. of WNs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 20 | 2*Xeon X5560 | 8 | 117.53 | 14.69 | 3.77 | 160 | 2350 | 603 |
| 11 | 2*E5-2670 2.60GHz | 16 | 263 | 16.44 | 4.22 | 176 | 2893 | 743 |
| 4 | 2*AMD 6272 2.40GHz | 32 | 241 | 7.53 | 1.93 | 128 | 964 | 247 |
| 35 (total) | | | | | | 464 | 6207 | 1593 |
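
The derived columns of the worker-node table can be recomputed from the node counts, cores per node and HS06 per node. The sketch below does that as a consistency check; the dictionary layout and variable names are assumptions made for illustration.

```python
# (number of worker nodes, cores per node, HS06 per node), copied from the WN table.
worker_nodes = {
    "2*Xeon X5560": (20, 8, 117.53),
    "2*E5-2670 2.60GHz": (11, 16, 263.0),
    "2*AMD 6272 2.40GHz": (4, 32, 241.0),
}

total_nodes = sum(n for n, _, _ in worker_nodes.values())
total_cores = sum(n * cores for n, cores, _ in worker_nodes.values())
total_hs06 = sum(n * hs06 for n, _, hs06 in worker_nodes.values())

for cpu, (n, cores, hs06) in worker_nodes.items():
    # HS06 per core is simply HS06 per node divided by cores per node.
    print(f"{cpu:20s} {n * cores:4d} cores, HS06/core = {hs06 / cores:.2f}")
print(f"total: {total_nodes} nodes, {total_cores} cores, HS06 = {total_hs06:.1f}")
```

The recomputed HS06 total is about 6207.6; the table shows 6207, apparently because the per-row totals were truncated before being summed.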

| No. of UIs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 6 | 2*AMD 6272 2.40GHz | 32 | 241 | 7.53 | 1.93 | 192 | 1446 | 371 |
| 6 (total) | | | | | | 192 | 1446 | 371 |

43 Submitting Users - last year - list

  • ethz-ecal: micheli, mquittna, peruzzi
  • ethz-ewk: amarini, chanon
  • ethz-higgs: bianchi, dmeister, gregor, jpata, mameinha, musella, nchernya, peller, perrozzi, pgras, tklijnsm, vtavolar
  • ethz-susy: casal, pandolf, cheidegg, gaperrin, jhoss, mangano, mdunser, mmarionn, mmasciov, mschoene
  • psi-bphys: ursl, wiederkehr_s
  • psi-pixel: kotlinski
  • uniz-higgs: aspiezia, cgalloni, clange, dsalerno, hinzmann, jngadiub, leac, mwang, taroni, thaarres, vlambert
  • uniz-pixel: grauco, yangyong

43 Submitting Users - last year - CPU usage by user

user SUM_WALL_days SUM_CPU_days
ursl 4853 4453
clange 4230 3994
mwang 3466 3427
jngadiub 2281 1574
cgalloni 1746 1246
mmasciov 1487 1432
tklijnsm 1255 1209
dsalerno 1126 1049
hinzmann 1084 991
peruzzi 958 858
thaarres 819 628
leac 802 648
cheidegg 640 478
peller 496 541
jpata 441 355
yangyong 406 376
casal 387 362
bianchi 381 373
amarini 281 195
chanon 279 218
mquittna 245 206
martinelli_f 155 5
wiederkehr_s 105 87
pgras 58 54
aspiezia 58 64
jhoss 48 19
perrozzi 40 26
vlambert 34 33
gaperrin 33 24
kotlinski 32 26
nchernya 31 24
mschoene 25 19
mdunser 23 7
mameinha 18 14
mangano 7 0
taroni 5 4
gregor 4 3
micheli 4 0
grauco 3 3
pandolf 1 0
mmarionn 1 1
musella 1 1
vtavolar 0 0
cmssgm 0 0
dmeister 0 0
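
A simple figure that can be derived from this table is the CPU efficiency, i.e. SUM_CPU_days divided by SUM_WALL_days. The sketch below computes it for a few rows copied from the table; the function name `cpu_efficiency` is an assumption, and the choice of rows is arbitrary.

```python
def cpu_efficiency(wall_days, cpu_days):
    """Return SUM_CPU_days / SUM_WALL_days, or None when no wall time was recorded."""
    if wall_days == 0:
        return None
    return cpu_days / wall_days


# user -> (SUM_WALL_days, SUM_CPU_days), a few rows copied from the table above
sample = {
    "ursl": (4853, 4453),
    "clange": (4230, 3994),
    "jngadiub": (2281, 1574),
    "martinelli_f": (155, 5),
    "peller": (496, 541),
    "vtavolar": (0, 0),
}

for user, (wall, cpu) in sample.items():
    eff = cpu_efficiency(wall, cpu)
    print(f"{user:14s} {'n/a' if eff is None else f'{100 * eff:.0f}%'}")
```

Values above 100% (e.g. peller) mean that more CPU time than wall time was accounted; the table alone does not say why.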

43 Submitting Users - last year - CPU usage by t3group

t3group SUM_WALL_days SUM_CPU_days
uniz-higgs 15650 13659
psi-bphys 4957 4540
ethz-higgs 2726 2599
ethz-susy 2650 2342
ethz-ecal 1207 1064
ethz-ewk 560 413
uniz-pixel 408 379
cms 155 5
psi-pixel 32 26

43 Submitting Users - last year - CPU usage by year-month

period SUM_WALL_days SUM_CPU_days
2014-10 390 324
2014-11 1671 1546
2014-12 3323 3259
2015-01 1796 1498
2015-02 2691 2250
2015-03 2791 2548
2015-04 1085 884
2015-05 797 652
2015-06 1818 1699
2015-07 2014 1692
2015-08 2906 2520
2015-09 2423 2219
2015-10 4643 3937
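
The monthly wall-clock days can be related to what the worker nodes could deliver. The sketch below is a rough estimate under explicit assumptions that this page does not confirm: that all 464 WN cores listed in the HW table were available as job slots for the whole period, and that each row covers a full calendar month. The function names are illustrative.

```python
import calendar

WN_CORES = 464  # total WN cores from the HW table; assumed constant over the year


def core_days_available(period, cores=WN_CORES):
    """Core-days offered in a 'YYYY-MM' period, assuming every core is a job slot."""
    year, month = (int(part) for part in period.split("-"))
    days_in_month = calendar.monthrange(year, month)[1]
    return cores * days_in_month


def utilisation(period, wall_days, cores=WN_CORES):
    """Fraction of the available core-days that was used as wall-clock time."""
    return wall_days / core_days_available(period, cores)


# period -> SUM_WALL_days, a few rows copied from the table above
monthly_wall_days = {"2014-12": 3323, "2015-05": 797, "2015-10": 4643}
for period, wall in monthly_wall_days.items():
    print(f"{period}: {100 * utilisation(period, wall):.0f}% "
          f"of {core_days_available(period)} core-days")
```

Under these assumptions even the busiest month in the table (2015-10) used about a third of the available core-days.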

43 Submitting Users - last 6 months - RAM ranges by user

username RAM_RANGE JOBs
bianchi >3GB 787
casal 0GB-1GB 315
casal 1GB-2GB 1658
casal 2GB-3GB 1215
casal >3GB 99
cgalloni 0GB-1GB 63964
cgalloni 1GB-2GB 4461
cgalloni 2GB-3GB 284
cgalloni >3GB 646
cheidegg 0GB-1GB 248
cheidegg 1GB-2GB 62
cheidegg 2GB-3GB 17
cheidegg >3GB 4
clange 0GB-1GB 42555
clange 1GB-2GB 261
clange 2GB-3GB 59911
clange >3GB 201
cmssgm 0GB-1GB 3
dmeister 0GB-1GB 2
dsalerno 0GB-1GB 306
dsalerno 1GB-2GB 1332
dsalerno 2GB-3GB 861
gaperrin 0GB-1GB 1950
gaperrin 1GB-2GB 465
gaperrin 2GB-3GB 207
gaperrin >3GB 1664
grauco 0GB-1GB 4
grauco 1GB-2GB 15
grauco 2GB-3GB 1
grauco >3GB 1
gregor 0GB-1GB 758
gregor 1GB-2GB 3824
gregor 2GB-3GB 1
hinzmann 0GB-1GB 16244
hinzmann 1GB-2GB 1987
hinzmann 2GB-3GB 38
hinzmann >3GB 5543
jhoss 0GB-1GB 710
jhoss 1GB-2GB 28
jngadiub 0GB-1GB 97284
jngadiub 1GB-2GB 17646
jngadiub 2GB-3GB 175
jngadiub >3GB 3170
jpata 0GB-1GB 24801
jpata 1GB-2GB 227
jpata 2GB-3GB 4760
jpata >3GB 247
kaestli 0GB-1GB 6
kaestli 1GB-2GB 1
kaestli >3GB 1
kotlinski 0GB-1GB 98
kotlinski 1GB-2GB 661
kotlinski 2GB-3GB 1825
kotlinski >3GB 25
leac 0GB-1GB 89
leac 1GB-2GB 477
leac 2GB-3GB 5090
leac >3GB 676
martinelli_f 0GB-1GB 81206
martinelli_f >3GB 255
micheli 1GB-2GB 18
mmarionn 0GB-1GB 2864
mmasciov 0GB-1GB 21590
mmasciov 1GB-2GB 16001
mmasciov 2GB-3GB 73350
mmasciov >3GB 9582
mquittna 0GB-1GB 1193
mquittna 1GB-2GB 1094
mschoene 0GB-1GB 154
mschoene 1GB-2GB 864
mschoene >3GB 11
musella 0GB-1GB 306
musella 1GB-2GB 376
mwang 0GB-1GB 8061
mwang 1GB-2GB 38
mwang 2GB-3GB 473
mwang >3GB 778
nchernya 0GB-1GB 58145
nchernya 1GB-2GB 25
perrozzi 0GB-1GB 22066
perrozzi 1GB-2GB 28
perrozzi 2GB-3GB 19
perrozzi >3GB 379
pgras >3GB 50
thaarres 0GB-1GB 195869
thaarres 1GB-2GB 6805
thaarres 2GB-3GB 18
thaarres >3GB 686
thea 0GB-1GB 25
thea 1GB-2GB 12
thea >3GB 8
tklijnsm 0GB-1GB 3125
tklijnsm 1GB-2GB 983
tklijnsm 2GB-3GB 54794
tklijnsm >3GB 1890
ursl 0GB-1GB 1721
ursl 1GB-2GB 5554
ursl 2GB-3GB 31286
ursl >3GB 4871
wiederkehr_s 0GB-1GB 2191
wiederkehr_s 1GB-2GB 361
wiederkehr_s 2GB-3GB 6397
wiederkehr_s >3GB 180
yangyong 0GB-1GB 884
yangyong 1GB-2GB 12
yangyong 2GB-3GB 10
yangyong >3GB 11

The mysql command run on t3ce02 was:
mysql> SELECT username, RAM_RANGE, count(*) AS JOBs
       FROM (
           SELECT username,
                  CASE
                      WHEN maxvmem > 0 AND maxvmem <= 1000000000 THEN '0GB-1GB'
                      WHEN maxvmem > 1000000000 AND maxvmem <= 2000000000 THEN '1GB-2GB'
                      WHEN maxvmem > 2000000000 AND maxvmem <= 3000000000 THEN '2GB-3GB'
                      ELSE '>3GB'
                  END AS RAM_RANGE
           FROM view_accounting
           WHERE exit_status = 0
             AND submission_time > (current_timestamp - INTERVAL 6 MONTH)
       ) AS job_summaries
       GROUP BY username, RAM_RANGE;
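
For readers who prefer to post-process the accounting data outside MySQL, the bucketing done by the CASE expression can be mirrored as below. This is a sketch, not the script used for the table: the function name `ram_range` and the sample jobs are made up, and maxvmem is assumed to be in bytes, as the thresholds suggest.

```python
from collections import Counter


def ram_range(maxvmem):
    """Mirror the CASE expression of the mysql command; maxvmem assumed in bytes."""
    if maxvmem is None:
        return ">3GB"  # in SQL a NULL fails every WHEN and falls through to ELSE
    if 0 < maxvmem <= 1_000_000_000:
        return "0GB-1GB"
    if 1_000_000_000 < maxvmem <= 2_000_000_000:
        return "1GB-2GB"
    if 2_000_000_000 < maxvmem <= 3_000_000_000:
        return "2GB-3GB"
    return ">3GB"  # ELSE branch: also reached when maxvmem <= 0


# Hypothetical (username, maxvmem) pairs standing in for rows of view_accounting.
jobs = [("alice", 500_000_000), ("alice", 2_500_000_000),
        ("bob", 0), ("bob", 3_200_000_000)]

counts = Counter((user, ram_range(vmem)) for user, vmem in jobs)
for (user, bucket), n in sorted(counts.items()):
    print(user, bucket, n)
```

Note that, as written, the ELSE branch labels jobs with maxvmem equal to 0 (or NULL) as '>3GB' as well, so some of the '>3GB' counts in the table may include jobs with no recorded memory usage.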

/pnfs

HW contributing to the total /pnfs capacity and their warranties

  • Usage ( /pnfs ) = ~340TB
  • Capacity ( /pnfs ) = ~200TB + ~270TB + ~270TB ; the ~340TB are hosted on ~270TB + ~270TB and partially replicated on the ~200TB to speed up reads
  • The unreliable ~200TB are still available as a read-only cache only thanks to a lucky coincidence: we could recycle both >100 1TB disks and several file servers from CSCS
  • 2 HP file servers are connected to both ~270TB systems ; their warranty runs out on 2018-06-30
  • The warranty of one ~270TB system runs out on 2017-01-31 ; it can be extended until 2018-06-30 for ~15k CHF ( price to be confirmed ) ; Fabio highly recommends this purchase
  • The warranty of the other ~270TB system runs out on 2018-07-30
  • ~270TB + ~270TB bandwidth as currently cabled : 10Gbit/s + 10Gbit/s ~= 2.5GB/s
  • ~270TB + ~270TB bandwidth if re-cabled optimally : 2 * 10Gbit/s + 2 * 10Gbit/s ~= 5GB/s ; we have not done this re-cabling yet because of the bandwidth provided by the ~200TB, which is ~2GB/s
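
The bandwidth figures in the last two bullets are plain Gbit/s to GB/s conversions (8 bits per byte, protocol overhead ignored). A minimal sketch, with an illustrative function name:

```python
def gbit_to_gbyte_per_s(gbit_per_s):
    """Convert a link speed in Gbit/s to GB/s, ignoring protocol overhead."""
    return gbit_per_s / 8


current = gbit_to_gbyte_per_s(10 + 10)           # one 10Gbit/s link per ~270TB system
recabled = gbit_to_gbyte_per_s(2 * 10 + 2 * 10)  # two 10Gbit/s links per system

print(f"as cabled now: {current} GB/s, re-cabled: {recabled} GB/s")
```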

Usage ( /pnfs ) = ~340TB

~340TB by access time in %

period %
2010 0.08
2011 1.17
2012 2.37
2013 22.04
2014 40.40
2015 33.86
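
The percentages can be turned into approximate volumes, for instance to estimate how much data has not been read for a long time. The sketch below assumes the percentages refer to the ~340TB total and that "period" is the year of last access; the function name is illustrative.

```python
TOTAL_TB = 340  # approximate /pnfs usage

# year of last access -> percentage of the ~340TB, copied from the table above
pct_by_last_access_year = {
    2010: 0.08, 2011: 1.17, 2012: 2.37, 2013: 22.04, 2014: 40.40, 2015: 33.86,
}


def tb_last_accessed_before(year):
    """Approximate TB whose last access happened before the given year."""
    pct = sum(p for y, p in pct_by_last_access_year.items() if y < year)
    return TOTAL_TB * pct / 100


total_pct = sum(pct_by_last_access_year.values())
cold_tb = tb_last_accessed_before(2014)
print(f"percentages add up to {total_pct:.2f}%")
print(f"~{cold_tb:.0f}TB were last accessed before 2014")
```

With these numbers roughly a quarter of the space (about 87TB) was last accessed before 2014; the percentages add up to 99.92 because of rounding.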

~340TB by access time and T3 groups - stacked histogram

Figure: 340TB_pnfs_by_atime_and_T3_groups.png

~340TB by access time and T3 groups - table ( values in GB )

period PhEDEx ethz_bphys ethz_ecal ethz_ewk ethz_higgs ethz_susy psi_bphys psi_pixel uniz_bphys uniz_higgs uniz_pixel without_a_T3_group
2010-04 1                      
2010-05 80                      
2010-06             46          
2010-07                        
2010-08   9         25          
2010-09                        
2010-10             5         21
2010-11                       25
2010-12                       57
2011-01             5         20
2011-02 21                     162
2011-03                       19
2011-04                       6
2011-05         64             38
2011-06 39 74         13          
2011-07   177         56          
2011-08                        
2011-09 406       2              
2011-10         10   1638          
2011-11 104           1065          
2011-12             84          
2012-01           13 18 2        
2012-02   43       3 176          
2012-03           47 1873 118        
2012-04           8 1086          
2012-05           53 5          
2012-06 376   114   26 17            
2012-07     197     5 9          
2012-08     156     18            
2012-09 15   6 1   59 700          
2012-10     44 146   59 79          
2012-11 1551   269 45 72 142 52          
2012-12     50     412 32          
2013-01     26 1 9 426 92          
2013-02 1333   34   178 3143 8953          
2013-03 938       225 3327 340          
2013-04 4       13 1136 100          
2013-05       50 32 2281 3951          
2013-06     8 576   2026 868     3    
2013-07     436 352   2165 201          
2013-08     493 17 16 429 31          
2013-09 149   132 886 12 2935 2366 2       36
2013-10 1198 62 423 477 109 5243 3724 92     3 165
2013-11 2679 350 1127 308 1452 8424 6895 6       172
2013-12 3   1588 3   36 5          
2014-01 19   854 56 13 2837       5    
2014-02 5   4853 62 304 172       19 6890  
2014-03     2289 162 616 11314       3383 1606  
2014-04 4   230 2 427 4111 33     6323 8754  
2014-05     166   480 738   1   2058 981  
2014-06 8   1308 10 205 3043 9 3   7514 4162  
2014-07 30   2535 245 417 6493       1241 17  
2014-08 360   16 171 65 226 62     14896 1547  
2014-09 109   195 660 75 281       135 202  
2014-10 597   5316 294 48 886       1694 183  
2014-11 109   2358 4 109 237 17     16564    
2014-12 23   1379 264 199 1254       494    
2015-01     1017 87 136 1088       327 1744  
2015-02 24   2762   1 1079   7   8294 173  
2015-03 36   1540   88 2648 8263     5720    
2015-04 808         1430 36     1481    
2015-05         101 163       356 2  
2015-06         2065 3135 13 2   598    
2015-07 208   162   696 6163 30 2   594    
2015-08     29   1306 6294 123 20   765 47  
2015-09 643   35   343 4940 628 21   24109 2  
2015-10 5747   246   2923 6440 5901     1855 172  
TOTs in GB 17627 715 32393 4879 12837 97379 49608 276 0 98428 26485 721
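
The totals row can be used to check the overall volume and to see each group's share of the space. The sketch below uses only the "TOTs in GB" values above; the variable names are illustrative.

```python
# "TOTs in GB" row of the access-time table.
totals_gb = {
    "PhEDEx": 17627, "ethz_bphys": 715, "ethz_ecal": 32393, "ethz_ewk": 4879,
    "ethz_higgs": 12837, "ethz_susy": 97379, "psi_bphys": 49608, "psi_pixel": 276,
    "uniz_bphys": 0, "uniz_higgs": 98428, "uniz_pixel": 26485,
    "without_a_T3_group": 721,
}

grand_total_gb = sum(totals_gb.values())
shares = {group: 100 * gb / grand_total_gb for group, gb in totals_gb.items()}

print(f"grand total: {grand_total_gb} GB")
for group, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{group:20s} {share:5.1f}%")
```

The grand total is 341348 GB, consistent with the ~340TB quoted for /pnfs; uniz_higgs and ethz_susy each hold a bit under 30% of it.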

~340TB by creation time and T3 groups - stacked histogram

Figure: 340TB_pnfs_by_crtime_and_T3_groups.png

~340TB by creation time and T3 groups - table ( values in GB )

period PhEDEx ethz_bphys ethz_ecal ethz_ewk ethz_higgs ethz_susy psi_bphys psi_pixel uniz_bphys uniz_higgs uniz_pixel without_a_T3_group
2010-04 1                      
2010-05 101                      
2010-06             52          
2010-07                        
2010-08   240         29          
2010-09                        
2010-10             9         21
2010-11                       38
2010-12                       63
2011-01             11         54
2011-02 27                     290
2011-03                       31
2011-04                       14
2011-05         67             50
2011-06 62 78         16          
2011-07   319     5   103          
2011-08                        
2011-09 843       5              
2011-10         19   2746          
2011-11 169           1843          
2011-12             168          
2012-01           18 44 2        
2012-02   79       4 316          
2012-03           76 2869 151        
2012-04           11 1512          
2012-05           80 6          
2012-06 1448   123   28 23            
2012-07     339     7 10          
2012-08     293     23            
2012-09 19   13 2   67 977          
2012-10     63 250   106 155          
2012-11 2051   379 163 101 165 67          
2012-12 8   68   1 628 63 68        
2013-01 571   6595 5 21 618 123          
2013-02 2020   104   2211 16798 12842          
2013-03 1363       389 13154 524          
2013-04 4       13 2857 139          
2013-05       139 47 5096 8167          
2013-06     15 1021   4078 1268     3    
2013-07     644 537 1 4021 341          
2013-08     963 98 18 1343 191       6051  
2013-09 2     2020 6 551         4958 90
2013-10 329   25 110   1039         420 71
2013-11 1075   3514 97 201 3039         2338  
2013-12     3316 2 2 53         2511  
2014-01     102   30 1698       9305    
2014-02     593 62 13 34       95 1325  
2014-03     71 13 897 117       8481 2328  
2014-04     22   330 750   1   13269 2047  
2014-05 3   202   24 4988   2   8575 1655  
2014-06     190 62 62 510   1   5961 717  
2014-07     4454   325 122       7563 8  
2014-08     15   193 178       4548 146  
2014-09     2 3 72 212       278    
2014-10     3353   52 672       1368    
2014-11     1158 4 27 1306 325     643    
2014-12 4   365 293 107 1017       54    
2015-01     1920   135 148   4   783 1752  
2015-02 50   1486   1 816   4   4737 7  
2015-03     1535     2391 7955     12107    
2015-04 810         1675 36     5563    
2015-05         101 163       2750 2  
2015-06         2071 3138 170 2   5812    
2015-07 3676   162   879 6175 78 2   818    
2015-08 272   36   1567 6517 129 20   4376 210  
2015-09 772   28   221 6806 730 21   423 12  
2015-10 1861   246   2587 5729 5419     871    
TOTs in GB 17541 716 32394 4881 12829 99017 49433 278 0 98383 26487 722

/shome /swshare shared filesystems

  • The critical /shome and /swshare shared filesystems each run on one of 2 out-of-warranty file servers, which back each other up.
  • In order to migrate these shared filesystems onto new HW, in Oct 2015 we bought for ~40k CHF :
Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| 340TB_pnfs_by_atime_and_T3_groups.png | r1 | 67.9 K | 2015-10-27 - 09:46 | FabioMartinelli | 340TB_pnfs_by_atime_and_T3_groups |
| 340TB_pnfs_by_crtime_and_T3_groups.png | r1 | 61.8 K | 2015-10-27 - 12:32 | FabioMartinelli | 340TB_pnfs_by_crtime_and_T3_groups |
| Tier3BoardMeeting_UZH_position.pdf | r1 | 74.6 K | 2015-10-27 - 14:51 | FabioMartinelli | UZH_slide_6th_Steering_Board_Meeting |
Topic revision: r15 - 2016-01-12 - FabioMartinelli
 