Cluster Overview

Cluster Composition and Services

| Type | Hosts | Hardware | Services |
| CmsVoBox | t3cmsvobox ( t3cmsvobox01 ) | PSI DMZ VMWare cluster | PhEDEx 4.2.1 |
| MeGt3fs16 | t3fs16 | HP DL380 Gen9 | file server |
| NFSServerZFSBackupANDdCache | t3nfs02 | HP DL380 G9 | NFSv4 service based on ZoL + dCache Pool |
| CmsFrontier | t3frontier01, DNS alias t3frontier = t3frontier01 | PSI DMZ VMWare cluster | CMS-Frontier and CVMFS Squid cache |
| ComputingElement | t3ce02 | PSI DMZ VMWare cluster | Sun Grid Engine 6.2u5 |
| JumpStart | t3jumpstart01 | PSI DMZ VMWare cluster | jet, ssh |
| Mon | t3mon01, DNS alias t3mon = t3mon01 | PSI DMZ VMWare cluster | ganglia collector, ganglia web front end |
| OLDNFSServer | t3fs05 - OUTDATED ! | SUN X4500 (2*Opt 290, 16GB RAM, 48*500GB SATA) | central SW NFS service, backup on t3fs06 |
| OLDNFShomeServer | t3fs06 - OUTDATED ! | SUN X4500 (2*Opt 290, 16GB RAM, 48*500GB SATA) | NFS (user home area), backup on t3fs05 |
| Ossec | t3ossec | PSI DMZ VMWare cluster | ossec daemon |
| SyslogNg | t3service01 | PSI VM DMZ cluster | Syslog-ng 2.1.4-9 Central Logging Service |
| UIs | t3ui0[1,2,3] | Dalco | ssh, freeNX |
| WNsSunBlade | t3wn[10-29] | SUN Blade 6270 (2*Xeon 5560, 24GB RAM, 2*146 GB SAS) / SUN X4150 (2*Xeon E5440, 16GB RAM, 2*146 GB SAS) | Sun Grid Engine 6.2u5 execution hosts |
| dCacheSiteBDII | t3bdii0[1,2], DNS alias t3bdii = t3bdii02 | PSI DMZ VMWare cluster | Site BDII |
| dCacheSolaris | [t3fs02 - t3fs04, t3fs07 - t3fs11] READ-ONLY !! | SUN X4500 (2*Opt 290, 16GB RAM, 48*500GB SATA) / SUN X4540 (2*Opt 2435, 32GB RAM, 48*1TB SATA + 16GB Flash) | dcache pool cells, gridftp, dcap, gsidcap |
| dCachet3fs13t3fs14 | t3fs[13,14] READ-WRITE !! | HP Proliant DL380 G7 | dcache pool cells, gridftp, dcap, gsidcap |
| WNsIntelMeG | T3WN[60..63] | DALCO r2264i6t 4 nodes in 2U module chassis with 2x20 Intel(R) Xeon(R) E5-2698 v4 @ 2.20GHz; RAM - 256 GB | SGE |
| WNsIntelS2600TP | t3wn[51-59] | Dalco r2264i5t - Intel S2600TP | Sun Grid Engine 6.2u5 execution hosts |
| WNsSuperMicro | t3wn[41-50] | SuperMicro 1u got from CSCS | Sun Grid Engine 6.2u5 execution hosts |
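
The Hosts column uses a compact bracket notation such as t3wn[10-29], t3ui0[1,2,3] and T3WN[60..63]. The following is a minimal sketch of how such patterns could be expanded into explicit host names, for example to feed a loop over nodes. The function name `expand_hosts` is hypothetical, and the sketch only covers the simple "prefix[...]" forms; the "[t3fs02 - t3fs04, ...]" form used for the dCacheSolaris row is not handled.

```python
import re

# One bracketed group with an optional prefix and suffix, e.g. "t3wn[10-29]".
_PATTERN = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$"
)


def expand_hosts(pattern: str) -> list[str]:
    """Expand one bracketed host pattern into explicit host names."""
    pattern = pattern.strip()
    match = _PATTERN.match(pattern)
    if match is None:
        return [pattern]  # no brackets: treat as a single host name
    prefix, body, suffix = match.group("prefix", "body", "suffix")
    hosts = []
    for item in body.split(","):
        item = item.strip()
        # A numeric range written either as "10-29" or as "60..63".
        bounds = re.fullmatch(r"(\d+)\s*(?:-|\.\.)\s*(\d+)", item)
        if bounds:
            first, last = bounds.groups()
            width = len(first)  # preserve zero padding, if any
            for n in range(int(first), int(last) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{item}{suffix}")
    return hosts


print(expand_hosts("t3ui0[1,2,3]"))
print(len(expand_hosts("t3wn[10-29]")))
```

With the table above, `expand_hosts("t3wn[10-29]")` yields the 20 SUN Blade worker nodes and `expand_hosts("t3fs[13,14]")` the two read-write dCache servers.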

Cluster Specs

Original Requirements

Our requirements document (q.v. our internal CMS-Tier3-Project-description.doc) specified the following for the two install phases.

| Phase | Year | CPU / kCINT2000 | Disk / TB |
| A | 2008 | 180 | 75 |
| B | 2009 | 500 | 250 |
| C | 2012 | ?? | ?? |

Phase A - CPU

CINT = SPECint benchmark value (average baseline value taken from the SPEC published results). For 1000 CINT2000 one frequently uses the abbreviation kSI2k.

| No. of WNs | Processors | Cores/node | CINT2006/core | kCINT2000/core | No. of Cores | CINT2006 | kCINT2000 |
| 8 | 2*Xeon E5410 | 8 | 18.8 | 3.34 | 64 | 1203.2 | 213.76 |

Note: For CINT2000 values that are not tabulated, we use the conversion CINT2000base = -1373.9673 + 250.8226 * CINT2006base
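
As a rough check of this conversion, the sketch below applies the linear formula to the E5410 row of the Phase A table. The function name and the rounding to two decimals are illustrative assumptions; only the two coefficients and the input values come from this page.

```python
def cint2006_to_cint2000(cint2006_base: float) -> float:
    """Linear conversion quoted in the note (per-core baseline values)."""
    return -1373.9673 + 250.8226 * cint2006_base


# Phase A worker nodes: 8 nodes with 8 cores each, 18.8 CINT2006 per core.
cores = 8 * 8
kcint2000_per_core = round(cint2006_to_cint2000(18.8) / 1000, 2)
total_kcint2000 = round(cores * kcint2000_per_core, 2)

print(kcint2000_per_core)  # 3.34
print(total_kcint2000)     # 213.76
```

The total is computed from the rounded per-core value, which is what reproduces the 213.76 kCINT2000 shown in the table.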

Phase A - Storage

| No. of Fileservers | Type | Space/Node (TB) | Total Space (TB) |
| 6 | SUN X4500 | 17.5 | 105 |

Note: 48 500 GB disks per fileserver:

  • 4 raidz pools with 9 disks, 1 raidz pool with 8 disks: (4*8 + 1*7) * 500 GB = 21 TB usable raw: measured with fs overhead: ca 17.5 (83%)
  • 2 disks mirrored OS
  • 2 disks as hot spare

Phase B - CPU


T3_CH_PSI CPU resources and performance benchmarks 2010/2011
| No. of WNs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 7 | 2*Xeon E5410 | 8 | 91.97 | 11.5 | 2.95 | 56 | 644 | 165 |
| 20 | 2*Xeon X5560 | 8 | 117.53 | 14.69 | 3.77 | 160 | 2350 | 603 |
| 27 | | | | | | 216 | 2994 | 768 |

Note: hepspec06/kSi2k = 3.9 ± 0.2 => kSi2k = hepspec06/3.9
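
The conversion in the note can be sketched as follows. The helper names are hypothetical; the factor 3.9 and its uncertainty of 0.2 are the values quoted above, and the range is simply the result of dividing by the two extreme factors.

```python
HS06_PER_KSI2K = 3.9       # conversion factor quoted in the note
HS06_PER_KSI2K_ERR = 0.2   # quoted uncertainty on that factor


def hs06_to_ksi2k(hs06: float) -> float:
    """Central value: kSI2k = HS06 / 3.9."""
    return hs06 / HS06_PER_KSI2K


def hs06_to_ksi2k_range(hs06: float) -> tuple[float, float]:
    """Lower and upper kSI2k values from the quoted uncertainty."""
    low = hs06 / (HS06_PER_KSI2K + HS06_PER_KSI2K_ERR)
    high = hs06 / (HS06_PER_KSI2K - HS06_PER_KSI2K_ERR)
    return low, high


# Per-core values from the Phase B table.
print(round(hs06_to_ksi2k(11.5), 2))   # 2.95 (Xeon E5410)
print(round(hs06_to_ksi2k(14.69), 2))  # 3.77 (Xeon X5560)
print(hs06_to_ksi2k_range(14.69))
```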

The PSI Tier-3 has 48 active users (August 2011).

Phase B - Storage

T3_CH_PSI SE Storage resources 2010/2011
| No. of Fileservers | Type | Space/Node (TB) | Total Space (TB) |
| 4 | SUN X4500 | 16.10 | 64 |
| 5 | SUN X4540 | 34.09 | 170 |
| 9 | | | 234 |

Note: 48 1TB disks per X4540 fileserver:

  • 5 raidz pools with 9 disks: 5*9 * 1TB = 45 TB usable raw. df yields 34093714636800 bytes = 34.09 TB = 31.008 TiB
    • Current dcache config: 2 pools of 14000 GiB = 15.03 TB
  • 3 disks as hot spare
  • OS sits on flash storage
  • Note: Two of our 6 X4500 are now used for home directories, SW areas and backup and therefore were taken out of this table.
  • For old X4500: 16097967341568 bytes = 16.10 TB = 14.64 TiB
    • Current dcache config: 1 pool of 14000 GiB = 15.03 TB
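
The notes above mix decimal (TB) and binary (GiB, TiB) units. A minimal sketch of the conversions, using the byte counts reported by df on this page (the helper name `tb_and_tib` is an illustrative choice):

```python
TB = 10**12   # decimal terabyte
TIB = 2**40   # binary tebibyte
GIB = 2**30   # binary gibibyte


def tb_and_tib(n_bytes: int) -> tuple[float, float]:
    """Return the same size expressed in TB and in TiB."""
    return n_bytes / TB, n_bytes / TIB


x4540_tb, x4540_tib = tb_and_tib(34093714636800)  # df output, X4540
x4500_tb, x4500_tib = tb_and_tib(16097967341568)  # df output, old X4500
pool_tb = 14000 * GIB / TB                        # one 14000 GiB dCache pool

print(f"X4540: {x4540_tb:.2f} TB = {x4540_tib:.3f} TiB")  # 34.09 TB = 31.008 TiB
print(f"X4500: {x4500_tb:.2f} TB = {x4500_tib:.2f} TiB")  # 16.10 TB = 14.64 TiB
print(f"pool:  {pool_tb:.2f} TB")                         # 15.03 TB
```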

Phase C - CPU

T3_CH_PSI CPU resources and performance benchmarks 2012
| No. of WNs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 20 | 2*Xeon X5560 | 8 | 117.53 | 14.69 | 3.77 | 160 | 2350 | 603 |
| 11 | 2*E5-2670 2.60GHz | 16 | 263 | 16.44 | 4.22 | 176 | 2893 | 743 |
| 31 | | | | | | 336 | 5243 | 1346 |

Note: hepspec06/kSi2k = 3.9 ± 0.2 => kSi2k = hepspec06/3.9

The PSI Tier-3 has 53 active users (June 2012).

Phase C - Storage

T3_CH_PSI SE Storage resources 2012
| No. of Fileservers | Type | Space/Node (TB) | Total Space (TB) |
| 4 | SUN X4500 | 16.10 | 64 |
| 5 | SUN X4540 | 34.09 | 170 |
| 2 | HP Proliant DL380 G7 | 130 | 260 |
| 11 | | | 494 |

Note 1: 48 1TB disks per X4540 fileserver:

  • 5 raidz pools with 9 disks: 5*9 * 1TB = 45 TB usable raw. df yields 34093714636800 bytes = 34.09 TB = 31.008 TiB
    • Current dcache config: 1 unique pool of ~31 TiB
  • 3 disks as hot spare
  • OS sits on flash storage
  • Note: Two of our 6 X4500 are now used for home directories, SW areas and backup and therefore were taken out of this table.
  • For old X4500: 16097967341568 bytes = 16.10 TB = 14.64 TiB
    • Current dcache config: 1 pool of 14000 GiB = 15.03 TB
Note 2: 120 3TB disks hosted in an SGI IS5500 shared by the two HP Proliant DL380 G7 fileservers:
  • The SGI IS5500 is formatted as 12 RAID6 groups of 8+2 disks each; 6 groups are offered to the first HP Proliant DL380 G7 and 6 to the other.
  • This gives 6 dCache cms pools of 22TB each per HP Proliant DL380 G7 fileserver => 130TB per fileserver => 260TB for both.
  • 2 hot spares, in addition to the RAID6 protection.
  • Generally speaking, the SGI IS5500 can be expanded by simply adding disk trays.
  • Just an idea: the cheapest way to expand this storage could be to add another expansion to the SGI IS5500 with 12*5*3TB disks, organized as 6 RAID6 groups, for a net gain of 130TB. We would attach these 6 new volumes to t3fs13, bond its 2*10Gbit/s Ethernet interfaces, and connect t3fs13 to the last 10Gbit/s uplink available in our 3 network switches.
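
The disk accounting in Note 2 can be sketched as below. All variable names are illustrative. The 24 TB per group is only the nominal capacity of the 8 data disks, before any formatting overhead or unit conversion, so it is an upper bound on the 22TB per pool quoted above; the hot spares are not modelled.

```python
DISK_TB = 3        # nominal disk size in decimal TB
GROUPS = 12        # RAID6 groups on the SGI IS5500
DATA_DISKS = 8     # data disks per RAID6 group
PARITY_DISKS = 2   # parity disks per RAID6 group
SERVERS = 2        # HP Proliant DL380 G7 fileservers

total_disks = GROUPS * (DATA_DISKS + PARITY_DISKS)   # disks in RAID6 groups
nominal_tb_per_group = DATA_DISKS * DISK_TB          # upper bound per pool
groups_per_server = GROUPS // SERVERS
quoted_tb_per_server = groups_per_server * 22        # 22TB pools as quoted

# Proposed expansion: 12*5 disks organized as RAID6 groups of 8+2.
expansion_groups = (12 * 5) // (DATA_DISKS + PARITY_DISKS)

print(total_disks)           # 120
print(nominal_tb_per_group)  # 24
print(quoted_tb_per_server)  # 132, i.e. the roughly 130TB per fileserver
print(expansion_groups)      # 6
```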

Phase D - CPU

As of 10th Oct 2013 the CPU resources are the same as in 2012 (Phase C - CPU).

Phase D - Storage

T3_CH_PSI SE Storage Resources 2013
| No. of Fileservers | Type | Space/Node (TB) | Total Space (TB) | Total Space (TiB) |
| 4 | SUN X4500 | 16 | 64 | 58 |
| 5 | SUN X4540 | 33 | 165 | 150 |
| 2 | HP Proliant DL380 G7 | 282 | 564 | 513 |
| 11 | | | 793 | 721 |

Note 1: During summer 2013 we connected a new NetApp E5400 (360TB raw) to the HP Proliant DL380 G7 fileservers installed in 2012, so their net storage more than doubled (from 260 TB to 564 TB).
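
The Total Space columns of the 2013 table can be cross-checked with a short sketch like the one below. The function name `storage_totals` and the row layout are assumptions made for illustration; the input numbers are the per-node values from the table.

```python
TIB_PER_TB = 10**12 / 2**40  # one decimal TB expressed in binary TiB

# (type, number of fileservers, space per node in TB), as in the 2013 table
PHASE_D = [
    ("SUN X4500", 4, 16),
    ("SUN X4540", 5, 33),
    ("HP Proliant DL380 G7", 2, 282),
]


def storage_totals(rows):
    """Return (type, total TB, total TiB) per row plus a grand total row."""
    out = []
    grand_tb = 0
    for kind, count, tb_per_node in rows:
        tb = count * tb_per_node
        grand_tb += tb
        out.append((kind, tb, round(tb * TIB_PER_TB)))
    out.append(("all", grand_tb, round(grand_tb * TIB_PER_TB)))
    return out


for kind, tb, tib in storage_totals(PHASE_D):
    print(f"{kind}: {tb} TB = {tib} TiB")
```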

Phase E - CPU

On 18th Aug 2014 the CPU resources are:

T3_CH_PSI CPU resources and performance benchmarks 2014
| No. of WNs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 20 | 2*Xeon X5560 | 8 | 117.53 | 14.69 | 3.77 | 160 | 2350 | 603 |
| 11 | 2*E5-2670 2.60GHz | 16 | 263 | 16.44 | 4.22 | 176 | 2893 | 743 |
| 4 | 2*AMD 6272 2.40GHz | 32 | 241 | 7.53 | 1.93 | 128 | 964 | 247 |
| 35 | | | | | | 464 | 6207 | 1593 |

| No. of UIs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 6 | 2*AMD 6272 2.40GHz | 32 | 241 | 7.53 | 1.93 | 192 | 1446 | 371 |
| 6 | | | | | | 192 | 1446 | 371 |

Phase E - Storage

Same storage as in Phase D - Storage.

Phase F - CPU

In April 2016 the CPU resources are:

T3_CH_PSI CPU resources and performance benchmarks 2016
| No. of WNs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 9 | 2*Xeon E5-2698v3 | 64 | 700 | 10.94 | 2.81 | 576 | 6301 | 1619 |
| 20 | 2*Xeon X5560 | 8 | 117.53 | 14.69 | 3.77 | 160 | 2350 | 603 |
| 11 | 2*E5-2670 2.60GHz | 16 | 263 | 16.44 | 4.22 | 176 | 2893 | 743 |
| 4 | 2*AMD 6272 2.40GHz | 32 | 241 | 7.53 | 1.93 | 128 | 964 | 247 |
| 44 | | | | | | 1040 | 12508 | 3212 |

| No. of UIs | Processors | Cores/node | HS06/node | HS06/core | kCINT2000/core | total No. of Cores | total HS06 | total kCINT2000 |
| 6 | 2*AMD 6272 2.40GHz | 32 | 241 | 7.53 | 1.93 | 192 | 1446 | 371 |
| 6 | | | | | | 192 | 1446 | 371 |
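
The totals in the worker node table can be reproduced with the sketch below. It assumes that each total is the rounded per-core value multiplied by the number of cores, which appears to be how the table was built (for example, 9 nodes at 700 HS06/node would give 6300, while 576 cores at 10.94 HS06/core give the listed 6301). The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    nodes: int
    processors: str
    cores_per_node: int
    hs06_per_core: float
    kcint2000_per_core: float

    @property
    def cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def hs06(self) -> int:
        # Total from the rounded per-core value (assumed table convention).
        return round(self.cores * self.hs06_per_core)

    @property
    def kcint2000(self) -> int:
        return round(self.cores * self.kcint2000_per_core)


# Worker node rows of the April 2016 table.
WORKER_NODES = [
    NodeGroup(9, "2*Xeon E5-2698v3", 64, 10.94, 2.81),
    NodeGroup(20, "2*Xeon X5560", 8, 14.69, 3.77),
    NodeGroup(11, "2*E5-2670 2.60GHz", 16, 16.44, 4.22),
    NodeGroup(4, "2*AMD 6272 2.40GHz", 32, 7.53, 1.93),
]

total_nodes = sum(g.nodes for g in WORKER_NODES)
total_cores = sum(g.cores for g in WORKER_NODES)
total_hs06 = sum(g.hs06 for g in WORKER_NODES)
total_kcint2000 = sum(g.kcint2000 for g in WORKER_NODES)

print(total_nodes, total_cores, total_hs06, total_kcint2000)
```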

Phase F - Storage

Same as Phase E

Rack layout

T3 HW Pictures

Phase A:

Phase B:

  • a lot of crates and packing material
  • Tier-3 phase B system:

2 * HP Proliant DL380 G7

1 * SGI IS5500

  • Front:
  • Back (but don't consider the Infiniband expansion):
  • SGI IS5500 360TB front picture + 2 HP Proliant:

11 * Intel S2600JF

  • 11 Intel S2600JF installed by mdadm raid1 (OS) and mdadm raid0 /scratch :

10 * Supermicro 1U H8DGU-F

  • 6 SL6 UIs + 4 SL6 WNs installed by mdadm raid1+0 - 32 cores 100GB RAM 1700 GB /scratch:
Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| 16122009529-small.jpg | r1 | 142.1 K | 2009-12-23 - 09:53 | DerekFeichtinger | phase B crates |
| 20100514_005.jpg | r1 | 736.6 K | 2010-05-14 - 13:20 | DerekFeichtinger | Tier-3 phase B |
| 22072008074.jpg | r1 | 515.6 K | 2008-08-23 - 09:55 | DerekFeichtinger | cluster photo |
| CHIPP-Meeting-20080909.odp | r1 | 965.2 K | 2008-09-01 - 16:18 | DerekFeichtinger | |
| IMG_0484.JPG | r1 | 94.5 K | 2012-01-11 - 10:23 | FabioMartinelli | 1 Hot Swap Power Supply |
| IMG_0486.JPG | r1 | 129.0 K | 2012-01-11 - 10:30 | FabioMartinelli | Back Side with 4 1Gbit/s E, 4 8Gbit/s FC, 2 10Gbit/s E |
| IMG_0487.JPG | r1 | 116.5 K | 2012-01-11 - 10:33 | FabioMartinelli | Internal |
| IMG_0488.JPG | r1 | 127.6 K | 2012-01-11 - 10:34 | FabioMartinelli | RAM Left Side |
| IMG_0489.JPG | r1 | 149.8 K | 2012-01-11 - 10:34 | FabioMartinelli | RAM Right Side |
| IMG_0490.JPG | r1 | 77.7 K | 2012-01-11 - 10:17 | FabioMartinelli | 1 Hot Swap Fan |
| IMG_0491.JPG | r1 | 93.1 K | 2012-01-11 - 10:20 | FabioMartinelli | Back Plane |
| IMG_0492.JPG | r1 | 83.6 K | 2012-01-11 - 10:20 | FabioMartinelli | Chassis Lock |
| IMG_0493.JPG | r1 | 125.3 K | 2012-01-11 - 10:41 | FabioMartinelli | Front Side Left |
| IMG_0494.JPG | r1 | 131.1 K | 2012-01-11 - 10:40 | FabioMartinelli | Front Side Right |
| IMG_0495.JPG | r1 | 86.4 K | 2012-01-11 - 10:43 | FabioMartinelli | 1 Hot Swap SAS disk Up Side |
| IMG_0496.JPG | r1 | 70.9 K | 2012-01-11 - 10:45 | FabioMartinelli | 1 Hot Swap SAS disk Back Side |
| IMG_0600_2.png | r2 r1 | 5765.2 K | 2012-05-30 - 08:04 | FabioMartinelli | 11 Intel S2600JF + mdadm raid 0/1 |
| IMG_0605.JPG | r1 | 2599.2 K | 2012-05-24 - 13:54 | FabioMartinelli | SGI IS5500 360TB front picture + 2 HP Proliant |
| IS5500-back.jpg | r1 | 2531.9 K | 2012-02-14 - 09:32 | FabioMartinelli | SGI IS5500 360TB back picture |
| IS5500-front.jpg | r1 | 2955.6 K | 2012-02-14 - 09:32 | FabioMartinelli | SGI IS5500 360TB front picture |
| New10UIsSummer2014.JPG | r1 | 1325.9 K | 2014-07-27 - 08:27 | FabioMartinelli | 6 UIs + 4 WNs 32cores 100GB RAM 1700 GB scratch |
| T3-Racklayout.png | r3 r2 r1 | 37.1 K | 2010-07-09 - 12:10 | DerekFeichtinger | Tier3 Rack layout |
| photo_1.JPG | r1 | 2690.8 K | 2015-02-23 - 14:28 | FabioMartinelli | 6 UIs + 4 WNs 32cores 100GB RAM 1700 GB scratch |
Topic revision: r46 - 2016-09-01 - FabioMartinelli