<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Phoenix Cluster Road Map

This page describes where our Cluster stands today, where it is heading, and how it got here. It is especially useful as an overview of the changes made so far and of the plans for the future.

%TOC%

---++ Administrativa

   * EGEE [[https://edms.cern.ch/file/860386/0.5/EGEE-ROC-Site-SLD-v1.1.pdf][Service Level Description]]

---++ PhaseH (planned)

Scheduled to be deployed before March 2014, it will consist mainly of:

   * 5400 HS06 of computing power, corresponding to 16 Intel SandyBridge compute nodes as of the beginning of 2013.
   * 280 TB of central storage, or one building block with DCS3700 as of the beginning of 2013.
   * A 12 GB/s scratch filesystem, to replace the PhaseC GPFS.

A couple of framework agreements will be placed for this purchase.

---++ PhaseG

Announced on the 5th of March 2013, and deployed in stages since the end of 2012, its main purpose was to meet the 2013 pledges.

   * The PhaseC Thors had to be decommissioned due to increasing issues with them. They were replaced first by CSCS-lent hardware, and then by six IBM DCS3700 controllers (three couplets) with 60 x 3 TB disks each. Two of those controllers provide storage space (279 TB) that is not pledged, so that it can be moved to Scratch in an emergency.
   * Various sets of compute nodes, purchased directly from other projects (Weisshorn and Pilatus), were gradually introduced, summing up to 20 nodes (+6660 HS06), similar to those already introduced after the move to Lugano.
   * Full replacement of the Ethernet infrastructure in the cluster.

| *Resource* | *Provided* | *Pledged 2013* | *Notes* |
| Computing (HS06) | 24198 | 23000 | 40% ATLAS, 40% CMS, 20% LHCb |
| Storage (TB) | 1303 (+279) | 1300 | 50% ATLAS, 50% CMS |

   * Reinstallation of the WNs to UMD-1.
   * CernVMFS set up for all three main VOs.
   * Installation of two new virtual machine hosts, with SSD drives, for good DB performance.
   * Upgrade of dCache from version 1.9.5 to 1.9.12.
   * New protocol supported: an xRootd door installed in dCache.
   * Warranty extended to 4 years for the Storage components; the rest stays at 3 years.
   * Phoenix training procedures course given to other CHIPP sysadmins.
   * ActivitiesOverview2013

---++ PhaseF

Deployed on the 21st of August 2012, as a compute expansion to meet the 2012 pledges.

   * An important milestone was also the move to a new datacenter in Lugano. BlogLuganoBuildingMove
   * The old PhaseC compute nodes were decommissioned after the move (not relocated), and replaced by 36 SandyBridge Intel nodes (2 x 8 HT cores, 2.6 GHz, 64 GB RAM, 333 HS06 each).
   * Later, a small compute increase of 10 more similar nodes was purchased to meet the pledges in July.
   * The Infiniband network was replaced by 5 Voltaire 4036 switches.

| *Resource* | *Provided* | *Pledged 2012* | *Notes* |
| Computing (HS06) | 17538 | 17400 | 40% ATLAS, 40% CMS, 20% LHCb |
| Storage (TB) | 1095 | 1090 | 50% ATLAS, 50% CMS |

   * Simplification of the network infrastructure, moving from a mixed Ethernet-Infiniband bridged network to pure Infiniband, with a transparent bridge.
   * The external network connection was also upgraded to 10 Gbit/s.
   * Three new services were added: CernVMFS, Argus03 and Cream03.
   * The cluster was declared the most efficient ATLAS site in HammerCloud tests.
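As a sanity check, the 20 Weisshorn/Pilatus nodes quoted for PhaseG at 333 HS06 each account exactly for the jump from the PhaseF total to the PhaseG total. A minimal sketch of that accounting (variable names are illustrative; all figures come from the tables above):

```python
# Sanity check of the HS06 accounting quoted in the PhaseF and PhaseG
# tables above. All numbers are taken from the text; the names are
# illustrative only.

PHASE_F_TOTAL_HS06 = 17538   # PhaseF "Provided" (table above)
PHASE_G_NEW_NODES = 20       # nodes purchased from Weisshorn and Pilatus
HS06_PER_NODE = 333          # per-node rating of the SandyBridge nodes
PLEDGE_2013_HS06 = 23000     # 2013 pledge

added = PHASE_G_NEW_NODES * HS06_PER_NODE       # the "+6660 HS06" above
phase_g_total = PHASE_F_TOTAL_HS06 + added      # the PhaseG "Provided" figure

print(added, phase_g_total, phase_g_total >= PLEDGE_2013_HS06)
# 6660 24198 True
```

The per-node rating and the two table totals are mutually consistent, which is a useful cross-check when updating the tables for future phases.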
---++ PhaseE

Deployed on the 20th of December 2011, it consisted of:

   * Decommissioning of the old Thumpers from PhaseA&B (475 TB).
   * Addition of IBM DS3500 disk space (+405 TB, plus 95 TB that came from the previous testing GPFS).
   * Upgrade of the 10 AMD nodes from PhaseD to Interlagos (16 cores/CPU) (+250 HS06).
   * Complete removal of Lustre as the Scratch filesystem, in favor of GPFS, using the PhaseC Lustre hardware with two SSD PCI cards for metadata.

| *Resource* | *Provided* | *Pledged 2011* | *Notes* |
| Computing (HS06) | 13740 | 13550 | 40% ATLAS, 40% CMS, 20% LHCb |
| Storage (TB) | 1095 | 976 | 50% ATLAS, 50% CMS |

   * NewEMIRequirements. Work plans within the new EMI infrastructure, things we have to adapt to.
   * Some hardware was given to other Swiss universities: BlogPhaseBCHardwareGiveaway
   * The previous Scratch filesystem was replaced by GPFS, with metadata on Solid State Drives, greatly improving the reliability and speed of file metadata operations.
   * Began migrating some virtual services to KVM, with an integrated management tool (ConVirture) and the possibility of live migration.

---++ PhaseD

In production from the 9th of March 2011, as a plain extension of PhaseC. It consisted of:

   * 10 new compute nodes with AMD 6172 12-core processors and 3 GB RAM.
   * Three IBM DS3500 controllers: two dedicated to dCache (90 x 2 TB disks each) and one to Scratch. This new scratch expansion was used as a test production instance of GPFS, to evaluate it as an alternative to Lustre, which was giving significant trouble.

| *Resource* | *Provided* | *Pledged 2011* | *Notes* |
| Computing (HS06) | 13488 | 13550 | 40% ATLAS, 40% CMS, 20% LHCb |
| Storage (TB) | 976 | 976 | 50% ATLAS, 50% CMS |

   * Tier2ResourcePlanning. Requirement definition for 2010-2013.
   * VoTier2Requirements. General resource planning for the whole LHC.
   * PhaseDRequirementsDiscussion. First PhaseD requirements discussion. May 2010.
   * ActivitiesOverview2010to2011. Explains the main activities done in Phoenix from September 2010 to August 2011, to attach to the SNF funding document.
   * PhoenixBlogReorganizingTwiki

---++ PhaseC

Delivered in November 2009, put into production on the 31st of March 2010. PhaseC was a total rebuild of the computing/service nodes, plus an expansion of the existing storage with 10 new dCache pools (Thors). It consisted of 96 compute nodes (SunBlades E5540 installed with SL5) and 38 disk nodes (28 Thumpers X4500 with Solaris, plus 10 Thors X4540 with Linux, 40 x 1 TB disks each).

Initially the nodes were provided without HyperThreading, giving 768 job slots (8 cores/CPU, 3 GB RAM/core), but this did not provide enough computing power, so HyperThreading was enabled (even though only 12 of the 16 logical cores were used). This improved performance and increased the number of slots to 1152, but reduced the memory to 2 GB/slot.

| *Resource* | *Provided* | *Pledged 2010* | *Notes* |
| Computing (HS06) | 11520 | 10560 | 40% ATLAS, 40% CMS, 20% LHCb |
| Storage (TB) | 910 | 910 | 50% ATLAS, 50% CMS |

   * Tier2ResourcePlanning. Requirement definition for 2006-2010.
   * ChippCscs20090605. CHIPP/CSCS meeting for Phase C upgrade planning, planned in May 2009.
   * ChippCscsSun20090716. CHIPP/CSCS/SUN meeting for discussing the Phase C offer, and the evolving discussion. July 2009.
   * PhaseCNetworkDcacheSetup. Schema and discussion for the network / dCache setup in Phase C. July 2009.
   * PhaseCDiscussion. Discussions about PhaseC. IRC discussion in July 2009.
   * PhaseCSmoothUpgrade. Alternatives on how to proceed with the upgrade to PhaseC. August 2009.
   * ResourceAccounting2010. Some usage graphics for 2010.
   * ChippMeeting20100701. Meeting about the status and activities of PhaseC by July 2010.

---++ PhaseB

Delivered by Sun in September 2008. It was a planned expansion of PhaseA, and consisted of a total of 60 compute nodes (SunBlades X2200 installed with SLC4) and 29 disk nodes (Thumpers X4500 with Solaris, 48 x 500 GB disks each).
| *Resource* | *Provided* | *Pledged 2009* | *Notes* |
| Computing (HS06) | ~6000 (1574.4 kSI2000) | 5760 | |
| Storage (TB) | 517 | 490 | |

   * Tier2ResourcePlanning. Requirement definition for 2006-2010.
   * PhoenixImages. A few photos from PhaseA and PhaseB.
   * PhoenixPresentations. Some presentations from 2006 to 2009.
   * PhaseBProgress. PhaseB installation progress by October 2008.
   * PhaseB. PHOENIX Phase B planning with detailed node setup.
   * MeetingNov2009. Meeting about the progress of the Cluster in November 2009.

---++ PhaseA

First phase of Phoenix, delivered by Sun on the 20th of November 2007. (The tender was released in November 2006; the plan was to build three phases, to be delivered by the end of 2007, 2008 and 2009.) It was composed of 30 compute nodes (SunBlades X2200 installed with SLC4) and 12 disk nodes (Thumpers X4500 with Solaris, 48 x 500 GB disks each). The second MoU was signed on the 27th of March 2007.

| *Resource* | *Provided* | *Pledged 2008* | *Notes* |
| Computing (HS06) | ~3100 (787 kSI2000) | ~2700 (680 kSI2000) | |
| Storage (TB) | 280 | 225 | |

   * PlanningPhaseA. Planning of the Phase A Cluster by November 2007.
   * [[SUNCluster]]. SUN Cluster delivery and configuration by the end of 2006 and beginning of 2007.
   * PhoenixImages. A few photos from PhaseA and PhaseB.
   * PhoenixPresentations. Some presentations from 2006 to 2009.
   * PhoenixCluster. Phoenix Cluster configuration from 2007.
   * PublicInformation. Some press articles in 2008.

---++ Phase0

This was the test cluster deployed to evaluate alternatives.

| *Resource* | *Provided* | *Pledged 2007* | *Notes* |
| Computing (HS06) | | ~800 (200 kSI2000) | |
| Storage (TB) | | 61 | |

---++ Dalco Cluster

The first proposal for a Swiss Tier2 was written for 2004. The first machines from Dalco were deployed in 2005, consisting of 15 compute nodes (2 x Intel Xeon E7520 @ 3 GHz), 1 storage node (4 x 4.8 TB) and 1 master node. The first MoU was written in May 2005.
| *Resource* | *Provided* | *Pledged 2006* | *Notes* |
| Computing (HS06) | ~180 (45 kSI2000) | ~180 (45 kSI2000) | |
| Storage (TB) | 9 | 9 | |

-- Main.PabloFernandez - 2011-01-20
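The older tables above quote capacity in kSI2000 alongside approximate HS06 figures. These pairs are consistent with the common rule of thumb of roughly 4 HS06 per kSI2000 (the exact factor is benchmark- and hardware-dependent; the numbers below are taken from the tables above):

```python
# The pre-2010 tables quote capacity in kSI2000 with approximate HS06
# equivalents; they follow the usual rule of thumb of ~4 HS06 per kSI2000.
# The factor is approximate and benchmark-dependent.

HS06_PER_KSI2000 = 4

# (kSI2000 quoted, approximate HS06 quoted) pairs from the tables above:
# Dalco, Phase0 (pledge), PhaseA (pledge), PhaseA (provided)
quoted = [(45, 180), (200, 800), (680, 2700), (787, 3100)]

for ksi2000, hs06 in quoted:
    # converted value should land close to the approximate HS06 quoted
    print(ksi2000, ksi2000 * HS06_PER_KSI2000, hs06)
```

This is only a consistency check on the tables; official conversions should use the published benchmark results for the actual hardware.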
Topic revision: r17 - 2013-06-10 - PabloFernandez