<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
uncomment this if you want the page to be viewable only by internal people
   * #Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Phenix updated installation and configuration status

   * Minutes of the phone call held on 13 February 2007
   * Participants:
      * Sergio Maffioletti
      * Alessandro Usai
      * Tom Guptil
      * Derek Feichtinger
      * Sigve Haug
      * Zhiling Chen

---+++ Status of installation and configuration

   * *WN (X2200)*
      * SLC308 [ok]
      * LCG/gLite [ok]
      * WN integrated into old LRMS [ok]
      * we could have all WNs integrated in a short time (sharing /apps via NFS)
   * *CE (X4200)*
      * SLC4 [ok]
      * for the time being we agreed on keeping the old ce01-lcg in use, with Torque as LRMS
      * SGE integration will be tested; we plan to have it in production as soon as it is stable
      * NorduGrid will have to be checked and tested too
   * *Problems encountered, solutions and workarounds*
      * SLC306 does not work (missing controller drivers)
      * SLC4 does work, but installing the LCG/gLite software is error prone
      * Thumper installation [ok], but we need to test the ZFS functionality
      * Tom proposed changing the current RAID configuration to use only 1 parity disk; this would give an additional 4TB at the expense of reliability [still to be decided]
      * SUN N1 is not suitable for cluster management, therefore we will use cfengine
      * planning to have all cluster management services on 1 X2200 running Linux (possibly with Solaris in a virtual machine)
      * 2 X4200 --> should become free
   * *Tests* (tentative dates)
      * Reliability tests on Thumpers (Tom + Alessandro) --> 12 - 16 February
      * Performance tests from WNs to Thumpers via dCache --> 14 - 21 February
      * Tests of different configurations of ZFS and dCache --> 12 - 23 February
   * *Organisation of the dCache tests*
      * functionality tests
      * VO codes
      * local load tests (mainly dcap):
         * writing files in parallel from multiple nodes
         * reading the same file from multiple nodes
         * trying to write a file that is being written by another process
         * erasing a file that is being read by another process
         * measuring I/O rates as a function of the number of parallel clients
      * WAN protocol tests (SRM, gridftp)
         * CMS PhEDEx transfers
      * Storage access profile of CMS jobs -> they will use the dcap protocol
      * Storage access profile of Atlas jobs -> for those using ARC, access is mainly through SRM and/or gridftp
      * each VO should prepare its own specific tests
      * a general test suite (local and WAN tests) will be prepared by Derek
      * Sigve will forward the test description to the Atlas contact to check whether Atlas needs additional tests

---+++ SE dCache configuration scenario

   * PNFS + postgres DB on a fat node = 1 X4200
   * SRM + dCache domains + LCG/gLite software on a standard WN = 1 X2200
   * gridftp + a few dCache modules still to be checked = 2 X4500
   * we still need to understand the proper scenario
   * we all agree on this configuration
   * things to be checked:
      * is it necessary to mount PNFS on the Thumpers? apparently yes, if the Thumper is running gridftp (thanks to Lionel Schwarz)
      * what is necessary for the WNs to use the dcap protocol to access the dCache pools? apparently the client-only dCache package (Alessandro will check)

---+++ Planning migration of DPM data to dCache pool

   * Migrate DPM data to the new SE (se02-lcg.projects.cscs.ch)
   * Users will have to migrate their data and update the catalog
   * With the introduction of the new SE as the default CSCS/CHIPP SE, we will have to change a few settings in FTS and the IS (Alessandro will check)
   * Derek and Sigve will check what needs to be done for the CMS and Atlas VOs to support the new SE
   * The current se01-lcg will be kept as a backup for an initial period and then re-converted into a dCache pool

---+++ VO disk space shares

   * Agreed to have a Filesystem <-> VO mapping as was done in Phenix I
   * Accepted proposal:
      * Each VO will have access to both Thumpers
      * Atlas = 2 x 6TB
      * CMS = 2 x 6TB
      * LHCb = 1 x 1/2 TB
      * Hone = 1 x 1/2 TB
      * dteam = 1 x 1/2 TB
      * spare = 2 x 3.5TB
   * spare space will be available to all VOs on request
   * we may also take space from dteam
   * LHCb should agree to having only 1/2TB initially (Derek will contact them)

---+++ What bandwidth can we expect

   * WN = 1 Gbit link
   * Thumper = 4 x 1 Gbit links, trunked
   * from CSCS to Karlsruhe -> 20MB/s should be guaranteed (CSCS will check)

---+++ VO CPU shares based on queue priority, including NorduGrid queues

   * For the time being we will keep the configuration of the queues as it is
   * We will observe the behaviour of the queues
   * When we migrate to the new SGE-based CE, we will address the fair-share issue

---+++ Integration with Phenix 1 cluster

   * integration of WNs
   * Agreed to integrate 10 WNs once the installation is stable

---+++ Deadlines

   * We can make it by the end of February
   * Next week we will have more information

---+++ AOB

   * update the UI machine (strange Java exception errors)?
      * the UI will be re-installed in the next two weeks
      * proposal to migrate it to a server box (to gain reliability)
   * VOBoxes
      * should we reinstall them as true LCG VO-Boxes? This would provide gsissh and easier myproxy management
      * we are planning to migrate these boxes anyway
   * responsibility for TWiki areas (CSCS will take care)
      * create 1 page per VO
      * add a page with logs of problems
      * a VOBoxes page with info about how to start the services

---

---++ Resume of the Configuration

   * 1 X2200 = cluster management system
   * 1 X2200 = SRM + dCache domains + LCG SE related software
   * 1 X4200 = PNFS + postgres
   * 1 X4500 Thumper = gridftpd + dCache pool node
   * 1 X4500 Thumper = gridftpd + dCache pool node
   * ZFS configuration (proposal):
      * 1 Thumper tested with 4 RAID groups and 2 parity disks per group = 16TB + 4 spare disks
      * 1 Thumper tested with 4 RAID groups and 1 parity disk per group = 18TB + 4 spare disks
      * 1 ZFS pool per Thumper
      * 1 filesystem per VO per Thumper
      * each VO gets space on both Thumpers
   * Thumper dCache configuration
      * each Thumper will have 1 dCache pool per VO/FS (CMS, Atlas, dteam)
      * 1 Thumper will also have filesystems for LHCb and Hone

---

-- Main.SergioMaffioletti - 13 Feb 2007
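The local dcap load tests listed in the test organisation (parallel writes from multiple nodes, I/O rate as a function of the number of clients) can be prototyped on a single host before running against the real pools. Below is a minimal Python sketch using plain POSIX I/O against a placeholder target directory; a real dcap test would instead go through dccp or a libdcap-enabled client, and the client count and file size here are arbitrary illustration values:

```python
import os
import time
from multiprocessing import Pool

def write_one(args):
    """Write size_mb of data to path; return this client's MB/s rate."""
    path, size_mb = args
    chunk = b"\0" * (1 << 20)  # 1 MiB buffer
    t0 = time.time()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data out of the page cache
    return size_mb / (time.time() - t0)

def parallel_write_test(target_dir, n_clients=4, size_mb=8):
    """Simulate n_clients nodes writing files in parallel.

    Returns (aggregate MB/s over wall-clock time, per-client MB/s list).
    """
    jobs = [(os.path.join(target_dir, "load_%02d.dat" % i), size_mb)
            for i in range(n_clients)]
    t0 = time.time()
    with Pool(n_clients) as pool:
        per_client = pool.map(write_one, jobs)
    aggregate = n_clients * size_mb / (time.time() - t0)
    return aggregate, per_client

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as d:
        agg, rates = parallel_write_test(d, n_clients=4, size_mb=8)
        print("aggregate: %.1f MB/s" % agg)
```

Sweeping `n_clients` upward gives the "I/O rate as a function of parallel clients" curve; the read, concurrent-write and erase-while-reading cases would follow the same pattern with the file operation swapped out.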
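The two ZFS proposals (4 RAID groups with 2 parity disks per group = 16TB, versus 1 parity disk per group = 18TB, each with 4 spares) are consistent with 4 groups of 10 disks of roughly 0.5TB each on the 48-disk X4500 — a plausible but inferred layout, not one recorded in the minutes. A quick sanity check of the arithmetic:

```python
def usable_tb(groups=4, disks_per_group=10, parity_per_group=2, disk_tb=0.5):
    """Usable pool capacity: data disks per group x groups x disk size.

    The 10-disk groups and 0.5TB disk size are assumptions inferred from
    the 16TB/18TB figures in the minutes, not a recorded decision.
    """
    data_disks = disks_per_group - parity_per_group
    return groups * data_disks * disk_tb

print(usable_tb(parity_per_group=2))  # raidz2-style layout -> 16.0
print(usable_tb(parity_per_group=1))  # raidz1-style layout -> 18.0
```

Under this layout the single-parity option buys 2TB per Thumper, which is worth weighing against the reliability concern raised when the change was proposed.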
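The accepted per-VO shares can be cross-checked against the per-Thumper capacity. The minutes say one Thumper will also carry the LHCb and Hone filesystems but do not record where dteam lands, so the split below (dteam on the other Thumper) is an assumption:

```python
# TB per VO on [thumper_a, thumper_b]; the placement of the small
# VOs across the two Thumpers is an assumed split, not a decision.
shares_tb = {
    "atlas": (6.0, 6.0),
    "cms":   (6.0, 6.0),
    "lhcb":  (0.5, 0.0),
    "hone":  (0.5, 0.0),
    "dteam": (0.0, 0.5),
    "spare": (3.5, 3.5),
}

per_thumper = [sum(v[i] for v in shares_tb.values()) for i in (0, 1)]
print(per_thumper)  # [16.5, 16.0]
```

Under this assumed split, the Thumper carrying LHCb and Hone would be slightly oversubscribed on a 16TB (two-parity) pool but fits a 18TB (one-parity) pool, which may bear on the parity-disk decision left open above.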
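For the bandwidth expectations, a small helper converting the link speeds to the MB/s units used for the Karlsruhe figure (assuming decimal units and ignoring protocol overhead):

```python
def gbit_to_mb_per_s(gbit):
    """Theoretical peak of a link in MB/s (decimal units, no protocol overhead)."""
    return gbit * 1000.0 / 8.0

print(gbit_to_mb_per_s(1))  # single WN link: 125.0 MB/s peak
print(gbit_to_mb_per_s(4))  # 4 trunked Thumper links: 500.0 MB/s peak
```

The guaranteed 20MB/s to Karlsruhe is well below either local figure, so the WAN link rather than the cluster network would be the limiting factor for PhEDEx-style transfers.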