<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
   * #Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Phenix updated installation and configuration status - 2nd March 2007

   * Minutes of the phone call held on 2nd March 2007
   * Participants:
      * Sergio Maffioletti
      * Alessandro Usai
      * Tom Guptil
      * Derek Feichtinger
      * Sigve Haug

---+++ Status of installation and configuration

   * *WN (X2200)*
      * SLC4.4 [to be finalized and integrated into cfengine]
   * *CE (X4200)*
      * 1 X4200 used for PNFS + postgres [ok]
   * *WN (X2200)*
      * 1 X2200 used as dcache admin node (SRM + dcache packages) [ok]
   * *Thumper*
      * 1 Thumper integrated as pool node [ok]
      * Configuration of pools
         * 1 write-only pool and 1 read-only pool per VO
   * *Problems encountered, solutions and workarounds*
      * se02 is not registering properly with the dcache admin nodes: its pools are seen as not active
         * Derek will test the dcache installation procedure from scratch on se02
      * The MonAMI monitor may not be usable, as it queries tables that were removed in the latest dcache release
      * The final system will need some ad-hoc scripts/cron jobs to keep the CA list and the grid-mapfiles up to date
   * *Tests* (tentative dates)
      * Reliability tests on the whole system (Derek + Tom + Alessandro) --> 5 - 9 March
      * Performance tests from the WNs to the Thumpers via dcache (Derek + Sigve + Alessandro) --> 2 - 9 March
      * Test different configurations of ZFS [ok]
   * *Organisation of the dCache tests*
      * functionality tests
         * VO codes
      * local load tests (mainly dcap; a sketch of such a test follows this section):
         * writing files in parallel from multiple nodes
         * reading the same file from multiple nodes
         * trying to write a file that is being written by another process
         * erasing a file that is being read by another process
         * measuring I/O rates as a function of the number of parallel clients
      * WAN protocol tests (SRM, gridftp)
         * CMS PhEDEx transfers
      * Storage access profile of CMS jobs -> they will use the dcap protocol
      * Storage access profile of Atlas jobs -> for those using ARC, access is mainly through SRM and/or gridftp
      * each VO should prepare its own specific tests
      * A general test suite (local and WAN tests) will be prepared by Derek
      * Sigve will forward the test description to the Atlas contact to check whether Atlas will need additional tests
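As a starting point for the local load tests listed above, the following is a minimal sketch of a parallel dcap read test. It is only a sketch: it assumes that =dccp= from the dcache client tools is installed on the WNs, and the door host, port and =/pnfs= path below are placeholders rather than the real CSCS values.

<verbatim>
#!/usr/bin/env python
# Sketch of a dcap load test: N concurrent reads of the same file via dccp.
# Assumes the dcache client tools are installed on the WN; the door host
# and the /pnfs path are placeholders, not the real values.
import subprocess, sys, time

DOOR     = "dcap://dcache-admin.example.org:22125"        # dcache admin/door node (placeholder)
TESTFILE = "/pnfs/example.org/data/dteam/loadtest/file1"  # illustrative pnfs path
NCLIENTS = 8                                              # number of parallel dccp clients

def run_parallel_reads(n):
    """Start n dccp copies of the same file and return (elapsed seconds, failures)."""
    t0 = time.time()
    procs = [subprocess.Popen(["dccp", DOOR + TESTFILE, "/tmp/loadtest.%d" % i])
             for i in range(n)]
    failures = sum(1 for p in procs if p.wait() != 0)
    return time.time() - t0, failures

if __name__ == "__main__":
    elapsed, failures = run_parallel_reads(NCLIENTS)
    if failures:
        sys.exit("%d of %d dccp clients failed" % (failures, NCLIENTS))
    print("%d parallel dcap reads finished in %.1f s" % (NCLIENTS, elapsed))
</verbatim>

The parallel-write case is the same pattern with source and destination swapped, and dividing the file size by the elapsed time per client gives the I/O rate as a function of the number of parallel clients.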
---+++ SE dcache open issues

   * Is it necessary to mount PNFS on the Thumper? *Yes*, if the Thumper is running gridftp (thanks to Lionel Schwarz)
   * What is necessary for a WN to use the dcap protocol to access the dcache pools?
      * The WNs will point to the dcache admin node and it will redirect them to the right Thumper

---+++ What bandwidth can we expect

   * We may have performance problems from the Dalco WNs to the Thumpers because of the single Gb link from switch to switch
      * Tom will check whether the two switches can be trunked together
   * This is an important issue to keep in mind when planning the extension

---+++ Deadlines

   * next week --> tests
   * 15th March --> production

---+++ AOB

   * update the UI machine
      * 5 - 9 March: reinstall the UI
      * announce a half-day downtime
   * VOBoxes --> the Atlas VO could be moved to a Xen node
   * responsibility for the Twiki areas (CSCS will take care of this)
      * create 1 page per VO
      * add a page with logs of problems
      * VOBoxes page with info about how to start the services
      * there will be one page per service/server involved
   * We will produce a list of things to be checked before entering production
      * what needs to be available at CSCS
      * what configurations need to be modified at Tier-1 level
      * for every VO there will be a list of configurations to check
   * We need to prepare a message for all users about the scheduled entry into production

---

---++ Resume of the Configuration

   * 1 X2200 = cluster management system
   * 1 X2200 = SRM + dcache domains + LCG SE related software
   * 1 X4200 = PNFS + postgres
   * 1 X4500 Thumper = gridftpd + dcache pool node
   * 1 X4500 Thumper = gridftpd + dcache pool node
   * ZFS configuration (proposal):
      * 1 Thumper: test with 4 RAID groups and 2 parity disks = 16 TB + 4 spare disks
      * 1 Thumper: test with 4 RAID groups and 1 parity disk = 18 TB + 4 spare disks
      * 1 ZFS pool per Thumper
      * 1 filesystem per VO per Thumper
      * each VO gets space on both Thumpers
   * Thumper dcache configuration
      * each Thumper will have 1 dcache pool per VO/FS (CMS, Atlas, dteam)
      * 1 Thumper will also have a FS for lhcb and hone
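For reference, a back-of-envelope check of the 16 TB and 18 TB figures. The layout below is an assumption, not something fixed in these minutes: 500 GB disks in the X4500, 4 raidz groups of 10 disks each per Thumper, plus the 4 hot spares.

<verbatim>
# Back-of-envelope capacity check for the two proposed ZFS layouts.
# Assumptions (not from the minutes): 500 GB disks, 4 raidz groups of
# 10 disks each, 4 hot spares; only data disks count towards capacity.
DISK_TB = 0.5          # 500 GB disks in the X4500 (assumed)
GROUPS = 4             # raidz groups per Thumper (assumed)
DISKS_PER_GROUP = 10   # disks per raidz group (assumed)

for parity in (2, 1):  # 2 parity disks (raidz2) vs 1 parity disk (raidz1)
    data_disks = GROUPS * (DISKS_PER_GROUP - parity)
    print("%d parity disk(s) per group -> %.0f TB usable"
          % (parity, data_disks * DISK_TB))
# Prints 16 TB for the 2-parity layout and 18 TB for the 1-parity one.
</verbatim>

If the real group layout differs, the same two-line calculation can simply be redone with the actual numbers.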
---

-- Main.SergioMaffioletti - 2 March 2007