
Phenix updated installation and configuration status

  • Minutes of the phone call held on 13 February 2007
  • Participants:
    • Sergio Maffioletti
    • Alessandro Usai
    • Tom Guptil
    • Derek Feichtinger
    • Sigve Haug
    • Zhiling Chen

Status of installation and configuration

  • WN (X2200)
    • SLC308 [ok]
    • LCG/gLite [ok]
    • WN integrated into the old LRMS [ok]
    • we could have all WNs integrated within a short time (sharing /apps via NFS)

  • CE (X4200)
    • SLC4 [ok]
    • for the time being we agreed to keep using the old ce01-lcg with Torque as the LRMS
    • SGE integration will be tested; the plan is to put it in production as soon as it is stable
    • NorduGrid will also have to be checked and tested

  • Problems encountered, solutions and workarounds
    • SLC306 does not work (missing controller drivers)
    • SLC4 works, but installing the LCG/gLite software is error-prone
    • Thumper installation [ok] but we need to test the ZFS functionality
    • Tom proposed to change the current RAID configuration to use only 1 parity disk; this would give an additional 4 TB at the expense of reliability [still to be decided]
    • SUN N1 is not suitable for cluster management, therefore we will use cfengine
    • planning to have all cluster management services on 1 X2200 running Linux (possibly with Solaris in a Virtual Machine)
    • 2 X4200 should become free

  • Tests (tentative dates)
    • Reliability tests on Thumpers (Tom + Alessandro) --> 12 - 16 February
    • Performance test from WNs to Thumpers via dcache --> 14 - 21 February
    • Test different configurations of ZFS and dcache --> 12 - 23 February

  • Organisation of the dCache tests
    • functionality tests
    • VO codes
    • local load tests (mainly dcap); a sketch of such a test driver is given after this list:
      • writing files in parallel from multiple nodes
      • reading the same file from multiple nodes
      • trying to write a file that is being written by another process
      • erasing a file that is being read by another process
      • measuring I/O rates as a function of the number of parallel clients
    • WAN protocol tests (SRM, gridftp)
    • CMS PhEDEx transfers
    • Storage access profile of CMS jobs -> they will use the dcap protocol
    • Storage access profile of Atlas jobs -> for those using ARC, access is mainly through SRM and/or gridftp
    • each VO should prepare its own specific tests
    • A general test suite (local and WAN tests) will be prepared by Derek
    • Sigve will forward the test description to the Atlas contact to check if Atlas will need additional tests
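
A possible starting point for the local load tests: a minimal Python sketch of a parallel-write driver using the dccp client, to be launched on one or several WNs. The dcap door URL, port and PNFS path below are placeholders, and it assumes the client-only dcache package (dccp) is installed on the WNs.

    #!/usr/bin/env python
    """Sketch of the parallel-write part of the local dcap load tests:
    N clients each copy a 1 GB test file into the pool with dccp, and the
    per-client and aggregate I/O rates are printed.  Run one instance per
    WN to cover the multi-node cases.  The door URL and PNFS path are
    placeholders."""

    import os
    import subprocess
    import time
    from concurrent.futures import ThreadPoolExecutor

    DCAP_DOOR = "dcap://se02-lcg.projects.cscs.ch:22125"  # assumed door host:port
    TEST_DIR = "/pnfs/example.ch/data/dteam/loadtest"     # hypothetical PNFS path
    FILE_SIZE_MB = 1024                                   # 1 GB per test file
    N_CLIENTS = 8                                         # parallel writers per WN

    def write_one(i):
        """Create a local test file and copy it into the pool; return MB/s."""
        local = "/tmp/loadtest_%d.dat" % i
        remote = "%s%s/file_%d.dat" % (DCAP_DOOR, TEST_DIR, i)
        chunk = os.urandom(1024 * 1024)
        with open(local, "wb") as f:
            for _ in range(FILE_SIZE_MB):
                f.write(chunk)
        t0 = time.time()
        subprocess.check_call(["dccp", local, remote])
        return FILE_SIZE_MB / (time.time() - t0)

    if __name__ == "__main__":
        t0 = time.time()
        with ThreadPoolExecutor(max_workers=N_CLIENTS) as pool:
            rates = list(pool.map(write_one, range(N_CLIENTS)))
        elapsed = time.time() - t0
        print("per-client rates (MB/s):", ", ".join("%.1f" % r for r in rates))
        print("aggregate rate: %.1f MB/s" % (N_CLIENTS * FILE_SIZE_MB / elapsed))

The read and concurrent-access cases can be covered by small variations of the same driver (dccp in the opposite direction, two competing processes).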

SE dcache configuration scenario

  • PNFS + postgres DB on a fat node = 1 X4200
  • SRM + dcache domains + LCG/gLite sw on a standard WN = 1 X2200
  • gridftp + a few dcache modules (still to be checked) = 2 X4500
  • we still need to understand the proper scenario
  • We all agree on such a configuration
  • things to be checked (a quick check script is sketched after this list):
    • is it necessary to mount PNFS on the Thumper? Apparently yes, if the Thumper is running gridftp (thanks to Lionel Schwarz)
    • what is necessary for a WN to use the dcap protocol to access the dcache pools? Apparently the client-only dcache package (Alessandro will check)
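
As a quick way to follow up on the two points above on any given node, a trivial check along these lines could be used (a sketch only; the /pnfs mount point is an assumption and dccp is taken as the representative binary of the client-only dcache package):

    #!/usr/bin/env python
    """Quick sanity check for the open points above: is PNFS mounted on this
    node, and is the dcap client available in PATH?  The mount point and the
    client binary name are assumptions."""

    import os
    import shutil

    PNFS_MOUNT = "/pnfs"   # assumed PNFS mount point

    def pnfs_mounted():
        """True if something is mounted on the assumed PNFS mount point."""
        return os.path.ismount(PNFS_MOUNT)

    def dcap_client_available():
        """True if the dccp command from the client-only dcache package is found."""
        return shutil.which("dccp") is not None

    if __name__ == "__main__":
        print("PNFS mounted on %s: %s" % (PNFS_MOUNT, pnfs_mounted()))
        print("dcap client (dccp) in PATH: %s" % dcap_client_available())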

Planning migration of DPM data to dcache pool

  • Migrate DPM data to the new SE (se02-lcg.projects.cscs.ch)
  • Users will have to migrate their data and update the catalogue (a migration sketch is given after this list)
  • With the introduction of the new SE as the default CSCS/CHIPP SE, we will have to change a few settings in FTS and the IS (Alessandro will check)
  • Derek and Sigve will check what needs to be done for the CMS and Atlas VOs to support the new SE
  • The current se01-lcg will be kept as a backup for an initial period and then re-converted into a dcache pool
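
A possible shape for the user-side migration is sketched below: each catalogue entry is replicated to the new SE with lcg-rep, which also registers the new replica in the catalogue; removing the old replica on se01-lcg can be postponed while it serves as backup. This assumes lcg-utils and a valid proxy are available and that the user keeps the LFNs to migrate in a plain text file; the VO name is an example.

    #!/usr/bin/env python
    """Sketch of a per-user migration loop from the old SE (se01-lcg) to the
    new dCache SE (se02-lcg).  Assumes lcg-utils is installed, a valid proxy
    exists, and 'lfns.txt' lists one lfn:/... entry per line."""

    import subprocess

    NEW_SE = "se02-lcg.projects.cscs.ch"
    VO = "dteam"   # example VO; each user would set their own

    def migrate(lfn):
        """Replicate one catalogue entry to the new SE (also updates the catalogue)."""
        subprocess.check_call(["lcg-rep", "--vo", VO, "-d", NEW_SE, lfn])
        # Removing the replica left on se01-lcg (e.g. with lcg-del) is a
        # separate step that can wait until se01-lcg stops being the backup.

    if __name__ == "__main__":
        with open("lfns.txt") as f:
            for line in f:
                lfn = line.strip()
                if lfn:
                    migrate(lfn)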

VO Disk space shares:

  • Agreed to have Filesystem <-> VO mapping as has been done in Phenix I
  • Accepted proposal:
    • Each VO will have access to both Thumpers
    • Atlas = 2 x 6TB
    • CMS = 2 x 6TB
    • LhCb = 1 x 1/2 TB
    • Hone = 1 x 1/2 TB
    • dteam = 1 x 1/2 TB
    • spare = 2 x 3.5TB
  • the spare space will be available to all VOs on request
  • we may also take space from dteam
  • LhCb should agree to having initially only 1/2 TB (Derek will contact them)
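
As a quick sanity check on the shares above, the per-Thumper totals work out as follows (a sketch, assuming Atlas, CMS and the spare space are split evenly over the two Thumpers and the three small shares all sit on the first one):

    #!/usr/bin/env python
    """Per-Thumper totals for the agreed disk shares, assuming Atlas, CMS and
    the spare space are split evenly over the two Thumpers and the three small
    shares (LhCb, Hone, dteam) all sit on the first one."""

    split_shares = {"atlas": 12.0, "cms": 12.0, "spare": 7.0}  # TB, over both Thumpers
    small_shares = {"lhcb": 0.5, "hone": 0.5, "dteam": 0.5}    # TB, on one Thumper

    per_thumper = sum(split_shares.values()) / 2               # 15.5 TB each
    thumper1 = per_thumper + sum(small_shares.values())        # 17.0 TB
    thumper2 = per_thumper                                     # 15.5 TB

    # Compare with the 16 TB / 18 TB usable capacities in the ZFS proposal below.
    print("Thumper 1: %.1f TB, Thumper 2: %.1f TB" % (thumper1, thumper2))
    print("Total allocated: %.1f TB" % (thumper1 + thumper2))  # 32.5 TB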

What bandwidth can we expect:

  • WN = 1 Gb/s link
  • Thumper = 4 x 1 Gb/s links trunked
  • from CSCS to Karlsruhe -> 20 MB/s should be guaranteed (CSCS will check)
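
For orientation, these figures translate into the following rough limits (a back-of-envelope sketch, assuming the links are Gigabit Ethernet and ignoring protocol overhead):

    #!/usr/bin/env python
    """Back-of-envelope numbers for the bandwidth figures above, assuming
    Gigabit Ethernet links and ignoring protocol overhead."""

    GBIT_MBS = 1e9 / 8 / 1e6      # 1 Gbit/s expressed in MB/s (= 125 MB/s)

    wn_link = 1 * GBIT_MBS        # one WN: single 1 Gb/s link
    thumper_trunk = 4 * GBIT_MBS  # Thumper: 4 x 1 Gb/s trunked
    wan_rate = 20.0               # MB/s guaranteed CSCS -> Karlsruhe (to be confirmed)

    print("WN link peak:       %5.0f MB/s" % wn_link)
    print("Thumper trunk peak: %5.0f MB/s" % thumper_trunk)

    # Time to move one 6 TB VO share over the guaranteed WAN rate:
    share_mb = 6.0 * 1e6
    print("6 TB at %.0f MB/s: about %.1f days" % (wan_rate, share_mb / wan_rate / 86400))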

VO CPU shares based on queue priority, including the NorduGrid queues.

  • For the time being we will keep the configuration of the queues as they are
  • We will observe the behavior of the queues
    • When we migrate to the new SGE-based CE, we will address the fair-share issue

Integration with Phenix 1 cluster

  • integration of WN
    • Agreed to integrate 10 WNs once the installation is stable

Deadlines

  • We can make it by the end of February
  • Next week we will have more info

AOB

  • update the UI machine (strange Java exception errors)?
    • the UI will be re-installed in the next two weeks
    • Proposal to migrate it to a server box (to gain reliability)

  • VOBoxes
    • should we reinstall them as true LCG VO-Boxes? This would provide gsissh and easier MyProxy management
    • We are planning to migrate these boxes anyway

  • responsibility for the TWiki areas (CSCS will take care)
    • Create 1 page per VO
    • add a page with logs of problems
    • VOBoxes page with info about how to start services


Summary of the configuration

  • 1 X2200 = cluster management system
  • 1 X2200 = SRM + dcache domains + LCG SE related software
  • 1 X4200 = PNFS + postgres
  • 2 X4500 (Thumpers) = gridftpd + dcache pool node (one per Thumper)

  • ZFS configuration (Proposal):
    • 1 Thumper tested with 4 RAID groups and 2 parity disks = 16 TB + 4 spare disks
    • 1 Thumper tested with 4 RAID groups and 1 parity disk = 18 TB + 4 spare disks

  • 1 ZFS pool per Thumper

  • 1 Filesystem per VO per Thumper
  • each VO gets space on both Thumpers
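
A minimal sketch of what this could look like on one Thumper is given below (ZFS commands issued from Python for illustration): one pool built from several raidz2 groups plus hot spares, and one filesystem per VO capped by a quota. The disk names, group sizes, pool name and quota values are placeholders; the real layout will come out of the 12-23 February tests.

    #!/usr/bin/env python
    """Sketch of the proposed per-Thumper ZFS layout: one pool of raidz2
    groups (2 parity disks per group; the 1-parity variant would use raidz)
    plus hot spares, and one filesystem per VO with a quota matching the
    agreed shares.  All names and sizes below are placeholders."""

    import subprocess

    POOL = "sepool"                                   # hypothetical pool name
    RAID_GROUPS = [                                   # placeholder disk names
        ["c0t0d0", "c0t1d0", "c0t2d0", "c0t3d0", "c0t4d0", "c0t5d0"],
        ["c1t0d0", "c1t1d0", "c1t2d0", "c1t3d0", "c1t4d0", "c1t5d0"],
    ]
    SPARES = ["c5t6d0", "c5t7d0"]
    VO_QUOTAS = {"atlas": "6T", "cms": "6T", "dteam": "512G"}  # per-Thumper shares

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.check_call(cmd)

    def build_pool():
        # The first RAID group creates the pool, the others are added as vdevs.
        run(["zpool", "create", POOL, "raidz2"] + RAID_GROUPS[0])
        for group in RAID_GROUPS[1:]:
            run(["zpool", "add", POOL, "raidz2"] + group)
        run(["zpool", "add", POOL, "spare"] + SPARES)

    def create_vo_filesystems():
        # One ZFS filesystem per VO, capped at its agreed share via a quota.
        for vo, quota in VO_QUOTAS.items():
            run(["zfs", "create", "%s/%s" % (POOL, vo)])
            run(["zfs", "set", "quota=%s" % quota, "%s/%s" % (POOL, vo)])

    if __name__ == "__main__":
        build_pool()
        create_vo_filesystems()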

  • Thumper dcache configuration
    • each Thumper will have 1 dcache pool per VO/FS (CMS, Atlas, dteam)
    • 1 Thumper will also have FS for lhcb and hone


-- SergioMaffioletti - 13 Feb 2007
