Phenix updated installation and configuration status
- Minutes of the phone call held on 13 February 2007
- Participants:
- Sergio Maffioletti
- Alessandro Usai
- Tom Guptil
- Derek Feichtinger
- Sigve Haug
- Zhiling Chen
Status of installation and configuration
- WN (X2200)
- SLC 3.0.8 [ok]
- LCG/gLite [ok]
- WN integrated into old LRMS [ok]
- all WNs could be integrated in a short time (sharing /apps via NFS)
- CE (X4200)
- SLC4 [ok]
- for the time being we agreed to keep using the old ce01-lcg with Torque as the LRMS
- SGE integration will be tested; the plan is to put it into production as soon as it is stable
- NorduGrid will also have to be checked and tested
- Problems encountered, solutions and workarounds
- SLC 3.0.6 does not work (missing controller drivers)
- SLC4 works, but installing the LCG/gLite software is error-prone
- Thumper installation [ok], but we still need to test the ZFS functionality
- Tom proposed changing the current RAID configuration to use only 1 parity disk per RAID group; this would give an additional 4 TB at the expense of reliability [still to be decided]
- Sun N1 is not suitable for cluster management, therefore we will use cfengine
- the plan is to run all cluster management services on one X2200 under Linux (possibly with Solaris in a virtual machine)
- 2 X4200s --> should become free
- Tests (tentative dates)
- Reliability tests on Thumpers (Tom + Alessandro) --> 12 - 16 February
- Performance tests from WNs to Thumpers via dCache --> 14 - 21 February
- Test different configurations of ZFS and dCache --> 12 - 23 February
- Organisation of the dCache tests
- functionality tests
- VO codes
- local load tests (mainly dcap):
- writing files in parallel from multiple nodes
- reading same file from multiple nodes
- trying to write file that is being written by another process
- erasing file that is being read by another process
- measure I/O rates as a function of the number of parallel clients (see the test sketch after this list)
- WAN protocol tests (SRM, gridftp)
- CMS PhEDEx transfers
- Storage access profile of CMS jobs -> they will use the dcap protocol
- Storage access profile of Atlas jobs -> for those using ARC, access is mainly through SRM and/or GridFTP
- each VO should prepare its own specific tests
- A general test suite (local and WAN tests) will be prepared by Derek
- Sigve will forward the test description to the Atlas contact to check whether Atlas needs additional tests
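- A minimal sketch of the local dcap load test (our illustration, not an agreed procedure): it assumes the dCache client-only package provides dccp on the WNs; the dcap door, PNFS path, file size and client count are placeholders to be adapted
<verbatim>
#!/usr/bin/env python
# Sketch of the local dcap load test described above.
# Assumes the dCache client-only package provides 'dccp' on the WNs and
# that the pools are reachable through a dcap door on the new SE.
# Door, PNFS path, file size and client count are placeholders.

import os
import subprocess
import sys
import tempfile
import time

DCAP_DOOR = "dcap://se02-lcg.projects.cscs.ch"             # assumed dcap door
PNFS_DIR = "/pnfs/projects.cscs.ch/data/dteam/loadtest"    # assumed test area
FILE_MB = 512                                              # size of each test file
CLIENTS = int(sys.argv[1]) if len(sys.argv) > 1 else 4     # parallel clients


def make_local_file(mb):
    """Create a local scratch file of 'mb' megabytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for _ in range(mb):
            f.write(b"\0" * 1024 * 1024)
    return path


def run_parallel(cmds):
    """Start all commands at once, wait for them, return elapsed seconds."""
    start = time.time()
    procs = [subprocess.Popen(c) for c in cmds]
    codes = [p.wait() for p in procs]
    if any(codes):
        print("WARNING: some transfers failed: %s" % codes)
    return time.time() - start


local = make_local_file(FILE_MB)

# 1) writing files in parallel from multiple clients
writes = [["dccp", local, "%s%s/write_%d" % (DCAP_DOOR, PNFS_DIR, i)]
          for i in range(CLIENTS)]
elapsed = run_parallel(writes)
print("write: %d clients, %.1f MB/s aggregate" % (CLIENTS, CLIENTS * FILE_MB / elapsed))

# 2) reading the same file from multiple clients
reads = [["dccp", "%s%s/write_0" % (DCAP_DOOR, PNFS_DIR), "/dev/null"]
         for _ in range(CLIENTS)]
elapsed = run_parallel(reads)
print("read : %d clients, %.1f MB/s aggregate" % (CLIENTS, CLIENTS * FILE_MB / elapsed))

os.unlink(local)
</verbatim>
- Running the script (hypothetical name dcap_loadtest.py) with increasing client counts from several WNs at once would give the I/O rate versus number of parallel clients asked for above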
SE dCache configuration scenario
- PNFS + PostgreSQL DB on a fat node = 1 X4200
- SRM + dCache domains + LCG/gLite software on a standard WN = 1 X2200
- GridFTP + a few dCache modules (still to be checked) = 2 X4500
- we still need to understand the proper scenario
- We all agree on such a configuration
- things to be checked:
- is it necessary to mount PNFS on the Thumpers? Apparently yes, if the Thumper runs GridFTP (thanks to Lionel Schwarz)
- what do the WNs need in order to use the dcap protocol to access the dCache pools? Apparently only the client-only dCache package (Alessandro will check)
Planning the migration of DPM data to the dCache pools
- Migrate DPM data to the new SE (se02-lcg.projects.cscs.ch)
- Users will have to migrate their data and update the catalogue (see the sketch below)
- With the introduction of the new SE as the default CSCS/CHIPP SE, we will have to change a few settings in FTS and in the IS (Alessandro will check)
- Derek and Sigve will check what needs to be done for the CMS and Atlas VOs to support the new SE
- The current se01-lcg will be kept as a backup for an initial period and then converted into a dCache pool
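- A rough sketch of a possible per-file migration (assumption: the standard lcg-utils commands lcg-rep, lcg-lr and lcg-del are available and the files are registered in the LFC); the VO and the LFN list file are placeholders, and the actual procedure still has to be agreed
<verbatim>
#!/usr/bin/env python
# Sketch of a per-file DPM -> dCache migration, assuming the standard
# lcg-utils commands (lcg-rep, lcg-lr, lcg-del), the LFC as catalogue
# and a valid grid proxy. The VO name and the input LFN list are
# placeholders; the real procedure is still to be defined.

import subprocess
import sys

VO = "dteam"                             # placeholder VO
OLD_SE = "se01-lcg.projects.cscs.ch"     # current DPM
NEW_SE = "se02-lcg.projects.cscs.ch"     # new dCache SE

# input file: one LFN per line, e.g. lfn:/grid/dteam/some/file
lfns = [line.strip() for line in open(sys.argv[1]) if line.strip()]


def run(cmd):
    print(" ".join(cmd))
    return subprocess.call(cmd)


for lfn in lfns:
    # 1) copy the file to the new SE and register the new replica in the catalogue
    if run(["lcg-rep", "--vo", VO, "-d", NEW_SE, lfn]) != 0:
        print("SKIP (replication failed): " + lfn)
        continue
    # 2) list all replicas and remove the one still sitting on the old DPM
    out = subprocess.Popen(["lcg-lr", "--vo", VO, lfn], stdout=subprocess.PIPE,
                           universal_newlines=True).communicate()[0]
    for surl in out.split():
        if OLD_SE in surl:
            run(["lcg-del", "--vo", VO, surl])  # deletes and unregisters old replica
</verbatim>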
VO Disk space shares:
- Agreed to have a filesystem <-> VO mapping as was done in Phenix I
- Accepted proposal:
- Each VO will have access to both Thumpers
- Atlas = 2 x 6 TB
- CMS = 2 x 6 TB
- LHCb = 1 x 0.5 TB
- Hone = 1 x 0.5 TB
- dteam = 1 x 0.5 TB
- spare = 2 x 3.5 TB
- the spare space will be available to all VOs on request
- we may also take space from dteam
- LHCb should agree to having only 0.5 TB initially (Derek will contact them)
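- Rough cross-check of these shares (our arithmetic, not an agreed figure): each Thumper carries 6 TB (Atlas) + 6 TB (CMS) + 3.5 TB (spare) plus 0.5-1.5 TB for the small VOs, i.e. roughly 16-17 TB, consistent with the 16-18 TB of usable space foreseen in the ZFS proposals at the end of these minutes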
What bandwidth can we expect:
- WN = 1 Gbit/s link
- Thumper = 4 x 1 Gbit/s links, trunked
- from CSCS to Karlsruhe -> 20 MB/s should be guaranteed (CSCS will check)
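- Rough numbers (theoretical maxima, assuming standard gigabit Ethernet): 1 Gbit/s corresponds to about 125 MB/s per WN, and the 4 x 1 Gbit/s trunk to at most about 500 MB/s per Thumper, so a handful of WNs streaming at full rate can already saturate one Thumper; 20 MB/s sustained to Karlsruhe is roughly 1.7 TB/day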
VO CPU shares based on queue priority, including the nordugrid- queues
- For the time being we will keep the configuration of the queues as they are
- We will observe the behavior of the queues
- When we migrate to the new SGE-based CE, we will address the fair-share issue
Integration with Phenix 1 cluster
- integration of WNs
- Agreed to integrate 10 WNs once the installation is stable
Deadlines
- We can make it by the end of February
- Next week we will have more info
AOB
- update the UI machine (strange Java exception errors)?
- The UI will be re-installed in the next two weeks
- Proposal to migrate it to a server-class box (to gain reliability)
- VOBoxes
- should we reinstall them as true LCG VO-Boxes? This would provide gsissh and easier MyProxy management
- We are planning to migrate these boxes anyway
- responsibility for the TWiki areas (CSCS will take care of this)
- Create 1 page per VO
- add a page with logs of problems
- VOBoxes page with info about how to start services
Summary of the configuration
- 1 X2200 = cluster management system
- 1 X2200 = SRM + dCache domains + LCG SE-related software
- 1 X4200 = PNFS + PostgreSQL
- 1 X4500 Thumper = GridFTP door + dCache pool node
- 1 X4500 Thumper = GridFTP door + dCache pool node
- ZFS configuration (Proposal):
- 1 Thumper to be tested with 4 RAID groups using 2 parity disks each = 16 TB usable + 4 spare disks
- 1 Thumper to be tested with 4 RAID groups using 1 parity disk each = 18 TB usable + 4 spare disks (capacity arithmetic at the end of this list)
- 1 Filesystem per VO per Thumper
- each VO gets space on both Thumpers
- Thumper dCache configuration
- each Thumper will have 1 dCache pool per VO/FS (CMS, Atlas, dteam)
- one Thumper will also have filesystems for LHCb and hone
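- Capacity arithmetic behind the two ZFS proposals (assuming the X4500's 48 x 500 GB disks and 10-disk RAID groups, which is our reading, not a recorded decision): with 2 parity disks per group, 4 x 8 data disks x 0.5 TB = 16 TB usable; with 1 parity disk per group, 4 x 9 x 0.5 TB = 18 TB; in both cases 4 disks remain as hot spares and the last few disks stay outside the data pool (e.g. for the OS)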
--
SergioMaffioletti - 13 Feb 2007