Phenix updated installation and configuration status - 2nd March 2007
- Minutes of the phone call held on 2nd March 2007
- Participants:
- Sergio Maffioletti
- Alessandro Usai
- Tom Guptil
- Derek Feichtinger
- Sigve Haug
Status of installation and configuration
- WN (X2200)
- SLC4.4 [to be finalized and integrated into cfengine]
- CE (X4200)
- 1 X4200 used for PNFS + postgres [ok]
- WN (X2200)
- 1 X2200 used as dcache admin node (SRM + dcache packages) [ok]
- Thumper
- 1 Thumper integrated as pool node [ok]
- Configuration of pools
- 1 pool write only and 1 pool read only per VO
- Problems encountered, solutions and workarounds
- se02 is not registering properly with the dcache admin nodes: its pools are seen as inactive
- Derek will test the dcache installation procedure from scratch on se02
- The MonAMI monitor may not be usable because it queries tables that were removed in the latest dcache release
- The final system will need some ad-hoc scripts/cron jobs for the update of CA list and grid-mapfiles
- Tests (tentative dates)
- Reliability tests on the whole system (Derek + Tom + Alessandro) --> 5 - 9 March
- Performance tests from WNs to Thumpers via dcache (Derek + Sigve + Alessandro) --> 2 - 9 March
- Test different configurations of ZFS [ok]
- Organisation of the dCache tests
- functionality tests
- VO codes
- local load tests (mainly dcap):
- writing files in parallel from multiple nodes
- reading same file from multiple nodes
- trying to write a file that is being written by another process
- erasing a file that is being read by another process
- measure I/O rates as a function of the number of parallel clients
- WAN protocol tests (SRM, gridftp)
- CMS PhEDEx transfers
- Storage access profile of CMS jobs -> they will use dcap protocol
- Storage access profile of Atlas jobs -> for those using ARC, access is mainly through SRM and/or Gridftp
- each VO should prepare its own specific tests
- A general test suite (local and WAN tests) will be prepared by Derek
- Sigve will forward the test description to the Atlas contact to check whether Atlas needs additional tests
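The local load test items above (parallel writes, I/O rate as a function of parallel clients) could be prototyped along the following lines. This is only a minimal sketch, not the test suite Derek will prepare: it uses plain file I/O against a directory, so `target_dir` is assumed to be any writable path (a scratch directory here, or a mounted storage area), and it runs the writers as threads on one machine rather than on multiple nodes. The function names `write_file`, `measure_write_rate` and `scan_clients` are hypothetical.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_file(path, size_bytes, chunk=1 << 20):
    """Write size_bytes of zero-filled data to path and return the bytes written."""
    block = b"\0" * chunk
    written = 0
    with open(path, "wb") as fh:
        while written < size_bytes:
            n = min(chunk, size_bytes - written)
            fh.write(block[:n])
            written += n
        fh.flush()
        os.fsync(fh.fileno())  # make sure the data really left the page cache
    return written


def measure_write_rate(target_dir, n_clients, size_bytes):
    """Run n_clients concurrent writers and return the aggregate rate in MB/s."""
    paths = [os.path.join(target_dir, f"loadtest_{i}.dat") for i in range(n_clients)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        total = sum(pool.map(lambda p: write_file(p, size_bytes), paths))
    elapsed = time.perf_counter() - start
    for p in paths:
        os.remove(p)  # clean up the test files
    return total / elapsed / 1e6


def scan_clients(target_dir, client_counts, size_bytes):
    """Measure the aggregate write rate for each number of parallel clients."""
    return {n: measure_write_rate(target_dir, n, size_bytes) for n in client_counts}


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the storage under test.
    with tempfile.TemporaryDirectory() as scratch:
        for n, rate in scan_clients(scratch, [1, 2, 4, 8], 4 << 20).items():
            print(f"{n:2d} clients: {rate:8.1f} MB/s aggregate")
```

A real run would point `target_dir` at the storage under test and start the clients on separate WNs, since threads on a single node mostly measure that node's own I/O path.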
SE dcache open issues
- Is it necessary to mount PNFS on Thumper? Yes, if Thumper is running Gridftp (thanks to Lionel Schwarz)
- What is necessary for a WN to use the dcap protocol to access the dcache pools? WNs will point to the dcache admin node, which will redirect them to the right Thumper
What bandwidth can we expect:
- We may have a performance problem from the Dalco WNs to the Thumpers due to the single Gb link from switch to switch
- Tom will check whether both switches can be trunked together
- This is an important issue to keep in mind when planning the extension
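To put a rough number on the single-link concern, the sketch below divides the nominal capacity of the inter-switch link among the clients that share it. It is back-of-the-envelope arithmetic only: the client count of 10 is a hypothetical figure (the minutes do not state how many WNs sit behind the link), protocol overhead is ignored, and the assumption that a trunk of two links doubles the capacity is an ideal upper bound. The function names are hypothetical.

```python
def link_capacity_mb_s(link_gbit_s, n_links=1):
    """Nominal capacity in MB/s of n_links links of link_gbit_s Gbit/s each."""
    return link_gbit_s * 1000 / 8 * n_links


def per_client_rate_mb_s(link_gbit_s, n_clients, n_links=1):
    """Fair share per client in MB/s when n_clients stream at the same time."""
    return link_capacity_mb_s(link_gbit_s, n_links) / n_clients


if __name__ == "__main__":
    n_clients = 10  # hypothetical number of WNs reading concurrently
    single = per_client_rate_mb_s(1, n_clients)
    trunked = per_client_rate_mb_s(1, n_clients, n_links=2)
    print(f"single 1 Gb link : {single:.1f} MB/s per client")
    print(f"two trunked links: {trunked:.1f} MB/s per client (ideal case)")
```

In practice a trunk balances traffic per flow, so the gain depends on how the switches distribute connections across the links.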
Deadlines
- next week --> tests
- 15th March --> production
AOB
- update UI machine
- 5 - 9 March reinstall UI
- announce half day downtime
- VOBoxes --> Atlas VO could be moved to Xen node
- responsibility for Twiki areas (CSCS will take care)
- Create 1 page per VO
- add a page with logs of problems
- VOBoxes page with info about how to start services
- There will be one page per service/server involved
- We will produce a list of things to be checked before entering production
- What needs to be available at CSCS
- What configurations need to be modified at Tier-1 level
- For every VO there will be a list of configurations to check
- We need to prepare a message for all users about the scheduled entry into production
Summary of the configuration
- 1 X2200 = cluster management system
- 1 X2200 = SRM + dcache domains + LCG SE related software
- 1 X4200 = PNFS + postgres
- 1 X4500 Thumper = Gridftpd + dcache pool node
- 1 X4500 Thumper = Gridftpd + dcache pool node
- ZFS configuration (Proposal):
- 1 Thumper test with 4 Raid and 2 parity disks = 16TB + 4 spare disks
- 1 Thumper test with 4 Raid and 1 parity disk = 18TB + 4 spare disks
- 1 Filesystem per VO per Thumper
- each VO gets space on both Thumpers
- Thumper dcache configuration
- each Thumper will have 1 dcache pool per VO/FS (CMS,Atlas,dteam)
- 1 Thumper will also have FS for lhcb and hone
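The 16TB and 18TB figures in the ZFS proposal above can be cross-checked with simple arithmetic. The sketch below assumes 500 GB disks and 10 disks per RAID group; neither value is stated in the minutes, they are chosen here because they reproduce the quoted capacities. Filesystem overhead is ignored, and the function name `usable_tb` is hypothetical.

```python
def usable_tb(groups, disks_per_group, parity_per_group, disk_tb):
    """Usable capacity in TB: data disks per group times groups times disk size."""
    data_disks = disks_per_group - parity_per_group
    return groups * data_disks * disk_tb


if __name__ == "__main__":
    DISK_TB = 0.5          # assumption: 500 GB disks
    DISKS_PER_GROUP = 10   # assumption: 10 disks per RAID group
    SPARES = 4             # from the proposal: 4 spare disks

    for parity in (2, 1):
        capacity = usable_tb(4, DISKS_PER_GROUP, parity, DISK_TB)
        disks_used = 4 * DISKS_PER_GROUP + SPARES
        print(f"4 groups, {parity} parity disk(s) each: "
              f"{capacity:.0f} TB usable, {disks_used} disks incl. spares")
```

Under these assumptions the second parity disk per group costs 2 TB of usable space per Thumper.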
--
SergioMaffioletti - 2 March 2007
Topic revision: r3 - 2007-03-02 - SergioMaffioletti