<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page to be viewable only by internal people
   * #Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Phenix updated installation and configuration status

   * Minutes of the phone call held on 13 February 2007
   * Participants:
      * Sergio Maffioletti
      * Alessandro Usai
      * Tom Guptil
      * Derek Feichtinger
      * Sigve Haug
      * Zhiling Chen

---+++ Status of installation and configuration

   * *WN (X2200)*
      * SLC308 [ok]
      * LCG/gLite [ok]
      * WNs integrated into the old LRMS [ok]
      * we could have all WNs integrated in a short time (sharing /apps via NFS)
   * *CE (X4200)*
      * SLC4 [ok]
      * for the time being we agreed on keeping the old ce01-lcg in use, with Torque as LRMS
      * SGE integration will be tested; we plan to have it in production as soon as it is stable
      * Nordugrid will have to be checked and tested too
   * *Problems encountered, solutions and workarounds*
      * SLC306 does not work (missing controller drivers)
      * SLC4 does work, but installing the LCG/gLite software is error prone
      * Thumper installation [ok], but we need to test the ZFS functionality
      * Tom proposed to change the current RAID configuration to use only 1 parity disk; this would give an additional 4TB at the expense of reliability [still to be decided]
      * SUN N1 is not suitable for cluster management, therefore we will use cfengine
      * planning to have all cluster management services on 1 X2200 on Linux (possibly with Solaris in a virtual machine)
      * 2 X4200 --> should become free
   * *Tests* (tentative dates)
      * reliability tests on the Thumpers (Tom + Alessandro) --> 12 - 16 February
      * performance tests from WNs to Thumpers via dCache --> 14 - 21 February
      * tests of different configurations of ZFS and dCache --> 12 - 23 February
   * *Organisation of the dCache tests*
      * functionality tests
         * VO codes
      * local load tests (mainly dcap; see the sketches below):
         * writing files in parallel from multiple nodes
         * reading the same file from multiple nodes
         * trying to write a file that is being written by another process
         * erasing a file that is being read by another process
         * measuring I/O rates as a function of the number of parallel clients
      * WAN protocol tests (SRM, gridftp; see the sketches below)
         * CMS PhEDEx transfers
      * storage access profile of CMS jobs -> they will use the dcap protocol
      * storage access profile of Atlas jobs -> for those using ARC, access is mainly through SRM and/or gridftp
      * each VO should prepare their own specific tests
      * a general test suite (local and WAN tests) will be prepared by Derek
      * Sigve will forward the test description to the Atlas contact to check whether Atlas needs additional tests

---+++ SE dCache configuration scenario

   * PNFS + postgres DB on a fat node = 1 X4200
   * SRM + dCache domains + LCG/gLite software on a standard WN = 1 X2200
   * gridftp + a few dCache modules still to be checked = 2 X4500
      * we still need to understand the proper scenario
   * we all agree on this configuration
   * things to be checked:
      * is it necessary to mount PNFS on the Thumpers? Apparently yes, if the Thumper is running gridftp (thanks to Lionel Schwarz)
      * what is necessary for a WN to use the dcap protocol to access the dCache pools? Apparently the client-only dCache package (Alessandro will check; see the sketch below)

---+++ Planning migration of DPM data to dCache pools

   * Migrate DPM data to the new SE (se02-lcg.projects.cscs.ch)
   * Users will have to migrate their data and update the catalog (a sketch follows this section)
   * With the introduction of the new SE as the default CSCS/CHIPP SE, we will have to change a few settings in FTS and the IS (Alessandro will check)
   * Derek and Sigve will check what needs to be done for the CMS and Atlas VOs to support the new SE
   * The current se01-lcg will be kept as a backup initially and then re-converted into a dCache pool
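To make the dcap question above concrete, a minimal smoke test a WN could run once the client-only dCache package is installed. Only the host name se02-lcg.projects.cscs.ch is from these minutes; the /pnfs path is a placeholder and 22125 is the usual dCache dcap default port, assumed here:

<verbatim>
# Hypothetical dcap check from a WN; dccp is the dCache copy client.
DOOR=dcap://se02-lcg.projects.cscs.ch:22125

# write a local file into a pool, then read it back
dccp /tmp/testfile $DOOR/pnfs/projects.cscs.ch/data/dteam/testfile
dccp $DOOR/pnfs/projects.cscs.ch/data/dteam/testfile /tmp/testfile.back
</verbatim>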
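For the local load tests listed under "Organisation of the dCache tests", a rough sketch of the parallel-write case. The node names, door address and /pnfs path are invented placeholders; the actual test suite remains Derek's to define:

<verbatim>
#!/bin/bash
# Sketch of "writing files in parallel from multiple nodes" via dcap.
DOOR=dcap://se02-lcg.projects.cscs.ch:22125
PNFS=/pnfs/projects.cscs.ch/data/dteam/loadtest
NODES="wn01 wn02 wn03 wn04"

for n in $NODES; do
   # each node creates a 1 GB file locally, then times the dcap write;
   # comparing elapsed times vs. number of nodes gives the I/O rate
   # as a function of parallel clients
   ssh $n "dd if=/dev/zero of=/tmp/1gb.dat bs=1M count=1024 >/dev/null 2>&1 && \
           time dccp /tmp/1gb.dat $DOOR$PNFS/\$(hostname).dat" &
done
wait
</verbatim>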
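Similarly, the WAN protocol tests (SRM, gridftp) could start from something like the following; the SURLs and local paths are placeholders, a valid grid proxy is assumed, and 8443 is assumed as the SRM port:

<verbatim>
# Hypothetical WAN transfer checks against the new SE.
grid-proxy-init

# gridftp with 4 parallel streams
globus-url-copy -p 4 file:///tmp/1gb.dat \
   gsiftp://se02-lcg.projects.cscs.ch/pnfs/projects.cscs.ch/data/dteam/wan.dat

# SRM via the dCache srmcp client
srmcp file:////tmp/1gb.dat \
   srm://se02-lcg.projects.cscs.ch:8443/pnfs/projects.cscs.ch/data/dteam/wan2.dat
</verbatim>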
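For the user-side part of the DPM migration above, one catalog-consistent way to move a single file would be lcg-rep plus lcg-del, sketched here with an invented LFN (the two SE host names are from these minutes):

<verbatim>
# Sketch: replicate one file from the old DPM (se01-lcg) to the new
# dCache SE, then drop the old replica; the LFN is a placeholder.
LFN=lfn:/grid/dteam/some/file

lcg-rep -v --vo dteam -d se02-lcg.projects.cscs.ch $LFN   # copy + register new replica
lcg-del -v --vo dteam -s se01-lcg.projects.cscs.ch $LFN   # remove the DPM replica
</verbatim>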
---+++ VO disk space shares

   * Agreed to have a filesystem <-> VO mapping, as was done in Phenix I
   * Accepted proposal (see the zpool sketch at the end of this page):
      * each VO will have access to both Thumpers
      * Atlas = 2 x 6TB
      * CMS = 2 x 6TB
      * LHCb = 1 x 1/2 TB
      * Hone = 1 x 1/2 TB
      * dteam = 1 x 1/2 TB
      * spare = 2 x 3.5TB
   * spare disk space will be available to all VOs on request
      * we may also take space from dteam
   * LHCb should agree to having only 1/2TB initially (Derek will contact them)

---+++ What bandwidth can we expect

   * WN = 1 Gb link
   * Thumper = 4 x 1 Gb links, trunked (see the aggregation sketch at the end of this page)
   * from CSCS to Karlsruhe -> 20MB/s should be guaranteed (CSCS will check)

---+++ VO CPU shares based on queue priority, including Nordugrid queues

   * For the time being we will keep the configuration of the queues as it is
   * We will observe the behaviour of the queues
   * When we migrate to the new SGE-based CE, we will address the fair-share issue

---+++ Integration with the Phenix 1 cluster

   * integration of WNs
   * agreed to integrate 10 WNs once the installation is stable

---+++ Deadlines

   * We can make it for the end of February
   * Next week we will have more information

---+++ AOB

   * update the UI machine (strange Java exception errors)?
      * the UI will be re-installed in the next two weeks
      * proposal to migrate it to a server box (to gain reliability)
   * VOBoxes
      * should we reinstall them as true LCG VO-Boxes? This would provide gsissh and easier myproxy management
      * we are planning to migrate these boxes anyway
   * responsibility for the TWiki areas (CSCS will take care of this)
      * create 1 page per VO
      * add a page with logs of problems
      * a VOBoxes page with info about how to start the services

---

---++ Summary of the configuration

   * 1 X2200 = cluster management system
   * 1 X2200 = SRM + dCache domains + LCG SE related software
   * 1 X4200 = PNFS + postgres
   * 1 X4500 Thumper = gridftp + dCache pool node
   * 1 X4500 Thumper = gridftp + dCache pool node
   * ZFS configuration (proposal; see the zpool sketch below):
      * 1 Thumper test with 4 RAID sets and 2 parity disks = 16TB + 4 spare disks
      * 1 Thumper test with 4 RAID sets and 1 parity disk = 18TB + 4 spare disks
      * 1 ZFS pool per Thumper
      * 1 filesystem per VO per Thumper
      * each VO gets space on both Thumpers
   * Thumper dCache configuration
      * each Thumper will have 1 dCache pool per VO/FS (CMS, Atlas, dteam)
      * 1 Thumper will also have filesystems for LHCb and Hone

---

-- Main.SergioMaffioletti - 13 Feb 2007
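As a footnote to the ZFS proposal in the summary above, a minimal sketch of the two layouts under discussion. The disk names are placeholders, only one 8-disk vdev is shown, and a real X4500 would use several such vdevs; the per-VO quotas follow the agreed shares:

<verbatim>
# Variant with 2 parity disks per RAID set: raidz2
zpool create data raidz2 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c6t0d0 c7t0d0

# Tom's 1-parity alternative would use raidz instead:
#   zpool create data raidz c0t0d0 c1t0d0 ...

# one filesystem per VO per Thumper, sized per the agreed shares
zfs create -o quota=6T data/cms
zfs create -o quota=6T data/atlas
zfs create -o quota=512G data/dteam
</verbatim>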
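And for the 4 x 1 Gb trunking on the Thumpers, Solaris link aggregation might look roughly like this; the interface names, aggregation key and IP address are all invented for illustration:

<verbatim>
# Sketch of a 4-port aggregate on Solaris 10
dladm create-aggr -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1
ifconfig aggr1 plumb 192.0.2.10 netmask 255.255.255.0 up
</verbatim>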