Second Steering Board Meeting
- The meeting takes place on Tue, Feb 22nd, 14-16h
- Location: ETHZ, Building/Room No. LFW E 11 (Room reserved from 14-17 h, has Beamer, Reserv. Number E156029)
Introduction of our new systems engineer Fabio Martinelli
Fabio Martinelli joined our group on February 1st and will take charge of the Tier-3. He has already begun to introduce a number of systematic improvements at the hardware monitoring level.
To be discussed
- shared home directories
- Enforced User quotas. What is the acceptable size for user home directories (currently we calculate 100 TB)
- Enforced phys group quotas for shome?
- SE
- policy quotas on SE for users and phys groups
- UI
- scratch directory quotas? automatic cleaning (also for User Interfaces)
- Derek: I think we should not have quotas on scratch. This will hurt users more than it is useful. Cleaning of scratch on a weekly basis should be enforced. If users use scratch for semi-permanent storage, we should investigate why and try to find a better solution (extra disk on some system?)
- (from Urs) Distribution of user interfaces among institutes? The aim is to avoid resource conflicts (e.g. scratch overuse)
- WN
- (from Urs) Is debugging (possibly interactive) access to specific WNs required?
- review guest user policy: How many guest users can a phys group have... for how long?
- planning of HW resources (see below)
- should T3 be extended to also have a CE (to increase usage)?
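For the weekly scratch cleaning proposed under the UI item, the following is a minimal sketch of what such a job might look like, not a description of anything currently deployed on the T3. The function names, the 7-day age threshold, and the dry-run default are assumptions made for illustration; the actual scratch path and retention period would have to be decided by the steering board.

```python
import os
import time


def find_stale_files(root, max_age_days=7, now=None):
    """Return paths under root not modified within the last max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are judged by their own timestamp
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # File vanished or is unreadable; skip it
                continue
    return sorted(stale)


def clean_scratch(root, max_age_days=7, dry_run=True, now=None):
    """List stale files; delete them only when dry_run is False."""
    stale = find_stale_files(root, max_age_days, now)
    if not dry_run:
        for path in stale:
            try:
                os.remove(path)
            except OSError:
                # Ignore files that cannot be removed (e.g. already gone)
                pass
    return stale
```

Running it in dry-run mode first (the default here) would give a list of affected files that could be sent to users before any deletion is enforced.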
Hardware situation and possible extensions
The current feeling is that we have enough CPU resources, but we could benefit from more storage (ca 100-150 TB more would probably be necessary)
Machines going out of warranty this year:
| Node type | Node name | Hardware | Warranty date |
| Admin node | | SUN X4150 | 2011-05-16 |
| Computing Element + frontier, mon | t3ce01 | SUN X4150 | 2011-05-16 |
| Home directory backup | t3fs05 | Thumper | 2011-02-14 |
| NFS experiment software server, log server | t3nfs01 | SUN X4150 | 2011-05-16 |
| NFS home directory + VM server | t3fs06 | Thumper | 2011-02-14 |
| Old worker nodes | t3wn02-04 | SUN X4150 | 2011-05-16 |
| SE data base | t3dcachedb01 | SUN X4150 | 2011-05-16 |
| SE file servers | t3fs01-t3fs04 | Thumper | 2011-06-02 |
| SE head node | t3se01 | SUN X4150 | 2011-05-16 |
| User interfaces | t3ui01-04 | SUN X4150 | 2011-05-16 |
| Virtual machine hosts | t3vmmaster01, t3wn08 | SUN X4150 | 2011-05-16 |
- We can use an older X4150 WN to replace parts in one of the other X4150 machines
- We could take one Thumper offline and use it as a source of spare disks for failing disks in the other Thumpers. As a first measure, it would be good to buy a few replacement disks
- Service nodes: We will try to move all services that are not I/O-intensive onto the PSI virtualization infrastructure.
Possible upgrade of UI machines with more local disks
Mail from Mr. P. Eberhard from Oracle (2011-02-17): Regarding the disks for the X4150, we still have the following:
- XRB-SS2CF146G10K-N
- 146GB 10K RPM 2.5" SAS hard disk drive with Marlin bracket. RoHS-6. (x-option), 375.00 CHF
- XRA-SS2CF300G10K-N
- 300GB 10K RPM 2.5" SAS hard disk drive with Marlin bracket. (x-option) RoHS-6, 786.00 CHF
Mail from D. Feichtinger
Dear PSI-Tier3 Steering Board Members
We received a request from Urs Langenegger asking whether we would allow a second guest user for the b-physics group on our Tier-3. At our initial meeting we had defined a policy that one guest user per physics group would be accepted (the policies are written down at https://wiki.chipp.ch/twiki/bin/view/CmsTier3/PhysicsGroupsOverview).
Current situation:
* We now have ca. 50 users (we will provide better numbers that take inactive users into account)
* CPU resources are not tight. The queues are rarely contested these months
* SE space (ca. 200 TB shared between users and data sets) is tight. According to http://t3mon.psi.ch/addmon/sespace.txt we currently host 106 TB of user data and 84 TB of "official" data
* We do not enforce the SE policies automatically. We also need to improve the accounting
In the short term, to answer Urs' request: Should the additional guest user be accepted as a temporary exception (and should we set policy limitations)? Could we discuss this either in this mail thread or, if preferred, in a short phone conference?
In the longer term: We should meet early next year to talk about the development and operation of the system (new requirements, policies), now that we really have many active users. In February, a dedicated system administrator will start working at PSI. The T3 will be his main responsibility. He will be able to implement better resource accounting, etc. I think it would be ideal if we could set up the steering board meeting for mid-February (if there are no pressing reasons to do it earlier). If this sounds good to you, I will set up a Doodle poll.
Cheers,
Derek
--
DerekFeichtinger - 2010-12-21