Second Steering Board Meeting

  • The meeting takes place on Tue, Feb 22nd, 14-16h
  • Location: ETHZ, Building/Room No. LFW E 11 (Room reserved from 14-17 h, has Beamer, Reserv. Number E156029)

Introduction of our new systems engineer Fabio Martinelli

Fabio Martinelli joined our group on February 1st and will take charge of the Tier-3. He has already begun to introduce a number of systematic improvements at the hardware monitoring level.

To be discussed

  • shared home directories
    • Enforced user quotas: what is an acceptable size for user home directories? (Currently we calculate 100 TB.)
    • Enforced phys group quotas for shome?
  • SE
    • policy quotas on SE for users and phys groups
  • UI
    • scratch directory quotas? automatic cleaning (also for User Interfaces)
      • Derek: I think we should not have quotas on scratch. This will hurt users more than it is useful. Cleaning of scratch on a weekly basis should be enforced. If users use scratch for semi-permanent storage, we should investigate why and try to find a better solution (extra disk on some system?)
    • (from Urs) Distribution of user interfaces among institutes? Aim is to avoid resource conflicts (e.g. scratch overusage)
  • WN
    • (from Urs) Is debugging (possibly interactive) access to specific WNs required?
  • review guest user policy: How many guest users can a phys group have... for how long?
  • planning of HW resources (see below)
  • should T3 be extended to also have a CE (to increase usage)?
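
As input for the scratch discussion, weekly cleaning could be sketched roughly as follows. This is only a minimal illustration: the function name `find_stale_files`, the 7-day threshold (taken from "weekly basis") and the use of the modification time as the criterion are all assumptions, not an agreed policy. The function only lists candidates and deletes nothing.

```python
import os
import time


def find_stale_files(root, max_age_days=7, now=None):
    """Return paths of non-directory entries under root whose last
    modification is older than max_age_days (hypothetical helper)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # threshold in seconds since epoch
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat judges a symlink by its own timestamp, not its target's
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # entry vanished or is unreadable; skip it
                continue
    return sorted(stale)
```

Printing the returned list first (a dry run) would let users see what a cleaning pass would remove before anything is enforced.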

Hardware situation and possible extensions

The current feeling is that we have enough CPU resources, but that we could benefit from more storage (ca. 100-150 TB more would probably be necessary).

Machines going out of warranty this year:

| Node type | Node name | Hardware | Warranty date |
| Admin node | | SUN X4150 | 2011-05-16 |
| NFS experiment software server, log server | t3nfs01 | SUN X4150 | 2011-05-16 |
| NFS home directory + VM server | t3fs06 | Thumper | 2011-02-14 |
| Home directory backup | t3fs05 | Thumper | 2011-02-14 |
| SE file servers | t3fs01-t3fs04 | Thumper | 2011-06-02 |
| Computing Element + frontier, mon | t3ce01 | SUN X4150 | 2011-05-16 |
| SE head node | t3se01 | SUN X4150 | 2011-05-16 |
| SE database | t3dcachedb01 | SUN X4150 | 2011-05-16 |
| User interfaces | t3ui01-04 | SUN X4150 | 2011-05-16 |
| Virtual machine hosts | t3vmmaster01, t3wn08 | SUN X4150 | 2011-05-16 |
| Old worker nodes | t3wn02-04 | SUN X4150 | 2011-05-16 |

  • We can use an older X4150 WN to replace parts in one of the other X4150 machines
  • We could take one Thumper offline and use it as a source of replacements for failing disks in the other Thumpers. As a first measure, it would be good to buy a few replacement disks.
  • Service nodes: We will try to put all non-IO intensive services onto the PSI virtualization infrastructure.

Possible upgrade of UI machines with more local disks

Mail from Mr. P. Eberhard from Oracle (2011-02-17): Regarding the disks for the X4150, we still have the following:

  • XRB-SS2CF146G10K-N
    • 146GB 10K RPM 2.5" SAS hard disk drive with Marlin bracket. RoHS-6. (x-option), 375.00 CHF
  • XRA-SS2CF300G10K-N
    • 300GB 10K RPM 2.5" SAS hard disk drive with Marlin bracket. (x-option) RoHS-6, 786.00 CHF

Mail from D. Feichtinger

Dear PSI-Tier3 Steering Board Members

We received a request from Urs Langenegger asking whether we would allow a second guest user for the b-physics group on our Tier-3. At our initial meeting we defined a policy that one guest user per physics group would be accepted (policies are written down on ).

Current situation:
* We now have ca. 50 users (I will provide better numbers that take inactive users into account)
* CPU Resources are not tight. The queues are rarely contested these months
* SE space (ca 200 TB shared between users and data sets) is tight. According to we currently host 106 TB of user data and 84 TB of "official" data
* We do no automatic enforcement of the SE policies. We also need to improve the accounting
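
As a rough check of the SE figures quoted above, the following minimal sketch adds up the numbers. The total is only approximate ("ca 200 TB"), and the variable names are chosen here purely for illustration.

```python
# Figures quoted in the list above, in TB; the total is approximate.
SE_TOTAL_TB = 200
USER_DATA_TB = 106
OFFICIAL_DATA_TB = 84

used_tb = USER_DATA_TB + OFFICIAL_DATA_TB   # space occupied on the SE
free_tb = SE_TOTAL_TB - used_tb             # remaining headroom
fill_fraction = used_tb / SE_TOTAL_TB       # share of the SE in use

print(f"used: {used_tb} TB, free: {free_tb} TB, fill: {fill_fraction:.0%}")
# prints "used: 190 TB, free: 10 TB, fill: 95%"
```

With these numbers only about 10 TB would remain free, which is consistent with the statement that SE space is tight.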

In the short term, to answer Urs' request: should the additional guest user be accepted as a temporary exception (and should we set policy limitations)? Could we discuss this either in this mail thread or, if preferred, in a short phone conference?

In the longer term: we should meet early next year to talk about the development and operations of the system (new requirements, policies), now that we really have many active users. In February, a dedicated system administrator will start working at PSI. The T3 will be his main responsibility. He will be able to implement better resource accounting, etc. I think it would be ideal if we could set up the steering board meeting for mid-February (if there are no pressing reasons to do it earlier). If this sounds good to you, I will set up a doodle poll.


-- DerekFeichtinger - 2010-12-21

Topic revision: r11 - 2011-02-22 - DerekFeichtinger