Fourth PSI Tier-3 Steering board meeting

Venue

Date and time: Link to the active doodle poll. Duration: 2h.

Location: ETH Hoenggerberg (Room will be announced)

Guide for commenters

Please pick a color for yourself, following my examples in the source text (Derek: comment). This makes it easier to follow the discussion of individual items. If you do not have a wiki account with write permission, you may also send your comments by mail to the admin mailing list, and we will add them to the discussion.

You can point your users to this page and ask for their opinion, or whether there are important issues that they feel should be addressed. The hope is that we can all think a bit about the points in advance and thereby make good use of our meeting time for the decisions that need to be taken.

Proposed agenda and discussion

  • Procedures for dealing with users leaving the collaboration: The admins never get notified about leaving users (Urs: I'd rephrase this: The admins never did anything in response to my reporting users that had left. The other members of the steering board never reacted to my request to update my list with their names.). It is important to get these notifications for freeing up resources and for security reasons. We must put a working system in place.
    • Policies about how to deal with leaving users. This is just a proposal for discussion
      • The institution's contact is responsible for notifying us about leaving users (or arranges for notification by the department assistant or similar)
      • How long after leaving is a user allowed to use the cluster: 6 weeks (Urs: this must be longer, more like 6 months, and exceptions should be made for finishing theses)
      • What happens with the user's data on home and SE storage: The user and his working group need to agree on what shall happen to the data. After the quarantine period (6 weeks, q.v. above) the data must come under the responsibility of another user (it will be "chowned") or possibly a group, and be counted against the respective quota (a handover sketch is given after this list).
      • What about data that is the basis of a publication? If we cannot delete the data, where do we store it? Do we need tape services from PSI or CSCS? (Urs: I think this "archival" is overkill. In principle all data can be regenerated.)
    • Urs: Can we not just use the board meeting to go through the names? The previous "policy" has not worked for 4 years, so why should that change in the next 6 months? Going through the user list takes <10 minutes, and then we get rid of users who have not been at PSI for up to 4 years(!).
      • Might be, but here I would like to insist, on behalf of the administrators, on a working procedure. We have one for every other resource at PSI, and I am sure the same holds for ETH and UNIZ services. A notification must be possible, and I feel it should be possible to delegate this to the people involved in the normal leaving procedures (e.g. department assistants).
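
    As a handover sketch for the "chown" step above (not an agreed procedure, and only valid if the dCache namespace can be modified through a read-write mount on an admin node; mount point, user names and group are examples only), the transfer of responsibility could look like this:

      # Sketch: hand over a leaving user's SE data after the quarantine period.
      # Mount point, user names and group are illustrative, not our actual configuration.
      OLD=leavinguser
      NEW=successor
      SRC=/pnfs/psi.ch/cms/trivcat/store/user/$OLD

      chown -R $NEW:cms "$SRC"    # files now count against the successor's quota
      mv "$SRC" "/pnfs/psi.ch/cms/trivcat/store/user/$NEW/from_$OLD"    # optionally move it under the successor's area
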
  • Storage Element Quotas
    • based on our current policy statements, all resources are to be shared equally among users. The SE is quite full on a daily basis, and people are consuming space very unequally. (Urs: I think it's obvious that the two sentences are not in contradiction...)
    • We need to define policies for user SE quotas
      • Since quotas cannot technically be enforced by the dCache storage system, they will instead be implemented in the form of Nagios monitoring and polling of users to clean up (e.g. by email); a monitoring sketch is given after this list
      • how can we enforce the cleanup with unresponsive users?
        • Derek: I propose that such cases are escalated to the steering board before we enforce a deletion
      • what about users who store a large number of samples in their folder that are then actually used by many people; will they be constantly asked to clean up?
        • Derek: yes. There is no other sensible way, except for putting the files into a group space
        • Daniel: We could set higher per user quotas in the alert scripts upon request?
        • Derek: But this would mean micromanagement on a per-user basis... also, how do we enforce fairness? It would mean that one user gets more, but we would have to deduct it from the quotas of others. If we introduce group folders with group quotas, this could be done.
        • Urs: If group quotas are doable, then we should do that.
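
    As a rough illustration of the polling approach from the quota item above (not an agreed implementation): a cron- or Nagios-driven check could walk the mounted namespace, compare per-user usage against a soft quota, and mail the owner. The mount point, quota value and mail address scheme below are assumptions.

      #!/bin/bash
      # Sketch: report SE users above a soft quota and ask them to clean up.
      STORE=/pnfs/psi.ch/cms/trivcat/store/user    # assumed read-only mount of the namespace
      SOFT_QUOTA_TB=10                             # assumed per-user soft quota

      for dir in "$STORE"/*/; do
          user=$(basename "$dir")
          used_tb=$(du -s --block-size=1T "$dir" | cut -f1)
          if [ "$used_tb" -gt "$SOFT_QUOTA_TB" ]; then
              echo "Please clean up: you use ${used_tb} TB (soft quota ${SOFT_QUOTA_TB} TB) in $dir" \
                  | mail -s "T3 SE quota warning" "${user}@psi.ch"    # address scheme is an assumption
          fi
      done
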
  • Group directories on the SE
    • Even though this is not foreseen in the CMS model, people have started to create group folders under the store/user/ directory. This leads to a number of problems with regard to finding out who is responsible for these files, who cleans them up, etc.
      • Derek: I would propose that group folders must all go to /store/group.
      • Daniel: Fine with me; then we should also set quotas for those.
      • I think the only way of solving this fairly would be to count the group space against the users' quotas (equipartition)
      • Whereas with the new file ownerships a user can protect his files from erroneous deletion by others, the group files will probably remain world-writable (unless we introduce unix groups representing these groups). We need to discuss what users think about that risk, especially in the context of mounting the filespace through NFSv4.
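
    If we go the unix-group route mentioned above, the setup could look roughly like this (group name, membership and path are examples; whether group ownership and the setgid bit propagate correctly through an NFSv4 mount of dCache would need to be verified):

      # Sketch: a dedicated unix group for a /store/group area (names and paths are examples)
      groupadd t3_exotica                   # on the UIs / NFS clients
      usermod -aG t3_exotica someuser       # add the group members

      mkdir -p /pnfs/psi.ch/cms/trivcat/store/group/exotica
      chgrp t3_exotica /pnfs/psi.ch/cms/trivcat/store/group/exotica
      chmod 2775 /pnfs/psi.ch/cms/trivcat/store/group/exotica    # group-writable + setgid, not world-writable
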
  • Tools for organizing files on the SE
    • do we want to go for write-mounting the filespace on the UIs? This has become an option now with the new Chimera-enabled dCache version.
    • a read-only mount would already be very useful on its own; only the cleanup of files remains a bit cumbersome, since it would still require going through the SRM protocol.
    • Fabio: the new uberftp on t3ui0[1,8,9] allows recursive options: -chgrp [-r] group, -chmod [-r] perms, -dir [-r], -ls [-r], -rm [-r]
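
    For illustration, a recursive listing and cleanup with the new uberftp could look like the following; the SE hostname, path and exact option syntax are examples and should be checked against the uberftp man page:

      # Sketch: recursive listing and removal through uberftp (hostname and path are examples)
      uberftp -ls -r gsiftp://t3se01.psi.ch/pnfs/psi.ch/cms/trivcat/store/user/someuser/old_ntuples
      uberftp -rm -r gsiftp://t3se01.psi.ch/pnfs/psi.ch/cms/trivcat/store/user/someuser/old_ntuples
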
  • NX compatibility problem: The new NX Player v4, the only version installable on Mac OS X 10.8, does not work nicely with the FreeNX 3.4 software that is installed on our UIs. The commercial server version does work, but its free edition only allows 2 user accounts to be enabled. We would have to buy licenses from NoMachine.
    • Derek: I propose to collect the number of users who use NX and suffer from the problem (most distributions still ship the working 3.4 version clients; Mac users are affected). We may also decide to equip only a subset of the UIs with this.
  • Upgrade Plans and schedule for 2013
    • SE storage expansion (2 new boxes, 60 disks each).
      • Fabio: we will grow by 264 TB net, i.e. from the current 482 TB net to 746 TB net. (Urs: How much money is this? How much more money would be required to add another box?)
    • SE dCache upgrade from 1.9.12 (end of support Apr '13) to 2.2 (end of support Apr '14), i.e. we need to find another slot for a short downtime. Fabio: min. 1 day
    • Inform the steering board about plans for the NFS shome and swshare space infrastructure (keep running the old HW for 1-2 years)
    • Inform users about UI updates
      • Do users want RAID-0 or RAID-1 /scratch space? (Urs: I don't think [most] users care much about backup of scratch space. It is scratch space. The faster the better, i.e. RAID-0.)
  • UI scratch space. Urs: We do need significantly larger scratch disks mounted locally.
    • Derek: I think there are three measures that we can take in parallel (buying disks for these old systems is not easy, and you overpay)
      1. as Fabio proposes, use RAID-0 everywhere and use disks from some old machines to arrive at ~3 * 146 GB disks per UI (a setup sketch is given after this list)
      2. fewer users per UI: introduce more UIs. Old service machines that are freed up by migrating services to the VM infrastructure can be used. We could also convert a few of the old worker nodes (the cluster is hardly ever full, so this would not have a large impact)
      3. Policy: Part of the scratch space is often taken up by files of a more permanent nature. The question is whether the users' desire to use more space for long-lived files cannot be satisfied differently (Urs: the "more permanent files" are often local mirrors of files on the SE. It is a factor of 2-4 faster to run an ntuple analysis on a local file than on the SE.)
      4. Fabio: by recycling old HW I made 3 brand-new UIs with 3 disks each; I can do the same for 4 of the other 6 UIs; in my opinion 7 UIs with a 261 GB /scratch + 2 UIs with a 220 GB /scratch are enough, no need to buy HW or invest more effort in this topic
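
    For reference, the RAID-0 /scratch over three 146 GB disks from point 1 could be assembled roughly as follows with Linux software RAID (device names and filesystem are assumptions; on machines with a hardware RAID controller this would be configured there instead):

      # Sketch: software RAID-0 /scratch over three disks (device names are examples)
      mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
      mkfs.ext4 /dev/md0
      mkdir -p /scratch
      mount /dev/md0 /scratch
      echo '/dev/md0  /scratch  ext4  defaults,noatime  0 0' >> /etc/fstab
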
  • Urs: NFS-mounted homes and gold. This is about the GNU gold linker bug that results in empty files being produced on our NFS homes. Daniel has analyzed the situation and has even submitted a patch to the gold developers, but we do not know when it will be included.
    • Do we just live with this situation? (many people in our group find the situation quite painful!)
      • Derek: Currently, it would require local patching or providing workarounds for every release. When CMS moves to central SW distribution, this may even become harder. Upgrading the OS of the NFS server (Solaris) might also solve the situation, but we do not know. An OS upgrade is only possible with an Oracle support contract; we have been offered such a contract for a reasonable price. We will inform you in the meeting.
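
    A possible per-user workaround, stated here only as an assumption and not as a tested fix for our setup, is to put a symlink to the classic BFD linker early in the PATH so that local builds do not pick up gold (if the CMSSW release ships its own binutils, the symlink would have to point into the release area instead):

      # Sketch: prefer ld.bfd over gold for builds in the NFS home (paths are examples)
      mkdir -p $HOME/bin
      ln -sf /usr/bin/ld.bfd $HOME/bin/ld
      export PATH=$HOME/bin:$PATH    # must precede the directory that provides gold's ld
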
  • Urs: home directory size
    • I'd like to understand the use case for having such large homes (sorry if this is too 'technical' or too 'detailed', but I have questioned the size of the homes for a long time, and it has been a limitation recently).
      • Derek: From the admin side, we can only say that such volumes seem to be desired by many users; we cannot say how reasonable that is. We should be able to sustain 150 GB per active user, but this should be the limit for home space, given the technology that we must continue to use this year.
      • Fabio: probably by simply dropping old users we will get back several GB; let's do this exercise first.
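
    A quick way to do this exercise (the mount point is an assumption) is to list per-user home usage, largest first:

      # Sketch: per-user home usage in MB, largest first (/shome is an example mount point)
      du -sm /shome/*/ 2>/dev/null | sort -nr | head -n 30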

    -- DerekFeichtinger - 2013-01-08
