Fourth PSI Tier-3 Steering board meeting

Guide for commenters

  1. Please pick a color for yourself, following my examples in the source text (Derek: comment). This makes it easier to follow the discussion of individual items.
  2. Please remember that the steering board is a political body, not a technical one. To keep the meeting efficient, issues should be prepared in a way that allows the members to decide on them: consequences of proposed solutions should be explained in terms that users can understand, costs need to be mentioned, etc. Steering board members are users, and issues should be explained in a way that they can understand and relate to.
  3. There should be no long list of trivia. Rather, focus on the most important decisions that we need the steering board to make.

Proposed Agenda

  • Procedures for dealing with users leaving the collaboration: The admins never get notified about leaving users (Urs: I'd rephrase this: The admins never did anything in response to me reporting users that had left. The other members of the steering board never reacted to my urging to update my list with their names.). It is important to get these notifications for freeing up resources and for security reasons. We must put a working system in place.
    • Policies about how to deal with leaving users (this is just a proposal for discussion):
      • The institution's contact is responsible for notifying us about leaving users (or arranges for notification by the department assistant or similar).
      • How long after leaving is a user allowed to use the cluster: 6 weeks (Urs: this must be longer, more like 6 months, and exceptions should be made for finishing theses)
      • What happens with the user's data on home and SE storage: The user and his working group need to agree on what shall happen to the data. After the quarantine period (6 weeks, see above) the data needs to come under the responsibility of another user (it will be "chowned") or possibly a group, and be counted against the respective quota. A sketch of such a handover follows this agenda item.
      • What about data that is the basis of a publication? If we cannot delete the data, where do we store it? Do we need tape services from PSI or CSCS? (Urs: I think this "archival" is overkill. In principle all data can be regenerated.)
    • Urs: Can we not just use the board meeting to go through the names? The previous "policy" has not worked for 4 years, and why should that change in the next 6 months? Going through the user list takes <10 minutes and then we get rid of users who have not been at PSI for up to 4 years(!).
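
As a rough illustration of the handover step above, here is a minimal sketch in Python. The paths and user names are hypothetical, and the real procedure would depend on how home and SE storage are laid out.

```python
import os
import pwd

def hand_over(path, successor):
    """Transfer a departed user's directory tree to a successor after
    the quarantine period, so it counts against the successor's quota."""
    uid = pwd.getpwnam(successor).pw_uid
    gid = pwd.getpwnam(successor).pw_gid
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            os.chown(os.path.join(root, name), uid, gid)
    os.chown(path, uid, gid)

# Example (hypothetical path and user name):
# hand_over("/shome/departed_user", "successor_user")
```
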
  • Storage Element Quotas
    • Based on our current policy statements, all resources are to be shared equally among users. In practice the SE is nearly full on a daily basis, and people consume space very unequally. (Urs: I think it's obvious that the two sentences are not in contradiction...)
    • We need to define policies for user SE quotas
      • Since quotas cannot technically be enforced by the dCache storage system, they will instead be implemented in the form of Nagios monitoring and polling the user for cleanup (e.g. by email); a sketch follows this list.
      • How can we enforce the cleanup with unresponsive users?
        • Derek: I propose that such cases are escalated to the steering board before we enforce a deletion
      • What about users who store a large number of samples in their folder that are then used by many other people; will they be constantly asked to clean up?
        • Derek: yes. There is no other sensible way, except for putting the files into a group space
        • Daniel: We could set higher per-user quotas in the alert scripts upon request?
        • Derek: But this would mean micromanagement on a per-user basis... also, how do we ensure fairness? If one user gets more, we would have to deduct it from the quotas of others. If we introduce group folders with group quotas, this could be done.
        • Urs: If group quotas are doable, then we should do that.
    • Since the last upgrade, the stored files really are stored with the UID of the user, so the creator of a file can be identified and users cannot change each other's files and directories. Do we want to replicate this same setup at CSCS? Could we live with the worst-case scenario of everything being deleted at CSCS?
      • Derek: This bullet point contains information that, without better explanation, is completely misleading and will cause panic. The danger of losing all files through user error is mainly connected to the mounting of the NFS file space. Replicating this setup at CSCS requires the definition and maintenance of a Tier-2 allowed-users list. How to define and maintain it is a matter for another political body, even though most of the steering board members represent the same user groups. Also, I feel that we should first reach an understanding with Pablo from CSCS before we push this change on them.
      • Daniel: Agree, let's cut that point from the T3 steering board meeting discussion. (Personally, I also see some danger even without the NFS mount, as I know many people use self-made scripts to delete recursively.) Urs: Is this an argument that the admins should provide better scripts?
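
To make the quota item above concrete, here is a minimal sketch of the polling approach. The 20 TB soft quota, the usage numbers, the mail addresses, and the local SMTP server are illustrative assumptions, not our actual Nagios setup.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical per-user SE usage in bytes, e.g. parsed from dCache
# accounting output; the numbers and the 20 TB soft quota are examples.
usage = {"alice": 42 * 10**12, "bob": 3 * 10**12}
SOFT_QUOTA = 20 * 10**12

def poll_over_quota(usage, quota):
    """Ask every user above the soft quota to clean up (by email)."""
    offenders = {user: used for user, used in usage.items() if used > quota}
    for user, used in offenders.items():
        msg = EmailMessage()
        msg["Subject"] = "T3 SE cleanup request"
        msg["From"] = "t3-admins@example.org"  # placeholder address
        msg["To"] = f"{user}@example.org"      # placeholder address
        msg.set_content(
            f"You are using {used / 10**12:.1f} TB on the SE "
            f"(soft quota {quota / 10**12:.0f} TB). Please clean up."
        )
        with smtplib.SMTP("localhost") as smtp:  # assumes a local MTA
            smtp.send_message(msg)
    return offenders
```
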
  • Group directories on the SE
    • Even though this is not foreseen in the CMS model, people have started to create group folders under the store/user/ directory. This leads to a number of problems regarding who is responsible for these files, who cleans them up, etc.
      • Derek: I propose that group folders must all go to /store/group.
      • Daniel: Fine with me; then we should also set quotas for those.
      • I think the only way of solving this correctly would be to count the group space against the users' quotas (equipartition; see the sketch after this list).
      • Whereas with the new file ownerships a user can protect his files from erroneous deletion by others, the group files will probably remain world-writable (unless we introduce unix groups representing these groups). We need to discuss what users think about that risk, especially in the context of mounting the filespace through NFSv4.
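
A minimal sketch of the equipartition accounting suggested above: each group member is charged an equal share of the group usage on top of their personal usage. All names and numbers are made up.

```python
def effective_usage(user_usage, group_usage, members):
    """Charge each group member an equal share of the group space on
    top of their personal SE usage (equipartition)."""
    charged = dict(user_usage)
    for group, used in group_usage.items():
        share = used / len(members[group])
        for user in members[group]:
            charged[user] = charged.get(user, 0) + share
    return charged

# Made-up example: a 6 TB group folder shared by three users adds
# 2 TB to each member's effective usage.
print(effective_usage(
    {"alice": 1.0, "bob": 4.0, "carol": 0.5},  # personal usage in TB
    {"higgs": 6.0},                            # group folder usage in TB
    {"higgs": ["alice", "bob", "carol"]},
))
```
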
  • Tools for organizing files on the SE
    • Do we want to go for write-mounting the filespace on the UIs? This has become an option now with the new Chimera-enabled dCache version.
    • A read-only mount would be very useful on its own; only the cleanup of files is a bit cumbersome, since it would still require going through the SRM protocol (see the sketch below).
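
As an illustration of how cleanup could work against a read-only mount: walk the mounted tree to find candidates, then delete via SRM. The mount point, SRM endpoint, and user folder below are placeholders, not our confirmed configuration.

```python
import os
import time

MOUNT = "/pnfs/psi.ch"                   # assumed read-only NFS mount point
SRM_PREFIX = "srm://t3se01.psi.ch:8443"  # placeholder SRM endpoint

def cleanup_commands(user_dir, older_than_days=180):
    """Walk the read-only mount and print srmrm commands for old files;
    the actual deletion still has to go through the SRM protocol."""
    cutoff = time.time() - older_than_days * 86400
    for root, _, files in os.walk(os.path.join(MOUNT, user_dir)):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                print(f"srmrm {SRM_PREFIX}{path}")

# Example (hypothetical user folder):
# cleanup_commands("store/user/some_user")
```
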
  • NX compatibility problem: The new NX client 3.5 versions do not work nicely with the FreeNX software installed on our UIs. The commercial server does work, but in its free version it only allows 2 user accounts to be enabled. We would have to buy licenses from NoMachine.
    • Derek: I propose to collect the number of users who use NX and suffer from the problem (most distributions still ship the working 3.4 client; Mac users are affected). We may also decide to equip only a subset of the UIs with this.
  • Upgrade Plans and schedule for 2013
    • SE storage expansion (2 new boxes, 60 disks each).
      • Fabio: FYI, we will grow by 264 TB net (Urs: How much money is this? How much more money is required to add another box?)
    • SE dCache upgrade from 1.9.12 (end of support Apr '13) to 2.2 (end of support Apr '14), i.e. we need to find another slot for a short downtime.
    • Inform steering board on plans for NFS shome and swshare space infrastructure (keep on running old HW for 1-2 years)
    • Inform users about UI updates
      • Do users want RAID-0 or RAID-1 /scratch space? (Urs: I don't think [most] users care much about backup of scratch space. It is scratch space. The faster the better, i.e. RAID-0.)
      • Urs: We do need significantly larger scratch disks mounted locally.
  • Urs: NFS mounted homes and gold
    • Do we just live with this situation? (many people in our group find the situation quite painful!)
  • Urs: home directory size
    • I'd like to understand the use-case for having such large homes (sorry if this is too 'technical' or too 'detailed', but I have questioned the size of the homes for a long time and it has been a limitation recently).

    -- DerekFeichtinger - 2013-01-08
