Fourth PSI Tier-3 Steering board meeting

Venue

Date and time: Monday, Feb 11, 14-16h (Indico Page)

Location: ETH Hoenggerberg (HPK D32)

Slides: Presentation by Fabio Martinelli

Meeting Minutes

Meeting minutes by D. Meister

WORK IN PROGRESS

  • Procedures for dealing with users leaving the collaboration:
    1. Every 6 months, the admins will send each institution representative a user list with a request to check it. The list contains all users working on behalf of that institution, with an overview of consumed resources (mainly storage) and activity.
    2. The institution representative reports which users are about to leave (ideally in advance) or which have already left.
    3. The admins will send each leaving user a standard email describing the steps that need to be taken (organizing the handover of data, etc.).
    4. Users organize the handover with their collaborators.
    5. As long as the institution regards a user as active, he will be treated as a regular user of that institution (even if his contract has already ended, e.g. while he is still finishing his dissertation). Once the user has indeed left, the account will be deactivated after 2 months. Data for which nobody has taken responsibility will be deleted, after a prior warning to the institution representative.
  • Storage Element Quotas
    1. A notification system will be put in place that signals when users have exceeded their allotted SE quota. There is no mechanism to enforce the quotas at the storage level. (A minimal sketch of such an alert pass follows this list.)
    2. The quota is based on an equipartition model with regard to the number of active users. It takes into account whether users are part of a group.
  • Group directories on the SE
    1. Users can form working groups. They request the creation of such a group from the cluster admins
    2. Every group gets a home directory under /store/t3group/
    3. The storage consumed by the group is divided equally among its members and added to each member's storage consumption (?).
    • NOTE: We need to gain experience with the practicality of this model. Should it turn out not to fit the requirements, we will reconvene and adapt.
  • Tools for managing and organizing files on the SE
    1. The admins will investigate the read-only mounting (NFS) of the SE storage. Fabio: ongoing
    2. Tools for efficient recursive operations on the storage will be provided
  • Home directory space DONE
    • Users can consume 150 GB of home space on /shome. We have enough storage to satisfy this.
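To make the notification idea above concrete, here is a minimal sketch of what the alert pass could look like, assuming per-user usage numbers have already been collected. The quota value, usage figures, and mail addresses are illustrative assumptions, not the real setup.

```python
# Minimal sketch of a quota notification pass. The usage source, quota
# value, and mail addresses are illustrative assumptions.
import smtplib
from email.mime.text import MIMEText

QUOTA_TB = 5.0                          # assumed per-user quota
usage_tb = {"alice": 6.3, "bob": 2.1}   # would come from accounting data

for user, used in usage_tb.items():
    if used <= QUOTA_TB:
        continue
    msg = MIMEText(
        f"Dear {user},\n\nyou are using {used:.1f} TB on the SE, above the "
        f"{QUOTA_TB:.1f} TB quota. Please clean up or move data into a "
        "group space.\n\nYour T3 admins"
    )
    msg["Subject"] = "T3 SE quota exceeded"
    msg["From"] = "admins@t3.example"   # hypothetical address
    msg["To"] = f"{user}@t3.example"    # hypothetical address
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)
```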

Task list based on the decisions

* Cleanup of LDAP
  - [ ] Enforce consistency in Institution field
    - Note: The institute field is part of the gecos/cn attributes, not a
      separate attribute
    - How do we deal with Guest users? Keep "Guest - Institution" convention?
      The physgroup to which they belong is given by the x-cms-physics-group
      attribute.
    - I regret my earlier decision to implement phys-groups through a
      dedicated LDAP attribute instead of unix groups. Should we change
      that? We may also leave it as is (changing it may create too much fuss)
  - [ ] Enforce consistency in phys-group field
    - do we actually need phys-groups on the T3? Do they really have a meaning?
      Could we not replace them completely with the new group model?
      (A read-only consistency-check sketch follows after this block.)
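A read-only consistency check over the LDAP fields discussed above could look like the following sketch (using the python-ldap module). The server URI, base DN, institution list, and the exact gecos format are assumptions for illustration; only the x-cms-physics-group attribute name is taken from the notes above.

```python
# Sketch of an LDAP consistency check for the institution and phys-group
# fields. Server URI, base DN, and the gecos convention are assumptions.
import ldap

LDAP_URI = "ldap://t3ldap.example"     # hypothetical server
BASE_DN = "ou=users,dc=psi,dc=ch"      # hypothetical base DN
KNOWN_INSTITUTIONS = {"ETHZ", "PSI", "UNIZ"}

conn = ldap.initialize(LDAP_URI)
conn.simple_bind_s()  # anonymous bind for a read-only check

results = conn.search_s(
    BASE_DN, ldap.SCOPE_SUBTREE, "(objectClass=posixAccount)",
    ["uid", "gecos", "x-cms-physics-group"],
)

for dn, attrs in results:
    uid = attrs["uid"][0].decode()
    gecos = attrs.get("gecos", [b""])[0].decode()
    # Assumed convention: the institution is the part after the last " - "
    institution = gecos.rsplit(" - ", 1)[-1].strip()
    if institution not in KNOWN_INSTITUTIONS and not institution.startswith("Guest"):
        print(f"{uid}: unrecognized institution field '{institution}'")
    if "x-cms-physics-group" not in attrs:
        print(f"{uid}: missing x-cms-physics-group attribute")
```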

* Propose procedure for leaving users
  - [ ] Implement a script which generates user lists for each institution (a minimal sketch follows after this block)
    - should include some metrics on resource consumption (storage) and activity
    - How do we deal with guest users? They belong to a phys-group,
      not to an institution.
      - Each institute's list should include all guest users
  - [ ] write down an actual procedure in accordance with the meeting's outcome
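A minimal sketch of the user-list generator, assuming a per-user usage report already exists somewhere; the CSV path and its columns are purely illustrative. Guest users are appended to every institution's list, as noted above.

```python
# Sketch of the per-institution user list. The report path and columns
# (uid, institution, se_gb, last_login) are illustrative assumptions.
import csv
from collections import defaultdict

USAGE_REPORT = "/var/lib/t3accounting/user_usage.csv"  # hypothetical path

with open(USAGE_REPORT) as f:
    users = list(csv.DictReader(f))

# Guest users belong to a phys-group, not an institution, so they are
# appended to every institute's list (see the note above).
guests = [u for u in users if u["institution"].startswith("Guest")]
regular = [u for u in users if not u["institution"].startswith("Guest")]

by_inst = defaultdict(list)
for u in regular:
    by_inst[u["institution"]].append(u)

for inst, rows in sorted(by_inst.items()):
    print(f"=== {inst} ===")
    for u in rows + guests:
        print(f"{u['uid']:<12} SE: {u['se_gb']:>8} GB  last login: {u['last_login']}")
```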


* Group directories
  - [ ] Check for dcache support of secondary groups
  - [ ] Do we need to identify a group leader? If yes, how do we mark him in LDAP?
  - [ ] We need a procedure/mechanism for setting the appropriate file permissions (a sketch of one possible scheme follows after this block)
    - it would be highly beneficial to restrict file creation/deletion inside
      group folders to the members of the owning group
    - can users run chmod themselves, so that we can delegate the fine-tuning?
  - [ ] implement groups and member accounting
    + proposal: use unix groups stored in our LDAP. The groups must be clearly
      identifiable, so that scripts can differentiate them from ordinary groups
      (important for accounting scripts, etc.; naming convention or a dedicated OU)
  - [ ] set up a procedure for group management
    - creation and deletion (how do users request a group creation)
    - adding and deleting members
    - pages for users to see information about the groups
  - [ ] set up the initial groups (based on existing group dirs)
  - [ ] migrate existing group directories from /store/user to /store/t3group
    + this needs interaction with the affected users
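One possible permission scheme for a group directory, as referenced in the task above: group-owned, group-writable, with the setgid bit so that new entries inherit the group. This assumes the namespace is reachable through a POSIX mount (e.g. the NFS mount under investigation); the directory and group names are illustrative.

```python
# Sketch: make a group directory writable only by its unix group, with
# the setgid bit so new entries inherit the group. Names are assumptions.
import grp
import os
import stat

GROUP_DIR = "/store/t3group/higgs"   # hypothetical group directory
GROUP_NAME = "t3grp_higgs"           # assumed naming convention for groups

gid = grp.getgrnam(GROUP_NAME).gr_gid
os.chown(GROUP_DIR, 0, gid)          # owned by root, group = working group
# rwxrws---: members may create/delete files, others get no access
os.chmod(GROUP_DIR, stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID)
```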

* SE quotas
  Notice: The SE storage expansion will only arrive in May. Until then we will
  remain short of space.
  - [ ] Implement a script which correctly calculates the effective used space
    per user, including group space (see the sketch after this block)
    - proposed model: equipartition of used group space among the
      members, adding that share to each user's consumed resources
    - maybe the model should have the flexibility that we can also
      add a segment manually to the allowed group space that is not
      accounted against the users (just a thought... this all is getting
      complex fast).
    - alternative model: Users communicate how much of their storage they
      want to give into the group space.
  - [ ] users should be able to see the current status of all users/groups
    on a monitoring page, including how much more space a group is allowed
    to use
  - [ ] Implement a notification mechanism for users exceeding their quota
  - [ ] Set up a procedure for how to deal with users in violation
    of their quota for longer than a certain threshold time
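A sketch of the proposed equipartition accounting, as referenced in the first task above. The usage dictionaries are illustrative; real numbers would come from the dCache accounting data.

```python
# Sketch of the proposed effective-usage model: group space is divided
# equally among group members and added to each member's own usage.
# The input dictionaries are illustrative assumptions.

user_usage = {"alice": 4.2, "bob": 1.3, "carol": 0.7}   # TB on /store/user
group_usage = {"higgs": 3.0}                            # TB on /store/t3group
group_members = {"higgs": ["alice", "bob"]}

effective = dict(user_usage)
for group, used in group_usage.items():
    members = group_members[group]
    share = used / len(members)   # equipartition of the group space
    for member in members:
        effective[member] = effective.get(member, 0.0) + share

for user, tb in sorted(effective.items()):
    print(f"{user:<8} {tb:5.2f} TB effective usage")
# alice: 4.2 + 1.5 = 5.70 TB, bob: 1.3 + 1.5 = 2.80 TB, carol: 0.70 TB
```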

* NX
  - [ ] Get an overview of how many users use it and how often
    - proposal: first check NX account login statistics and, if possible, the
      log files (a rough sketch follows after this block)
    - maybe send out a survey
  - [ ] confirm whether Fabio's workaround is ok
  - [ ] There is a fork of NX 3.5 that may grow into a good open-source replacement: X2go.
    We may want to test that (others at PSI are also interested and may be testing)
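A rough sketch of the login-statistics idea from the first item, assuming FreeNX sessions show up as logins of the "nx" system account in the wtmp records (an assumption worth verifying; the session logs under the nx account's home may be more reliable).

```python
# Sketch: count NX sessions per originating host from `last` output.
# Assumes FreeNX sessions appear as logins of the "nx" system account.
import subprocess
from collections import Counter

out = subprocess.run(["last", "-F", "nx"], capture_output=True, text=True)

per_host = Counter()
for line in out.stdout.splitlines():
    fields = line.split()
    if len(fields) >= 3 and fields[0] == "nx":
        per_host[fields[2]] += 1   # third column: originating host

for host, n in per_host.most_common():
    print(f"{host:<30} {n} sessions")
```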

* Tools for organizing files on the SE
  - [ ] do tests with NFS read-only mounted dcache
    - note that there was a dcache problem report about a postgresql version which
      caused slow listing of directories
  - [ ] provide a tool for recursive rm and directory mv
    - not sure what the best way is: write our own tool (with safety catches) using
      one of the supported protocols, or try to implement the gplazma config workaround
      (but there seem to be problems) for getting uberftp to work. (A hedged sketch
      of the first option follows below.)
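A hedged sketch of the "own tool with safety catches" option, built on the srmls/srmrm/srmrmdir SRM client tools. The endpoint is illustrative, and the simplistic parsing of srmls output ("<size> <path>" per line, directories marked by a trailing slash) is an assumption that would need hardening against the real output format.

```python
# Sketch of a recursive SRM delete with safety catches. Endpoint and the
# srmls output format are assumptions; harden before any real use.
import subprocess
import sys

ALLOWED_PREFIX = "srm://t3se01.psi.ch/"   # illustrative SE endpoint
DRY_RUN = True                            # safety catch: print, don't delete

def run_or_print(cmd):
    print(" ".join(cmd))
    if not DRY_RUN:
        subprocess.run(cmd, check=True)

def srm_entries(url):
    """Yield paths listed by srmls for the given URL (assumed format)."""
    out = subprocess.run(["srmls", url], capture_output=True, text=True)
    for line in out.stdout.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].startswith("/"):
            yield fields[1]

def recursive_rm(url):
    # Safety catch: refuse to operate outside the allowed endpoint
    if not url.startswith(ALLOWED_PREFIX):
        sys.exit(f"refusing to touch anything outside {ALLOWED_PREFIX}")
    base = url[len(ALLOWED_PREFIX) - 1:]   # path component of the URL
    for path in srm_entries(url):
        if path.rstrip("/") == base.rstrip("/"):
            continue                       # srmls lists the directory itself
        child = ALLOWED_PREFIX.rstrip("/") + path
        if path.endswith("/"):             # assumed directory marker
            recursive_rm(child)
            run_or_print(["srmrmdir", child])
        else:
            run_or_print(["srmrm", child])
```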

* /shome NFS service
  - [ ] order premium support for Thumpers/Thors from Oracle

Proposed agenda and initial discussion

  • Procedures for dealing with users leaving the collaboration: The admins never get notified about leaving users (Urs: I'd rephrase this: The admins never did anything in response to me reporting users that had left. The other members of the steering board never reacted to my urging to update my list with their names.). It is important to get these notifications for freeing up resources and for security reasons. We must put a working system in place.
    • Policies about how to deal with leaving users. This is just a proposal for discussion
      • The institution's contact is responsible for notifying about leaving users (or he organizes notification by the department assistant or similar)
      • How long after leaving is a user allowed to use the cluster: 6 weeks (Urs: this must be longer, more like 6 months, and exceptions should be made for finishing theses)
      • What happens with the user's data on home and SE storage: The user and his working group need to agree on what shall happen to the data. After the quarantine period (6 weeks, see above) the data needs to come under the responsibility of another user (it will be "chowned") or possibly a group, and be counted against the respective quota.
      • What about data that is the basis of a publication? If we cannot delete the data, where do we store it? Do we need tape services from PSI or CSCS? (Urs: I think this "archival" is overkill. In principle all data can be regenerated.)
    • Urs: Can we not just use the board meeting to go through the names? The previous "policy" has not worked for 4 years, and why should that change in the next 6 months? Going through the user list takes <10 minutes and then we get rid of users who have not been at PSI for up to 4 years(!).
      • Might be, but here I would like to insist, on behalf of the administrators, on a working procedure. We have that for every other resource at PSI, and I am sure that it is the same for ETH and UNIZ services. A notification must be possible, and I feel it should be possible to delegate this to the people connected to the normal leaving procedures (e.g. department assistants).
  • Storage Element Quotas
    • based on our current policy statements, all resources are to be equally shared among users.
      • Fabio: The SE is full on a daily basis and people are consuming space very unequally. (Urs: I think it's obvious that the two statements are not in contradiction...)
    • We need to define policies for user SE quotas
      • Since quotas cannot technically be enforced by the dcache storage system, they will instead be implemented in the form of Nagios monitoring and prompting users to clean up (e.g. by email)
      • how can we enforce the cleanup with unresponsive users?
        • Derek: I propose that such cases are escalated to the steering board before we enforce a deletion
      • what about users who store a large number of samples in their folder that are then actually used by many people; will they be constantly asked to clean up?
        • Derek: yes. There is no other sensible way, except for putting the files into a group space
        • Daniel: We could set higher per-user quotas in the alert scripts upon request?
        • Derek: But this would mean micromanagement on a per-user basis... also, how do we enforce fairness? It would mean that one user gets more, but we would have to deduct from the quotas of others. If we introduce group folders with group quotas, this could be done.
        • Urs: If group quotas are doable, then we should do that.
  • Group directories on the SE
    • Even though this is not foreseen in the CMS model, people have started to create group folders under the /store/user/ directory. This leads to a number of problems with regard to finding out who is responsible for these files, who cleans them up, etc.
      • Derek: I would propose that group folders must all go to /store/group.
      • Daniel: Fine with me; then we should also set quotas for those.
      • I think the only way of solving this correctly would be to count the group space against the users' quotas (equipartition)
      • While the new file ownerships let a user protect his files from erroneous deletion by others, the group files will probably remain world-writable (unless we introduce unix groups representing these groups). We need to discuss what users think about that risk, especially in the context of mounting the filespace through NFSv4.
  • Tools for organizing files on the SE
    • do we want to go for write-mounting the filespace on the UIs? This has become an option now with the new Chimera-enabled dcache version.
    • a read-only mount would be very useful on its own; only the cleanup of files remains a bit cumbersome, since it would still require going through the SRM protocol.
    • Fabio: the new uberftp on t3ui0[1,8,9] allows recursive options: -chgrp [-r] group, -chmod [-r] perms, -dir [-r], -ls [-r], -rm [-r]. But while trying it I found a dCache bug in our dCache installation, which has been accepted by the dCache team; we need to wait for the patch before we can properly use these -r options.
  • NX compatibility problem: The new NX Player v4, the only one installable on Mac OS X 10.8, badly transfers the string /usr/bin/konsole to the freeNX 3.4 software that is installed on our UIs. The commercial version works, but its trial version allows a maximum of 2 users. Fabio: found a trick to request the konsole program by default, which circumvents the badly transferred string issue.
    • Derek: I propose to collect the number of users using NX and suffering from the problem (most distributions still ship the working 3.4 version clients; Mac users are affected). We may also decide to equip just a subset of the UIs with this.
  • Upgrade Plans and schedule for 2013
    • SE storage expansion (2 new boxes, 60 disks each).
      • Fabio: we will grow by 264 TB net, from the current 482 TB net to 746 TB net. (Urs: How much money is this? How much more money is required to add another box?)
    • SE dCache upgrade from 1.9.12 (end of support Apr '13) to 2.2 (end of support Apr '14), i.e. we need to find another slot for a short downtime. Fabio: minimum 1 day; if we use this downtime to cable the new disk expansions, then minimum 2 days
    • Inform steering board on plans for NFS shome and swshare space infrastructure (keep on running old HW for 1-2 years)
    • Inform users about UI updates
      • Do users want RAID-0 or RAID-1 /scratch space? (Urs: I don't think [most] users care much about backup of scratch space. It is scratch space. The faster the better, i.e. RAID-0.)
  • UI scratch space. Urs: We do need significantly larger scratch disks mounted locally.
    • Derek: I think there are three measures that we can take in parallel (buying disks for these old systems is not easy, and you overpay)
      1. as Fabio proposes, use RAID0 everywhere and reuse disks from some old machines to arrive at ~3 * 146 GB disks per UI
      2. fewer users per UI: introduce more UIs. Old service machines, now freed up by migrating services to the VM infrastructure, can be used. We could also convert a few of the old worker nodes (the cluster is hardly ever full, so this would not have a large impact)
      3. Policy: Part of the scratch space is often taken up by files of a more permanent nature. The question is whether the users' desire to use more space for long-lived files could not be satisfied differently (Urs: the "more permanent files" are often local mirrors of files on the SE. It is a factor 2-4 faster to run an ntuple analysis on a local file than on the SE.)
      4. Fabio: by recycling old HW I made 3 brand new UIs with 3 disks each; I can do the same for the other 6 UIs. In my opinion 9 UIs with a 261 GB /scratch are enough; no need to buy HW or invest more effort in this topic. Say thanks to the PSI VMware infrastructure for this, almost all our T3 services are virtualized now! :-)
  • Urs: NFS mounted homes and gold. This is about the GNU gold linker bug that results in empty files being produced on our NFS homes. Daniel has analyzed the situation and even submitted a patch to the gold developers, but we do not know when it will be included.
    • Do we just live with this situation? (many people in our group find the situation quite painful!)
      • Derek: Currently, it would require local patching or providing workarounds for every release. When CMS goes to central SW distribution, this may even become harder. Upgrading the OS of the NFS server (Solaris) might also solve the situation, but we do not know. An OS upgrade is only possible with an Oracle support contract. We have been offered such a contract for a reasonable price. We will inform you in the meeting.
  • Urs: home directory size
    • I'd like to understand the use-case for having such large homes (sorry if this is too 'technical' or too 'detailed', but I have questioned the size of the homes for a long time and it has been a limitation recently).
      • Derek: From the admin side, we can just say that such volumes seem to be desired by many users. We can sustain 150 GB per active user and have enough storage to support this for the required number of users, so this is not a problem. But this should be the limit for home space, given the technology that we must continue using for this year.
      • Fabio: probably by simply dropping old users we will get back several GB; let's do this exercise first. ETHZ requested user deletions on 4 Feb '13.

    -- DerekFeichtinger - 2013-01-08

    Topic attachments
    • 130211_Minutes_Meeting04.pdf (43.4 K, 2013-02-14, DerekFeichtinger): Meeting minutes by D. Meister
    • SteeringBoard_ETHZ_7-Feb-13.pdf (2140.7 K, 2013-02-11, FabioMartinelli): Fabio Martinelli slides