
Fourth PSI Tier-3 Steering board meeting

Venue

Date and time: Monday, Feb 11, 14-16h (Indico Page)

Location: ETH Hoenggerberg (HPK D32)

Slides: Presentation by Fabio Martinelli

Meeting Minutes

Meeting minutes by D. Meister

Task list based on the decisions

Highest Priority:

  1. Provide NFS read access to users DONE
  2. Prepare dCache and PostgreSQL update (possible dates: 22nd and 28th of March) DONE

Detailed task list:

  • Cleanup of LDAP
    • [DONE] Enforce consistency in Institution field Fabio
      • Note: The institute field is part of the gecos/cn attributes, not a separate attribute
      • How do we deal with guest users? Keep the "Guest - Institution" convention? The phys-group to which they belong is given by the x-cms-physics-group attribute. There are 6 guests: bora, folguera, lmartini, mdjordjie, nsahoo, paktinat
      • I regret my earlier decision to implement phys-groups through a dedicated LDAP attribute instead of unix groups. Should we change that? We may also leave it as is (changing may create too much fuss)
    • [DONE] Enforce consistency in phys-group attribute Daniel (with help from Derek)
      • Do we actually need phys-groups on the T3? Do they really have a meaning? Daniel thinks they still make sense, and that the resource groups will actually be subsets of the phys-groups. Also, the phys-group leaders will be the responsible persons for group formation, etc.
      • The information in LDAP has to be updated to reflect the current situation. We also need to keep it up to date later on
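
The institution consistency check above could be scripted along these lines. This is a minimal sketch assuming the "Full Name - Institution" gecos convention (including "Guest - Institution" for guests); the sample entries and helper names are illustrative, not real LDAP records:

```python
# Minimal consistency check for the institution encoded in the gecos field.
# Assumes the "Full Name - Institution" convention; entries are examples only.
KNOWN_INSTITUTIONS = {"ETHZ", "UniZ", "PSI"}

def institution_from_gecos(gecos: str) -> str:
    """Return the institution part of a 'Name - Institution' gecos string."""
    # rsplit so hyphenated names in the first part are not mis-split
    parts = gecos.rsplit(" - ", 1)
    return parts[1].strip() if len(parts) == 2 else ""

def inconsistent_entries(entries):
    """Yield (uid, gecos) pairs whose institution is missing or unknown."""
    for uid, gecos in entries:
        if institution_from_gecos(gecos) not in KNOWN_INSTITUTIONS:
            yield uid, gecos

sample = [
    ("dmeister", "Daniel Meister - ETHZ"),
    ("bora",     "Guest - PSI"),
    ("badcase",  "Some User ETHZ"),   # missing " - " separator
]
print(list(inconsistent_entries(sample)))
```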

  • Establish a procedure for leaving user
    • [DONE] Implement a script which generates user lists for each institution Fabio, created =/usr/local/monuser/pnfs.usage.PSI.UniZ.ETHZ.py=
      • should include some metrics on resource consumption (storage) and activity
      • How do we deal with guest users? They belong to a phys-group, not to an institution. - Each institute's list should include all guest users
    • [DONE] use an ldap field for expiry information Derek (Need to identify schema field we want to use for that purpose) Fabio: we use shadowExpire field for each ldap user
    • [DONE] send e-mail every 6 months
    • [DONE] Write up a procedures document: ProcedureForLeavingUsers Daniel
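
For reference, the shadowExpire attribute stores the expiry date as the number of days since 1970-01-01, so scripts setting or checking it need a small conversion. A minimal sketch (the leaving date and helper names are illustrative):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def to_shadow_expire(expiry: date) -> int:
    """Days since 1970-01-01, the format the shadowExpire attribute uses."""
    return (expiry - EPOCH).days

def is_expired(shadow_expire: int, today: date) -> bool:
    """True if the stored shadowExpire value lies before today."""
    return shadow_expire < to_shadow_expire(today)

# Example: set expiry six weeks after a (hypothetical) leaving date
leaving = date(2013, 2, 11)
print(to_shadow_expire(leaving + timedelta(weeks=6)))   # → 15789
```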

Fabio: generally speaking the following two bullets have been merged in this New Primary Groups proposal

  • Group directories: We first need to agree on a group model. This is in preparation as of 2013-04-05 and is subject to a number of technical and procedural constraints
    • [ongoing] Check for dcache support of secondary groups Fabio
    • [ ] Do we need to identify a group leader? If yes, how do we mark him in LDAP?
      • Answer: We think that no leader is needed. All mailings will go to all group members
    • [ongoing] We need a procedure/mechanism for setting the appropriate file permissions Fabio, ongoing due to dCache bugs, see Ticket #7757
      • it would be highly beneficial to restrict file creation/deletion inside group folders to the members of the owner group Fabio: files are written with the user's primary group and without group write permission; one needs uberftp -chgrp to change the group and uberftp -chmod to change the modes
      • can a user do a chmod so that we can delegate the fine tuning? Fabio: yes, with uberftp -chmod, but it still does not work on dCache 2.2.10-1 (so far the latest 2.2)
    • [ ] implement groups and member accounting Fabio
      • proposal: use unix groups stored in our LDAP. The groups must be clearly identifiable so that scripts can differentiate them from ordinary groups (important for accounting scripts, etc.; via a naming convention or a dedicated OU)
    • [ ] migrate existing group directories from /store/user to /store/t3group Fabio
      • this needs interaction with the affected users
    • [ ] set up a procedure for group management T3Groups Daniel
      • creation and deletion (how do users request a group creation)
      • adding and deleting members
      • pages for users to see information about the groups
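
The naming-convention proposal above could look like the following sketch; the `t3_` prefix and the sample groups are hypothetical illustrations, not a decided convention:

```python
# Sketch of group/member accounting based on unix groups kept in LDAP.
# The "t3_" prefix is a hypothetical naming convention that lets scripts
# distinguish T3 physics/resource groups from ordinary unix groups.
T3_PREFIX = "t3_"

def t3_groups(groups):
    """Filter a {group_name: [members]} mapping down to T3 groups."""
    return {name: members for name, members in groups.items()
            if name.startswith(T3_PREFIX)}

def membership_report(groups):
    """Per-user list of T3 groups, e.g. as input for accounting scripts."""
    report = {}
    for name, members in t3_groups(groups).items():
        for user in members:
            report.setdefault(user, []).append(name)
    return report

sample = {
    "users":    ["alice", "bob"],      # ordinary unix group, ignored
    "t3_higgs": ["alice", "carol"],
    "t3_susy":  ["bob"],
}
print(membership_report(sample))
```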

  • SE quotas Fabio (notice: this is intimately connected to the groups issue above)
    • Notice: The SE storage expansion will only arrive in May. Until then we will always be in shortage.
    • [ ] Implement a script which correctly calculates the effective used space per user with inclusion of group space
      • proposed model: equipartition of used group space among the members and adding that space to each user's consumed resources
      • the model should be flexible enough that we can also manually add a segment to the allowed group space that is not accounted against the users
    • [ ] users should be able to see the current status of all users/groups on a monitoring page, including how much more space a group is allowed to use
    • [ ] Implement a notification mechanism for users exceeding their quota
    • [ ] Set up a procedure for how to deal with users in violation of their quota for longer than a certain threshold time
    • [ ] Cleaning campaign tool: a tool for identifying obsolete files on the SE. Maybe we need some propaganda plots.
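
The equipartition model proposed above could be sketched as follows. Users, groups, and sizes are purely illustrative; the manual "exempt" segment corresponds to the flexibility requirement of not accounting part of the group space against users:

```python
def effective_usage(user_usage, group_usage, group_members, group_exempt=None):
    """Effective per-user space: own files plus an equal share of each
    group space the user belongs to. group_exempt optionally holds a
    manually granted segment per group that is not charged to users."""
    group_exempt = group_exempt or {}
    effective = dict(user_usage)
    for group, used in group_usage.items():
        members = group_members.get(group, [])
        if not members:
            continue
        # subtract the manually allowed, unaccounted segment first
        charged = max(used - group_exempt.get(group, 0), 0)
        share = charged / len(members)
        for user in members:
            effective[user] = effective.get(user, 0) + share
    return effective

# Illustrative numbers in TB
usage = effective_usage(
    user_usage={"alice": 2.0, "bob": 1.0},
    group_usage={"t3_higgs": 4.0},
    group_members={"t3_higgs": ["alice", "bob"]},
    group_exempt={"t3_higgs": 2.0},   # manually granted, not charged
)
print(usage)   # alice and bob are each charged 1.0 TB of the group space
```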

  • Tools for organizing files on the SE
    • [DONE] do tests with NFS read mounted dcache Fabio
      • comment on tests Daniel and Derek
    • [DONE] We need to upgrade PostgreSQL. There was a dCache problem report about a PostgreSQL version which caused slow listing of directories. This requires a downtime, probably in mid-March. Fabio
    • [ongoing] provide a tool for recursive rm and directory mv Fabio (Fabio has found that uberftp allows recursive remove + chgrp. No move, but that is less important)
      • wait until we have better technical understanding of permission handling, secondary group support, etc.

  • /shome NFS service Derek
    • [DONE] order premium support for Thumpers/Thors from Oracle (ordered, paid by UniZ, waiting for product )

  • NX Derek (this seems to be lower priority now)
    • [O] Get an overview on how many users use it and how often
      • proposal: first check NX account login statistics and if possible log file
      • maybe send out survey
    • [DONE] confirm whether Fabio's workaround is ok Fabio
    • [O] There is a fork of NX 3.5 that may grow into a good open-source replacement: X2go. We may want to test it (others at PSI are also interested and may be testing) Derek: For SLS we have now gone the commercial NX way. X2go is too risky in that production environment

Proposed agenda and initial discussion

  • Procedures for dealing with users leaving the collaboration: The admins never get notified about leaving users (Urs: I'd rephrase this: The admins never did anything in response to me reporting users that had left. The other members of the steering board never reacted to my urge to update my list with their names.). It is important to get these notifications for freeing up resources and for security reasons. We must put a working system in place.
    • Policies about how to deal with leaving users. This is just a proposal for discussion
      • The institution's contact is responsible for notifying about leaving users (or he organizes notification by the department assistant or similar)
      • How long after leaving is a user allowed to use the cluster: 6 weeks (Urs: this must be longer, more like 6 months, and exceptions should be made about finishing theses)
      • What happens with the user's data on home and SE storage: The user and his working group need to communicate about what shall happen to the data. After the quarantine period (6 weeks, q.v. above) the data needs to come under the responsibility of another user (it will be "chowned") or maybe a group, and be counted against the respective quota.
      • What about data that is the basis of a publication? If we cannot delete the data, where do we store it? Do we need tape services from PSI or CSCS? (Urs: I think this "archival" is over-kill. In principle all data can be regenerated.)
    • Urs: Can we not just use the board meeting to go through the names? The previous "policy" has not worked for 4 years, and why should that change in the next 6 months? Going through the user list takes <10 minutes and then we get rid of users who have not been at PSI for up to 4 years(!).
      • Might be, but here I would like to insist, on behalf of the administrators, on a working procedure. We have that for every other resource at PSI, and I am sure it is the same for ETH and UNIZ services. A notification must be possible, and I feel it should be possible to delegate this to the people connected to the normal leaving procedures (e.g. department assistants).
  • Storage Element Quotas
    • based on our current policy statements, all resources are to be equally shared among users.
      • Fabio: The SE is full daily and people are consuming space very unequally. (Urs: I think it's obvious that the two sentences are not in contradiction...)
    • We need to define policies for user SE quotas
      • Since quotas cannot technically be enforced by the dCache storage system, they will instead be implemented in the form of Nagios monitoring and polling users to clean up (e.g. by email)
      • how can we enforce the cleanup with unresponsive users?
        • Derek: I propose that such cases are escalated to the steering board before we enforce a deletion
      • what about users that store a large amount of samples in their folder that is then actually used by a lot of people; will they be constantly asked to clean up?
        • Derek: yes. There is no other sensible way, except for putting the files into a group space
        • Daniel: We could set higher per user quotas in the alert scripts upon request?
        • Derek: But this would mean a micromanagement on a per user basis... also, how to enforce justice? It would mean that one user gets more, but we have to deduct from the quotas of others. If we introduce group folders with group quotas, this could be done.
        • Urs: If group quotas are doable, then we should do that.
  • Group directories on the SE
    • Even though this is not foreseen in the CMS model, people have started to create group folders under the /store/user/ directory. This leads to a number of problems in regard to finding who is responsible for these files, who cleans them, etc.
      • Derek I would propose that group folders must all go to /store/group.
      • Daniel: Fine with me; then we should also set quotas for those.
      • I think the only way of solving that correctly would be by counting the group space against the users' quota (equipartition)
      • Whereas with the new file ownerships a user can protect his files from erroneous deletion by others, the group files will probably remain world-writable (unless we introduce unix groups representing these groups). We need to discuss what users think about that risk, especially in the context of mounting the filespace through NFSv4.
  • Tools for organizing files on the SE
    • do we want to go for write mounting the filespace on the UIs? This has become an option now with the new Chimera enabled dcache version.
    • a read-only mount would be very useful on its own; just the cleanup of files is a bit cumbersome, since it still would require going through SRM protocol.
    • Fabio: the new uberftp on t3ui0[1,8,9] allows recursive options: -chgrp [-r] group, -chmod [-r] perms, -dir [-r], -ls [-r], -rm [-r], but by trying them I found a dCache bug, accepted by the dCache team; we need to wait for a newer dCache to properly use these -r options. 23rd May 2013: uberftp -ls -r and -rm -r now work against dCache 2.2.11-1, but uberftp -chmod -r does not. BTW, users can also use srmls -recursion_depth and srmrmdir -recursive
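
Since the transfer protocols offer no server-side recursion, tools like uberftp -rm -r implement it client-side: list a directory, recurse into its children, then delete the entry itself. A sketch of that pattern against a fake in-memory backend (FakeStorage and its methods are illustrative stand-ins, not a real client API):

```python
class FakeStorage:
    """In-memory stand-in for an SRM/GridFTP client, for illustration only.
    Directories map to lists of child names, files map to None."""
    def __init__(self, tree):
        self.tree = tree
    def is_dir(self, path):
        return self.tree.get(path) is not None
    def listdir(self, path):
        return list(self.tree[path])
    def remove(self, path):   # works for files and (now-empty) directories
        del self.tree[path]

def recursive_rm(storage, path):
    """Depth-first delete: remove all children first, then the entry itself."""
    if storage.is_dir(path):
        for child in storage.listdir(path):
            recursive_rm(storage, f"{path}/{child}")
    storage.remove(path)

fs = FakeStorage({
    "/store/user/alice": ["ntuples", "log.txt"],
    "/store/user/alice/ntuples": ["a.root"],
    "/store/user/alice/ntuples/a.root": None,
    "/store/user/alice/log.txt": None,
})
recursive_rm(fs, "/store/user/alice")
print(fs.tree)   # → {}
```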
  • NX compatibility problem: The new NX Player v4, the only one installable on Mac OS X 10.8, badly transfers the string /usr/bin/konsole to the FreeNX 3.4 software that is installed on our UIs. The commercial version works, but its trial version allows max 2 users. Fabio: found a trick to request the konsole program by default, which circumvents the badly transferred strings issue.
    • Derek: I propose to collect the number of users using NX and suffering from the problem (most distributions still have the working 3.4 version clients; Mac users are affected). We may also decide to equip just a subset of the UIs with this.
  • Upgrade Plans and schedule for 2013
    • SE storage expansion ( new 2 boxes, 60 disks each ).
      • Fabio: we'll grow by 264 TB net, from the current 482 TB net to 746 TB net. (Urs: How much money is this? How much more money is required to add another box?)
    • SE dCache upgrade from 1.9.12 (End of Support Apr '13) to 2.2 (End of Support Apr '14). I.e. we need to find another slot for a short downtime. Fabio: min 1 day; if we use this downtime to cable the new disk expansions then min 2 days
    • Inform steering board on plans for NFS shome and swshare space infrastructure (keep on running old HW for 1-2 years)
    • Inform users about UI updates
      • Do users want RAID-0 or RAID-1 /scratch space? (Urs: I don't think [most] users care much about backup of scratch space. It is scratch space. The faster the better, i.e. Raid-0.)
  • UI scratch space Urs: We do need significantly larger scratch disks mounted locally.
    • Derek: I think there are three measures that we can take in parallel (buying disks for these old systems is not too easy and you overpay)
      1. as Fabio proposes, use RAID-0 everywhere and use disks from some old machines to arrive at ~3 * 146GB disks per UI
      2. less users per UI: introduce more UIs. Old service machines that are now freed up by migrating services to the VM infrastructure, can be used. Also, we could convert a few of the old worker nodes (the cluster is hardly ever full, so this would not have a large impact)
      3. Policy: Part of the scratch space is often taken by files of a more permanent nature. Question is whether the desire of users to use more space for long lifetime files cannot be satisfied differently (Urs: the "more permanent files" are often local mirrors of files on the SE. It is a factor 2-4 faster to run an ntuple analysis on a local file instead of the SE.)
      4. Fabio: by recycling old HW I made 3 brand-new UIs with 3 disks each; I can do the same for the other 6 UIs. In my opinion 9 UIs with a 261GB /scratch are enough; no need to buy HW or invest more effort in this topic. Say thanks to the PSI VMware infrastructure for this, almost all our T3 services are virtualized now! :-)
  • Urs: NFS mounted homes and gold. This is about the GNU gold linker bug that results in empty files being produced on our NFS homes. Daniel has analyzed the situation and even submitted a patch to the gold developers. But we do not know when it will be included.
    • Do we just live with this situation? (many people in our group find the situation quite painful!)
      • Derek: Currently, it would require local patching or providing workarounds for every release. When CMS goes to central SW distribution, this may even become harder. Upgrading the OS of the NFS server (Solaris) might also solve the situation, but we do not know. An OS upgrade is only possible with an Oracle support contract. We have been offered such a contract for a reasonable price. We'll inform you in the meeting.
  • Urs: home directory size
    • I'd like to understand the use-case for having such large homes (sorry if this is too 'technical' or too 'detailed', but I have questioned the size of the homes for a long time and it has been a limitation recently).
      • Derek: From the admin side, we can just say that such volumes seem to be desired by many users. We are able to sustain 150GB per active user and have enough storage to support this for the required number of users. So, this is not a problem. But this should be the limit for home space, looking at the technology that we must continue using this year.
      • Fabio: probably by simply dropping old users we'll get back several GB; let's do this exercise first. ETHZ requested to delete users on 4th Feb '13.

    -- DerekFeichtinger - 2013-01-08
