Fourth PSI Tier-3 Steering board meeting

Guide for commenters

  1. Please pick a color for your comments, following my examples in the source text (Derek: comment). This makes it easier to follow the discussion of individual items.
  2. Please remember that the steering board is a political body, not a technical one. To keep the meeting efficient, issues should be prepared in a way that lets the members decide on them. So, consequences of proposed solutions should be explained in terms that users can understand, costs need to be mentioned, etc. Steering board members are users, and issues should be explained in a way that they can understand and relate to.
  3. There should be no long list of trivia. Instead, focus on the most important decisions that we need the steering board to make.

Proposed Agenda

  • Storage Element Quotas
    • Based on our current policy statements, all resources are to be shared equally among users. The SE is quite full on a daily basis, and people are consuming space very unequally.
    • We need to define policies for user SE quotas
      • Since quotas cannot technically be enforced by the dCache storage system, they will instead be implemented as Nagios monitoring combined with asking the user to clean up (e.g. by email)
      • how can we enforce the cleanup for unresponsive users? (Derek: I propose that such cases are escalated to the steering board before we enforce a deletion)
      • what about users who store a large number of samples in their folder that are then actually used by a lot of people; will they constantly be asked to clean up? (Derek: yes. There is no other sensible way, except for putting the files into a group space. Daniel: We could set higher per-user quotas in the alert scripts upon request?)
    • what about users who leave; how long can their data stay on the SE (same question for /shome)? (Derek: I propose that we give about 6 weeks, after which the data must be adopted by other users and counted against their shares. Daniel: Ok, agreed. The exact time window could be discussed, but we need some rules.)
    • since the last upgrade, files are actually stored with the UID of the user, so the creator of a file can be identified and users can't change each other's files and directories; do we want to replicate this setup at CSCS? Could we live with the worst-case scenario of everything being deleted at CSCS? (Derek: This bullet point contains information that, without better explanation, is completely misleading and will cause panic. The danger of losing all files through user error is mainly connected to the mounting of the NFS file space. Replicating this setup at CSCS requires the definition and maintenance of a list of allowed Tier-2 users. How to define and maintain it is a subject for discussion in another political body, even though most of the steering board members represent the same user groups. Also, I feel that we should first reach an understanding with Pablo from CSCS before we push this change on them. Daniel: Agreed, let's cut that point from the T3 steering board meeting discussion. (Personally, I also see some danger without the NFS mount, as I know of many people who use their self-made scripts to delete recursively.))
  • Group directories on the SE
    • Even though this is not foreseen in the CMS model, people have started to create group folders under the store/user/ directory. This leads to a number of problems in regard to finding who is responsible for these files, who cleans them, etc.
      • (Derek: I propose that all group folders must go to /store/group. Daniel: Fine with me; then we should also set quotas for those.)
      • Whereas with the new file ownerships a user can protect his files from erroneous deletion by others, the group files will probably remain world-writable (unless we introduce Unix groups representing these groups). We need to discuss what users think about this risk, especially in the context of mounting the filespace through NFSv4.
  • Tools for organizing files on the SE
    • a read-only mount would be very useful on its own; only the cleanup of files would remain a bit cumbersome (Derek: This needs an explanation. Daniel: This statement comes mostly from my personal working experience; most often I need to read and/or list files from the SE, and from time to time I need to delete files on the SE; I almost never need to copy directly to the SE (from the UIs). So when excluding deletions, about 99% of the use cases would be covered by a read-only mount, which would be much safer.)
    • do we want to mount the filespace writable on the UIs? This has become an option now with the new Chimera-enabled dCache version.
  • Upgrade Plans and schedule for 2013
    • SE storage expansion (2 new boxes, 60 disks each). How much more space? (Derek: What is meant by this question? Is it a question to the steering board as to how much they want for the future, or is it about us informing them how much space the new expansion will provide?)
    • SE dCache upgrade from 1.9.12 (End of Support Apr '13) to 2.2 (End of Support Apr '14), i.e. we need to find another slot for a short downtime.
    • We need to plan for a new filesystem + new HW for /shome; so far RHEL7 + BTRFS looks like the most natural replacement; BTRFS doesn't do RAID6, but we can buy a RAID6 HW controller. Instead of BTRFS we can use GPFS, but that will be more expensive because of the licenses. Are we sure BTRFS is stable enough? (This discussion is too technical for the steering board. If we want to discuss this, it should rather be done by informing about the need, the different options we see, and estimates of the cost.)
    • UI updates? (: What exactly should the steering board decide about UI updates? Daniel: Sorry, I assumed that this section was about informing the steering board of the upcoming updates (see also points above), so I wanted to mention that we should also include the UI software upgrades in this list. What we could discuss concerning the UIs is whether we want RAID-0 or RAID-1 /scratch space.)
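
For the quota item above (monitoring through Nagios with higher per-user limits on request), the alert logic could look roughly like the following sketch. Everything in it is an assumption for illustration only: the function name check_quotas, the default quota of 10 TB, the 90% warning threshold and the user names are hypothetical, and how the per-user usage numbers are obtained from the SE is left open. Only the exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) follow the standard Nagios plugin convention.

```python
from typing import Dict, Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical values; the real numbers are a policy decision.
DEFAULT_QUOTA_TB = 10.0
WARN_FRACTION = 0.9


def check_quotas(
    usage_tb: Dict[str, float],
    overrides: Optional[Dict[str, float]] = None,
    default_quota_tb: float = DEFAULT_QUOTA_TB,
    warn_fraction: float = WARN_FRACTION,
) -> Tuple[int, str]:
    """Compare per-user SE usage against quotas and build a Nagios result.

    usage_tb  maps user name -> used space in TB (how it is measured is
              outside this sketch).
    overrides maps user name -> individually granted quota in TB.
    """
    overrides = overrides or {}
    status = OK
    findings = []
    for user in sorted(usage_tb):
        used = usage_tb[user]
        quota = overrides.get(user, default_quota_tb)
        if used > quota:
            status = CRITICAL
            findings.append(f"{user} {used:.1f}/{quota:.1f} TB (over quota)")
        elif used > warn_fraction * quota:
            status = max(status, WARNING)
            findings.append(f"{user} {used:.1f}/{quota:.1f} TB (near quota)")
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    message = f"SE QUOTA {label}"
    if findings:
        message += ": " + "; ".join(findings)
    return status, message


if __name__ == "__main__":
    # Made-up example numbers, not real usage data.
    usage = {"alice": 12.0, "bob": 9.5, "carol": 3.0}
    code, text = check_quotas(usage, overrides={"alice": 20.0})
    print(text)
    raise SystemExit(code)
```

The list of findings in the message is what a notification email to the affected users could be built from; a user holding widely used samples would simply get an entry in the overrides table.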

-- DerekFeichtinger - 2013-01-08

Topic revision: r8 - 2013-01-22 - DanielMeister
 