
Swiss Grid Operations Meeting on 2013-05-02

Agenda

Status

  • CSCS (reports Pablo, Miguel & George):
    • Moved cream02 to new IBM hardware and upgraded it to SL 6.3 and CREAM-CE UMD-2.
      • Comment: upgrade of cream01 is planned right before the maintenance (Monday 13 May).
    • Prepared dCache 1.9.12 migration to 2.2.10 on preproduction.
      • Comment: the process seems straightforward, but we have to be careful since it also involves moving the head nodes to new hardware and SL 6.4.
    • 'Fixed' issues with WNs not being reinstalled and moved installation procedure to SL 5.9.
      • Comment: migration to SL 6 on WNs is planned for July, one month after ATLAS expected upgrade calendar.
    • Installed new VM (atlas01) to replace atlasvobox and requested certificate.
      • Question: When could this machine be ready? Gianfranco: hopefully within 2 weeks from now.
    • Increased specs of the future replacement for cmsvobox (cms01) in order to cope with the service's memory leak problem.
      • Question: When could cmsvobox be shut down?
    • Maintenance scheduled for May 15: https://wiki.chipp.ch/twiki/bin/view/LCGTier2/SiteMaintenance20130515
  • PSI (reports Fabio):
    • Once again, a reminder that ZFS is now available for SL6; I will definitely try it.
    • PSI is migrating its VMware cluster onto new HW/ESX, and several of our VMs ended up with read-only filesystems or completely stuck. This was caused by a mistake of the local admins, but I had also not increased the SCSI timeout. Even if your VM is not VMware-based, it is a good idea to run: echo 180 > /sys/block/sda/device/timeout
    • I'll attend the 7th dCache WS
    • Implementing a "soft" quota system for dCache 2.2 based on GIDs; we are:
      • Using LDAP PosixGroups (defined in /etc/openldap/schema/nis.schema) to partition our 100 CMS users into ~10 new primary groups (each user belongs to exactly one group).
      • Because of this partitioning, a user logged in to a UI where /pnfs is mounted will see directories like:
        ls -l /pnfs/psi.ch/cms/trivcat/store/user
        drwxrwxr-x  3 cmsuser  cms        512 Jun 15  2012 acaudron
        drwxr-xr-x  2 alschmid uniz-bphys 512 Feb 21 11:04 alschmid
        drwxr-xr-x  2 amarini  ethz-ewk   512 Jan 24 15:53 amarini
        drwxr-xr-x 18 andis    ethz-bphys 512 Jan  5  2010 andis
        drwxr-xr-x 36 arizzi   ethz-bphys 512 Aug  3  2011 arizzi
      • By setting file/directory permissions so that only their own group can write, users can protect their group's files.
      • Having these new primary groups makes /pnfs accounting by GID easy:
        chimera=> select igid,sum(isize) as sum from t_inodes group by igid order by sum desc ;
         igid |       sum
        ------+-----------------
          500 | 212853755148360   # 500 is the CMS group that stores all the Phedex files.
          533 | 130072146368598
          534 |  25902761600489
          536 |  23833376390438
          532 |  18310416555193
          531 |  16152944599625
          530 |  10590297019580
          538 |   1140547783970
          537 |    316829040978
          535 |     36287017607
          800 |        42217296
          550 |           43560
      • We defined a formula to compute the quota per GID, and Nagios will check the actual /pnfs group usage against that quota.
      • On the dCache side, we dynamically create /etc/grid-security/storage-authzdb according to these new GIDs whenever a user leaves or joins the cluster.
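
The usage-versus-quota check described in the PSI report could be sketched roughly as follows. This is only an illustration, not PSI's actual Nagios plugin: the usage figures are copied from the accounting query above, while the quota values, the function name check_group_quotas and the 90% warning threshold are assumptions made for the example. The status codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Hypothetical soft-quota check: compare per-GID /pnfs usage with a quota.
# Usage numbers come from the chimera accounting query in the minutes;
# the quota values and the 90% warning threshold are invented for illustration.

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin status codes


def check_group_quotas(usage_bytes, quota_bytes, warn_fraction=0.9):
    """Return (status, messages) for every group that has a quota defined."""
    status = OK
    messages = []
    for gid, quota in sorted(quota_bytes.items()):
        used = usage_bytes.get(gid, 0)
        if used > quota:
            status = max(status, CRITICAL)
            messages.append(f"GID {gid} over quota: {used} > {quota} bytes")
        elif used > warn_fraction * quota:
            status = max(status, WARNING)
            messages.append(f"GID {gid} above {warn_fraction:.0%} of quota")
    return status, messages


# Example run with assumed quotas for three of the groups listed above.
usage = {533: 130072146368598, 534: 25902761600489, 535: 36287017607}
quota = {533: 120 * 10**12, 534: 27 * 10**12, 535: 10**12}  # assumed values
status, messages = check_group_quotas(usage, quota)
print(status, "; ".join(messages) or "all groups within quota")
```

In a real check the usage dictionary would be filled from the per-GID sums returned by the chimera query, and the returned status would be used as the plugin's exit code.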

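The SCSI timeout command quoted in the PSI report only touches sda. A minimal sketch for applying the same value to every sd* disk is given below; the function name set_scsi_timeouts and its parameters are illustrative, and only the sysfs path pattern and the value 180 come from the minutes. It must run as root, and values written to sysfs do not survive a reboot.

```python
# Sketch: apply one SCSI command timeout to every sd* disk, not only sda.
# The sysfs layout (/sys/block/<disk>/device/timeout) is the one used by the
# command in the minutes; the function name and defaults are illustrative.
import glob
import os


def set_scsi_timeouts(seconds=180, sys_block="/sys/block"):
    """Write `seconds` to each sd* disk's timeout file; return the paths changed."""
    changed = []
    pattern = os.path.join(sys_block, "sd*", "device", "timeout")
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as handle:
            handle.write(f"{seconds}\n")
        changed.append(path)
    return changed
```

Calling set_scsi_timeouts() with the defaults writes 180 to each matching disk; the sys_block argument exists mainly so the function can be tried against a scratch directory first.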
  • UNIBE (reports Gianfranco):
    • Progress in commissioning of Phoenix PhaseC hardware: 75% of the WNs installed, SLC6.3, ROCKS 6.1, 15 thumpers for Lustre 2.4 OSS, 1 MDS for Lustre 2.4.
    • Installation is generally fiddly (the thumpers need 2 or 3 attempts, with exactly the same procedure, before it goes through), but it eventually works.
    • A number of WNs report: "No disk is available for installation - Your BIOS is broken". Investigation is ongoing; the BIOS version seems to be the same on all nodes.
    • The installed SGE and ARC versions (GE-2011.11p1-1.x86_64 and nordugrid-arc-2.0.1-1.el6.x86_64) do not work together. /usr/share/arc/SGEmod.pm fills the infoprovider with the info from the batch service. The script supports versions 5 and 6 of GE, but the qstat header turns out to be totally different in version 2011.11p1. Will open a feature request on the NorduGrid Bugzilla, but the solution for immediate production will be to hack the script.

      Nordugrid Bugzilla feature request: http://bugzilla.nordugrid.org/cgi-bin/bugzilla/show_bug.cgi?id=3171
      Also GGUS ticket (this is NOT replicated to the central GGUS, which is wrong): https://xgus.ggus.eu/ngi_ch/index.php?mode=ticket_info&ticket_id=232
  • UNIGE (reports Szymon):
    • Xxx
  • UZH (reports Sergio):
    • Xxx
  • Switch (reports Alessandro):
    • End of EMI: impact on the EGI releases, bug fixes/update/support under discussion (MeDIA consortium)
    • SGAS server at SWITCH to be shut down/migrated to a SWING partner by the end of 2013. ARC in the current EMI3 release cannot publish to an APEL server; this will come with the next ARC release. ATLAS could decide to use the NorduGrid repository though...
    • NGI_CH ARGUS server? Any thoughts?
    • The ARC gridftp test in Nagios was deprecated by EGI but not by WLCG; this is now fixed.
    • Next week we will upgrade the Nagios production instance to update 20 (the test instance was updated 2 weeks ago). Updates are installed regularly because updating from X to X+2 is not supported; some updates are also critical because of new tests, functionalities or security.
Other topics
  • Next CSCS Phoenix maintenances are scheduled for May 15 (dCache update) and then July 3 (WNs move to SL6).
  • Topic2
Next meeting date:

Proposed date is Thursday, June 6, 2013

Attendants

  • CSCS: Pablo, Miguel, George
  • CMS: Daniel, Derek, Fabio
  • ATLAS:
  • LHCb: Roland
  • EGI:

Action items

  • Item1
Topic revision: r11 - 2013-05-02 - GianfrancoSciacca
 