<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup,Main.EgiGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Swiss Grid Operations Meeting on 2013-05-02

   * *Date and time*: First Thursday of the month, at 14:00
   * *Place*: Vidyo (room: Swiss_Grid_Operations_Meeting, extension: 9227296)
   * *External link*: http://vidyoportal.cern.ch/flex.html?roomdirect.html&key=Nrq24qRR4V1u
   * *Phone gate*: From Switzerland: 0225330322 (portal) + 9227296 (extension) + # (pound sign)
   * *IRC chat*: irc:gridchat.cscs.ch:994#lcg (ask pw via email)

---++ Agenda

Status
   * CSCS (reports Pablo, Miguel & George):
      * Moved cream02 to new IBM hardware and upgraded it to SL 6.3 and CREAM-CE UMD-2.
         * Comment: the upgrade of cream01 is planned right before the maintenance (Monday 13 May).
      * Prepared the dCache 1.9.12 migration to 2.2.10 on preproduction.
         * Comment: the process seems straightforward, but we have to be careful since it also involves moving the head nodes to new hardware and SL 6.4.
      * 'Fixed' issues with WNs not being reinstalled and moved the installation procedure to SL 5.9.
         * Comment: the migration to SL 6 on the WNs is planned for July, one month after the ATLAS expected upgrade calendar.
      * Installed a new VM (atlas01) to replace atlasvobox and requested a certificate.
         * Question: When could this machine be ready? Gianfranco: hopefully 2 weeks from now.
      * Increased the specs of the future replacement for cmsvobox (cms01) in order to cope with the service's memory leak problem.
         * Question: When could cmsvobox be shut down?
      * Maintenance scheduled for May 15: https://wiki.chipp.ch/twiki/bin/view/LCGTier2/SiteMaintenance20130515
   * PSI (reports Fabio):
      * Once again, a reminder that [[http://listserv.fnal.gov/scripts/wa.exe?A2=ind1303&L=scientific-linux-users&T=0&X=5B929F66B2B2056912&Y=christopher.brown%40med.ge.com&P=21739][SL6 delivers ZFS]] now; I will certainly try it.
      * PSI is migrating its VMware cluster onto new HW/ESX, and several VMs ended up with a read-only filesystem or totally stuck. This was due to a mistake by the local admins, but I had also not increased the [[https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Online_Storage_Reconfiguration_Guide/task_controlling-scsi-command-timer-onlining-devices.html][SCSI timeout]]. So even if your VM is not VMware based, it is a good idea to run: =echo 180 > /sys/block/sda/device/timeout=
      * I'll attend the [[http://www.dcache.org/manuals/2013/workshop/][7th dCache WS]]
      * Implementing a "soft" dCache 2.2 quota system based on =GIDs=; we are:
         * Using [[http://ldapwiki.willeke.com/wiki/PosixGroup][LDAP PosixGroups]] (defined in =/etc/openldap/schema/nis.schema=) to partition our 100 CMS users into ~10 new primary groups (a user belongs to just 1 group).
         * Because of this partitioning, a user connected to a UI where =/pnfs= is mounted will see dirs like:
<pre>
ls -l /pnfs/psi.ch/cms/trivcat/store/user |
drwxrwxr-x  3 cmsuser  cms        512 Jun 15  2012 acaudron
drwxr-xr-x  2 alschmid uniz-bphys 512 Feb 21 11:04 alschmid
drwxr-xr-x  2 amarini  ethz-ewk   512 Jan 24 15:53 amarini
drwxr-xr-x 18 andis    ethz-bphys 512 Jan  5  2010 andis
drwxr-xr-x 36 arizzi   ethz-bphys 512 Aug  3  2011 arizzi
</pre>
         * By setting the file/dir permissions to allow *just his/her group* to write, users can protect their group files.
         * With these new primary groups it is easy to do =/pnfs= accounting by =GID=:
<pre>
chimera=> select igid,sum(isize) as sum from t_inodes group by igid order by sum desc ;
 igid |       sum
------+-----------------
  500 | 212853755148360   # 500 is the CMS group that stores all the PhEDEx files.
  533 | 130072146368598
  534 |  25902761600489
  536 |  23833376390438
  532 |  18310416555193
  531 |  16152944599625
  530 |  10590297019580
  538 |   1140547783970
  537 |    316829040978
  535 |     36287017607
  800 |        42217296
  550 |           43560
</pre>
         * We defined a formula to compute =quota ( GID )=, and a Nagios check will compare the real =/pnfs= group usage against =quota ( GID )=.
         * On the dCache side, we dynamically create =/etc/grid-security/storage-authzdb= according to these new GIDs whenever a user leaves/joins the cluster.
   * UNIBE (reports Gianfranco):
      * Progress in commissioning of the Phoenix PhaseC hardware: 75% of the WNs installed, SLC6.3, ROCKS 6.1, 15 thumpers for Lustre 2.4 OSS, 1 MDS for Lustre 2.4.
      * Installation is generally fiddly (thumpers need 2 or 3 attempts, with exactly the same procedure, before the installation goes through), but it eventually works.
      * A number of WNs report: No disk is available for installation - Your BIOS is broken. Investigation is ongoing; the BIOS version seems to be the same on all nodes.
      * The installed ARC and SGE versions (GE-2011.11p1-1.x86_64 and nordugrid-arc-2.0.1-1.el6.x86_64) do not play well together. =/usr/share/arc/SGEmod.pm= fills the infoprovider with the info from the batch service. The script supports versions 5 and 6 of GE, but the qstat header turns out to be totally different in version 2011.11p1. We will open a feature request on the ND bugzilla, but a solution for immediate production will be to hack the script.
<br /><br />Nordugrid Bugzilla feature request: <a target="_blank" href="http://bugzilla.nordugrid.org/cgi-bin/bugzilla/show_bug.cgi?id=3171">http://bugzilla.nordugrid.org/cgi-bin/bugzilla/show_bug.cgi?id=3171</a><br />Also GGUS ticket (this is NOT replicated to the central GGUS, which is wrong): https://xgus.ggus.eu/ngi_ch/index.php?mode=ticket_info&ticket_id=232
   * UNIGE (reports Szymon):
      * Xxx
   * UZH (reports Sergio):
      * Xxx
   * SWITCH (reports Alessandro):
      * End of EMI: impact on the EGI releases; bug fixes/updates/support are under discussion (MeDIA consortium).
      * The SGAS server at SWITCH is to be shut down or migrated to a SWING partner by the end of 2013. ARC in the current EMI3 release cannot publish to an APEL server; this will come with the next ARC release. ATLAS could decide to use the NorduGrid repository though...
      * NGI_CH ARGUS server? Any thoughts?
      * The ARC gridftp test in Nagios was deprecated by EGI, but not by WLCG; now fixed.
      * Next week we will upgrade the Nagios production instance to update 20 (the test instance was updated 2 weeks ago). Updates are installed regularly because updating from X to X+2 is not supported; some updates are also critical because of new tests/functionalities/security.

Other topics
   * The next CSCS Phoenix maintenances are scheduled for May 15 (dCache update) and then July 3 (WNs move to SL6).
   * Topic2

Next meeting date: Proposed date is Thursday, June 6, 2013

---++ Attendants
   * CSCS: Pablo, Miguel, George
   * CMS: Daniel, Derek, Fabio
   * ATLAS:
   * LHCb: Roland
   * EGI:

---++ Action items
   * Item1
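
Note on the PSI report: the Nagios check of real =/pnfs= group usage vs =quota ( GID )= could look roughly like the sketch below. This is only an illustration under stated assumptions, not PSI's actual check: the quota formula is not given in these minutes, so the quota values, the 90% warning threshold, and the function names =parse_usage= and =check_quotas= are all hypothetical. Only the =igid | sum= row layout and the usage numbers come from the chimera query output in the PSI report, and the status codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
# Sketch of a Nagios-style check: per-GID /pnfs usage vs. a soft quota.
# Quotas and the warning threshold are hypothetical placeholders; the
# formula used to compute quota(GID) is not stated in the minutes.

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin status codes

# A few rows in the layout printed by the chimera query in the PSI report.
SAMPLE_OUTPUT = """
 igid |       sum
------+-----------------
  533 | 130072146368598
  534 |  25902761600489
  535 |     36287017607
"""

# Hypothetical quotas in bytes, for illustration only.
EXAMPLE_QUOTAS = {533: 120 * 10**12, 534: 27 * 10**12, 535: 10**12}


def parse_usage(psql_output):
    """Parse 'igid | sum' rows from psql text output into {gid: bytes}."""
    usage = {}
    for line in psql_output.splitlines():
        line = line.split("#", 1)[0]  # drop trailing annotations
        parts = [p.strip() for p in line.split("|")]
        # Header and separator lines fail this test and are skipped.
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            usage[int(parts[0])] = int(parts[1])
    return usage


def check_quotas(usage, quotas, warn_fraction=0.9):
    """Compare usage with quotas per GID; return (status, messages)."""
    status = OK
    messages = []
    for gid, quota in sorted(quotas.items()):
        used = usage.get(gid, 0)
        if used > quota:
            status = max(status, CRITICAL)
            messages.append(f"GID {gid} over quota: {used}/{quota} bytes")
        elif used > warn_fraction * quota:
            status = max(status, WARNING)
            messages.append(f"GID {gid} near quota: {used}/{quota} bytes")
    return status, messages


status, messages = check_quotas(parse_usage(SAMPLE_OUTPUT), EXAMPLE_QUOTAS)
print(status, "; ".join(messages))
```

Since the quota is "soft", the sketch only reports; nothing in it blocks writes. A real plugin would end with =sys.exit(status)= so that Nagios picks up the state.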
Topic revision: r11 - 2013-05-02 - GianfrancoSciacca
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.