
Swiss Grid Operations Meeting on 2014-12-04

Site status

CSCS

  • The maintenance on December 3 went smoothly: CSCS is now connected to SWITCH via a 100G link (Phoenix is still at 20G, though)
  • ARC is monitored on NGI Nagios: WebServices configuration issues (for now enabled only on arc01.lcg.cscs.ch)
  • perfSONAR: a couple of old WNs were chosen as hardware replacements for the old instances
  • Reminder: next F2F meeting on January 29, 2015 at CSCS

PSI

  • Using the Puppet 3 source_permissions feature to copy files and directories without explicitly specifying owner, group, and mode; it behaves like rsync. I wasn't aware of it.
  • Using the SaltStack batch mode feature to run a command on groups of filtered servers:
    • To appreciate this, I assume you're used to older tools like cexec or pdsh
    • Those tools require you to write a static configuration file defining your cluster(s); those definitions can only use hostnames.
    • In SaltStack each client (minion) constantly publishes its live information (grains); the core grains are SSDs, biosreleasedate, biosversion, cpu_flags, cpu_model, cpuarch, domain, fqdn, fqdn_ip4, fqdn_ip6, gpus, host, hwaddr_interfaces, id, ip4_interfaces, ip6_interfaces, ip_interfaces, ipv4, ipv6, kernel, kernelrelease, locale_info, localhost, machine_id, manufacturer, master, mem_total, nodename, num_cpus, num_gpus, os, os_family, osarch, oscodename, osfinger, osfullname, osmajorrelease, osrelease, osrelease_info, path, productname, ps, pythonexecutable, pythonpath, pythonversion, saltpath, saltversion, saltversioninfo, selinux, serialnumber, server_id, shell, virtual, zmqversion, but you can also define your own grains (prod, dev, webserver, db, rackposition, etc.)
    • By leveraging the grain values you can dynamically filter the minions, split them into groups (a fixed size or a percentage), and run a command on these groups in sequence.
    • Running in small groups is useful when a third-party service is involved (ftp, http, puppet, rsync, NFS, ...) and you don't want to open tens of connections against it.
    • My most recurring case is Puppet: saltmaster# salt -b 3 -C 't3wn* and G@osmajorrelease:6' cmd.run 'puppet agent -t'
    • All the commands you run are recorded by SaltStack, which acts as a kind of job system
    • Another example (no groups this time): salt -C 't3ui* and not G@kernelrelease:2.6.32-358.2.1.el6.x86_64' cmd.run 'uname -a'
  • Tried http://xrootd.org v4; I have the impression that it requires IPv6, since I couldn't start it without an IPv6 address. Need to double-check this.
  • Working together with my boss Derek to prepare the 5th PSI T3 Steering Board Meeting (UniZ/ETHZ/PSI): a lot of time spent here.
  • Reading the dCache 2.6 to 2.10 upgrade guide
  • Is somebody going to attend the Condor Workshop at CERN next week? I'll probably attend it remotely.
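The SaltStack batch-mode workflow described above can be collected into a short shell sketch. The host patterns (t3wn*, t3ui*) and grain names are the ones from the bullets above; the percentage batch size, the t3wn01* minion name, and the job id are illustrative, not real values:

```shell
# Run 'puppet agent -t' on at most 3 minions at a time, restricted by a
# compound matcher (-C): shell-glob on the minion id plus a grain match
# (the G@ prefix selects on grain values).
salt -b 3 -C 't3wn* and G@osmajorrelease:6' cmd.run 'puppet agent -t'

# The batch size (-b) can also be a percentage of the matched minions:
salt -b 25% 't3wn*' cmd.run 'puppet agent -t'

# List the grains of one minion to see what you can filter on
# (t3wn01* is a hypothetical minion id):
salt 't3wn01*' grains.items

# Every command is recorded in the job system; inspect past runs:
salt-run jobs.list_jobs
salt-run jobs.lookup_jid 20141204120000000000   # example job id
```

These commands require a live salt-master with connected minions, so they are meant as a reference transcript rather than a standalone script.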

UNIBE-LHEP

UNIBE-ID

  • Security incident at site CAMK [EGI-20140130]
    • Some attack attempts from the IPs listed in the EGI security report; no successful login found.
  • Operations
    • smooth and reliable; no issues
    • the 16 new DALCO compute nodes are operational => the old Sun Bladecenter will be decommissioned on 2014-12-11

UNIGE

  • New disk space added for the AMS experiment
    • +84 TB of NFS space
    • disk now: 709 TB (474 TB in the DPM SE, 235 TB on NFS)
  • One incident with a full NFS file system
    • a Solaris 9 disk server (Sun X4540) blocked a few times
    • it was impossible to unmount the file system or to shut the server down properly
    • we had to reboot all clients and reset many of them
    • this does not happen often...
  • ARC front end filling up /var
    • due to missing log rotation for /var/log/arc/bdii/bdii-update.log
  • Our /cvmfs over NFS is getting slow again (overloaded)
    • no problem visible to the users, but we need to watch this issue
    • we may need more machines for /cvmfs; we mount many repositories:
      ls /cvmfs
      ams.cern.ch atlas.cern.ch atlas-condb.cern.ch atlas-nightlies.cern.ch geant4.cern.ch icecube.wisc.edu na61.cern.ch sft.cern.ch
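The missing log rotation noted under the ARC front end item could be addressed with a logrotate drop-in along these lines. This is a sketch: the file name, rotation frequency, and retention count are assumptions to adapt; only the log path comes from the item above.

```
# /etc/logrotate.d/arc-bdii   (hypothetical file name)
/var/log/arc/bdii/bdii-update.log {
    weekly            # assumed frequency
    rotate 4          # keep four rotated copies (assumption)
    compress
    missingok
    notifempty
    copytruncate      # truncate in place in case the updater keeps the file open
}
```

copytruncate avoids breaking a writer that holds the file open, at the cost of possibly losing a few lines during truncation; if the BDII updater reopens its log on each run, a plain move-based rotation would also work.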

NGI_CH

  • perfSONAR 3.4 upgrade (re-instantiation) in response to ShellShock; the new instructions include new mesh configurations
  • SAM Update-23: release early this week (?) - the old OPS VOMS was decommissioned on November 26th
  • GOCDB: "Prod=Y and Mon=N" changed to "Prod=Y and Mon=Y" for all services except emi.ARGUS and VOMS
  • NGI_CH ARGUS deployment completed: https://ggus.eu/index.php?mode=ticket_info&ticket_id=99533

Other topics

  • Possibility of local accounts for a limited number of power users (direct batch submission) at the T2? (request from ETH CMS group)
  • Topic2
Next meeting date:

A.O.B.

Attendants

  • CSCS: Gianni Ricciardi
  • CMS: Fabio Martinelli, Daniel Meister
  • UNIBE-ID: Nico Färber
  • ATLAS: Gianfranco Sciacca, Szymon Gadomski
  • LHCb: Roland Bernet
  • EGI: Gianfranco Sciacca

Action items

  • Item1
Topic revision: r17 - 2015-03-03 - DanielMeister