Swiss Grid Operations Meeting on 2015-10-15

Site status

CSCS

  • dCache
    • Security upgrade to version 2.6.52; an upgrade to the latest version is also planned for next month, Dario will perform it.
    • Dario distributed the remaining storage according to our last f2f meeting.
    • We set up a downtime for tomorrow, October 16, to attempt to resolve the dCache issues experienced this week.
  • Network
    • InfiniBand switches and bridges upgraded to the latest version.

PSI

  • dCache
    • upgraded from 2.10 to the latest 2.13.9; this was an easy upgrade, so CSCS might try the 2.6 -> 2.10 -> 2.13 path in order to avoid a downtime
    • accordingly upgraded the xrootd monitoring plugin RPM and its configuration
    • upgraded to the latest Xrootd RPMs 4.2.3*:
      cms-xrootd-dcache-1.2-7.osg.el6.noarch
      gfal2-plugin-xrootd-0.3.4-1.el6.x86_64
      xrootd-4.2.3-1.el6.x86_64
      xrootd-client-4.2.3-1.el6.x86_64
      xrootd-client-libs-4.2.3-1.el6.x86_64
      xrootd-cmstfc-1.5.1-10.osg32.el6.x86_64
      xrootd-fuse-4.2.3-1.el6.x86_64
      xrootd-libs-4.2.3-1.el6.x86_64
      xrootd-selinux-4.2.3-1.el6.noarch
      xrootd-server-4.2.3-1.el6.x86_64
      xrootd-server-libs-4.2.3-1.el6.x86_64
    • my materialized views still work 'as they are' with 2.13.9
    • Derek's tools, on the other hand, need to be updated because of the new 2.13 admin door commands (e.g. \c Cell instead of cd Cell); I've partially updated them with a bit of sed, but it's still a work in progress (a sketch of the substitution follows this section).
  • Latest Python packages on SL6
    • This is mainly addressed to the T3s because they face the end-user issues; at PSI, scientists use Anaconda on SL6 in order to easily get an updated and extended Python distribution. I run an Anaconda installation as well; it's very easy both to use and to update.
  • NFSv3
  • Nagios4
  • Compact UIs/WNs featuring many disks
  • CMS PhEDEx
  • Son of Grid Engine and cgroups
    • No progress, too busy.
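
  A minimal sketch of the admin-door syntax update mentioned under dCache above, written in Python rather than sed; the file paths and the assumption that the tools embed plain "cd Cell" strings are hypothetical, not Derek's actual scripts:

      #!/usr/bin/env python
      # Sketch: rewrite old-style dCache admin door commands 'cd <Cell>' to the
      # 2.13 syntax '\c <Cell>' inside tool scripts (paths are hypothetical).
      import re
      import sys

      # 'cd SomeCell' at the start of a line becomes '\c SomeCell'
      CD_PATTERN = re.compile(r'^(\s*)cd\s+(\S+)', re.MULTILINE)

      def rewrite(text):
          """Return text with 'cd Cell' calls rewritten to '\\c Cell'."""
          return CD_PATTERN.sub(r'\1\\c \2', text)

      if __name__ == '__main__':
          # e.g. ./update_tools.py dcache-tools/*   (hypothetical invocation)
          for path in sys.argv[1:]:
              with open(path) as f:
                  original = f.read()
              updated = rewrite(original)
              if updated != original:
                  with open(path, 'w') as f:
                      f.write(updated)
                  print('updated %s' % path)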

UNIBE-LHEP

  • Operations
    • Relatively smooth running period (partly unattended)
    • I/O errors on the Lustre MDS (ce02) due to a degraded RAID10. Power-cycled; Lustre self-recovered relatively quickly
  • ATLAS specific operations
    • Nothing specific to report
    • ATLAS has made huge progress on task definitions and on making the workflows considerably more failsafe, which makes sites' life much simpler
    • Also many ARC bugfixes have contributed
  • Ongoing work
    • Cluster re-installation workflow development finalised:
      • Rocks 6.2
      • SLC 6.7
      • ARC 5.0.3
      • SLURM 15.08.0-1
      • Lustre 2.5.3
    • Plan to re-build ce01 starting next Tuesday
    • Temperature monitoring in progress (room/racks/servers); very useful input from PSI (a minimal polling sketch follows this list)
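
  As a starting point for the server part of that monitoring, a minimal sketch that polls local IPMI sensors via ipmitool and flags high readings; the warning threshold and the parsing of "ipmitool sensor" output are assumptions about a generic setup, not the actual UNIBE-LHEP implementation:

      #!/usr/bin/env python
      # Sketch: read local IPMI temperature sensors and flag values above a
      # (hypothetical) warning threshold. Requires ipmitool on the host.
      import subprocess

      WARN_CELSIUS = 30.0  # hypothetical room/inlet warning threshold

      def read_temperatures():
          """Return {sensor_name: celsius} parsed from 'ipmitool sensor' output."""
          out = subprocess.check_output(['ipmitool', 'sensor'])
          temps = {}
          for line in out.decode('utf-8', 'replace').splitlines():
              fields = [f.strip() for f in line.split('|')]
              # ipmitool sensor columns: name | value | units | status | thresholds...
              if len(fields) >= 3 and fields[2] == 'degrees C':
                  try:
                      temps[fields[0]] = float(fields[1])
                  except ValueError:
                      pass  # unreadable sensor ('na')
          return temps

      if __name__ == '__main__':
          for name, value in sorted(read_temperatures().items()):
              flag = '  <-- above %.0f C' % WARN_CELSIUS if value > WARN_CELSIUS else ''
              print('%-24s %6.1f C%s' % (name, value, flag))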

UNIBE-ID

  • Operations
    • Smooth operations, only a few minor issues
  • Storage Migration
    • Migration plan developed to move from generic GPFS cluster to new IBM ESS
    • Online data migration has been running for three weeks (though slow due to the compute workload on the Ethernet network)
    • Planned downtime to move all nodes to the new GPFS cluster on 2015-10-08 (yes, today, therefore no attendee from UNIBE-ID)

UNIGE

  • Xxx

NGI_CH

  • Xxx

Other topics

  • Topic1
  • Topic2
Next meeting date:

A.O.B.

Attendants

  • CSCS: Pablo Fernandez, Dino Conciatore
  • CMS: Fabio Martinelli, Daniel Meister (by phone)
  • ATLAS: Gianfranco Sciacca
  • LHCb:
  • EGI: Gianfranco Sciacca

Action items

  • Item1