Scheduled Maintenance on 2012-06-04

Next Monday we will enter a scheduled downtime lasting at most two days. It will start on Monday, 4 June, at 08:30 and finish the following day, Tuesday 5 June, before 18:00.

The purpose of this maintenance is to resolve the instabilities we have experienced with GPFS. This requires several tests, hence the long duration. In the meantime, CSCS will also be undergoing network maintenance that affects the entire site.

As usual, the CMS and ATLAS queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before it.
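As a quick check of the schedule arithmetic, the following Python sketch derives the maximum length of the downtime window and the queue closing times from the dates above. It assumes all times are site local time (the announcement gives no timezone) and treats 18:00 on Tuesday as the latest possible end; `queue_close_times` is a hypothetical helper written for illustration, not part of any site tooling.

```python
from datetime import datetime, timedelta

# Announced window (assumed to be site local time): start Monday 2012-06-04 08:30.
# The end is "before 18:00" on Tuesday, so 18:00 is treated as the latest end.
start = datetime(2012, 6, 4, 8, 30)
latest_end = datetime(2012, 6, 5, 18, 0)

# How long before the start each experiment's queues are closed.
lead_times = {
    "CMS": timedelta(hours=24),
    "ATLAS": timedelta(hours=24),
    "LHCb": timedelta(hours=48),
}


def queue_close_times(start, lead_times):
    """Return the closing time of each queue, given its lead time before start."""
    return {vo: start - lead for vo, lead in lead_times.items()}


if __name__ == "__main__":
    print("Maximum downtime:", latest_end - start)
    for vo, closes in queue_close_times(start, lead_times).items():
        print(f"{vo} queue closes at {closes:%Y-%m-%d %H:%M}")
```

Run as a script, it reports a maximum window of 33.5 hours, with the CMS and ATLAS queues closing on Sunday 3 June at 08:30 and the LHCb queue on Saturday 2 June at 08:30.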

Summary of interventions

We will perform the following operations on the cluster:

Firmware upgrade on all Sun IB cards

  • Description: Upgrade the firmware on the old Sun/Oracle InfiniBand cards to a more recent version.
  • Affected nodes: oss[11-42], mds[1-2], storage[01-02], cream[01-02], se[30-39], xen[11,13,15], cvmfs
  • Notes:
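The affected-node lists use a compact bracket notation, similar to the host-range syntax accepted by tools such as pdsh, where `oss[11-42]` stands for oss11 through oss42 and `xen[11,13,15]` for three individual hosts. The Python sketch below expands such a list into individual hostnames; `expand_hostlist` is a hypothetical helper written for illustration and handles only the forms that appear on this page (ranges and comma-separated values inside a single pair of brackets, plus plain hostnames).

```python
import re

# One term: a hostname prefix, optionally followed by a "[...]" range spec.
_TERM = re.compile(r"([^\[\],\s]+)(?:\[([^\]]+)\])?")


def expand_hostlist(expr):
    """Expand e.g. 'oss[11-42], xen[11,13,15], cvmfs' into single hostnames.

    Zero padding in ranges such as '01-02' is preserved.
    """
    hosts = []
    for prefix, spec in _TERM.findall(expr):
        if not spec:
            # Plain hostname without brackets, e.g. 'cvmfs'.
            hosts.append(prefix)
            continue
        for part in spec.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                width = len(lo)  # keep leading zeros, e.g. '01'
                hosts.extend(
                    f"{prefix}{n:0{width}d}" for n in range(int(lo), int(hi) + 1)
                )
            else:
                hosts.append(prefix + part)
    return hosts


if __name__ == "__main__":
    nodes = "oss[11-42], mds[1-2], storage[01-02], cream[01-02], se[30-39], xen[11,13,15], cvmfs"
    print(len(expand_hostlist(nodes)), "hosts")
```

For example, the firmware-upgrade list above expands to 52 hosts.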

Upgrade kernel-ib and ofed packages

  • Description: Install version 1.5.3 on all affected machines.
  • Affected nodes: cream01, nfs[01-02], se[30-39], xen[11,13]
  • Notes:

Upgrade GPFS

  • Description: There appears to be a bug in the GPFS version we are currently running, so we may need to upgrade it.
  • Affected nodes: all GPFS servers and clients
  • Notes:

Adjust MTU for jumbo frames on Ethernet hosts (virtual hosts)

  • Description: Change the MTU parameter in the network init files and reboot.
  • Affected nodes: kvm01, xen[11,13,15]
  • Notes:
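The MTU change amounts to editing one key per interface file. The Python sketch below assumes Red Hat-style `/etc/sysconfig/network-scripts/ifcfg-<interface>` files, which accept an `MTU=` setting, and assumes a jumbo-frame MTU of 9000 bytes; the announcement names neither the file layout nor the target value. `set_mtu` is a hypothetical helper that rewrites the file content as a string, so the result can be checked before anything is written to disk.

```python
# Assumed jumbo-frame size; the announcement does not state the target MTU.
JUMBO_MTU = 9000


def set_mtu(ifcfg_text, mtu=JUMBO_MTU):
    """Return ifcfg-style content with MTU=<mtu>.

    An existing MTU= line is replaced; if there is none, one is appended.
    Commented-out lines such as '#MTU=1500' are left untouched.
    """
    out, replaced = [], False
    for line in ifcfg_text.splitlines():
        if line.strip().startswith("MTU="):
            out.append(f"MTU={mtu}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"MTU={mtu}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "DEVICE=eth0\nBOOTPROTO=static\nONBOOT=yes\nMTU=1500\n"
    print(set_mtu(example), end="")
```

The new value only takes effect once the interface is restarted, hence the reboot. On virtualization hosts the larger MTU usually has to be set on every interface in the path (physical NIC and bridge); otherwise the smallest value still limits the frame size.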
Topic revision: r1 - 2012-05-29 - PabloFernandez
This site is powered by the TWiki collaboration platform. Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.