Scheduled Maintenance on 2012-07-04

On the first working Wednesday of the month (2012-07-04) we will go into scheduled downtime. The downtime window runs from 9:00 to 18:00, but we will return to operation earlier if we finish sooner.

As usual, the CMS and ATLAS queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before the maintenance.
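
The queue closing times follow directly from the start of the downtime window. The snippet below is a minimal sketch of that arithmetic, assuming bash and GNU `date` (for the `-d` option), and assuming the lead times are counted back from the 9:00 start of the window. The function name `queue_close_time` is hypothetical and is not part of any site tooling.

```shell
# Hypothetical helper: print the time at which a queue should close.
#   $1 = start of the maintenance window, e.g. "2012-07-04 09:00"
#   $2 = lead time in hours (24 for CMS and ATLAS, 48 for LHCb)
# Assumes GNU date; the subtraction is done on epoch seconds.
queue_close_time() {
    local start_epoch
    start_epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((start_epoch - $2 * 3600))" '+%Y-%m-%d %H:%M'
}

# Closing times for this maintenance, in the local time zone.
echo "CMS/ATLAS queues close at: $(queue_close_time '2012-07-04 09:00' 24)"
echo "LHCb queue closes at:      $(queue_close_time '2012-07-04 09:00' 48)"
```

The subtraction is done on epoch seconds so that the result does not depend on how `date` parses relative expressions such as "24 hours ago".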

Summary of interventions

We will perform the following operations on the cluster:

Upgrade kernel on SL6 nodes DONE

  • Description: There is a security issue affecting RHEL6 kernels, so we need to upgrade them
  • Affected nodes: kvm01, cvmfs
  • Notes:

BIOS/ILOM upgrade on GPFS nodes DONE

  • Description: The PCIe bus on all X4270 nodes has a known bug that may cause problems with the disks and InfiniBand
  • Affected nodes: oss[11-42], mds[1-2]
  • Notes:
    load -source

Torque upgrade to 2.4.17 DONE

  • Description: This release includes two bug fixes that affect us
  • Affected nodes: lrms[01-02], wn[01-46], cream[01,02], arc[01-02]
  • Notes:
    dsh -g WN -g CREAM_CE -g ARC_CE 'rpm -Uvh http://repo/torque-2.4.17-1.cri.x86_64.rpm http://repo/torque-client-2.4.17-1.cri.x86_64.rpm'
    dsh -w lrms[01-02] 'rpm -Uvh http://repo/torque-2.4.17-1.cri.x86_64.rpm http://repo/torque-client-2.4.17-1.cri.x86_64.rpm http://repo/torque-server-2.4.17-1.cri.x86_64.rpm http://repo/torque-devel-2.4.17-1.cri.x86_64.rpm'
    dsh -g WN -g CREAM_CE -g ARC_CE -g LRMS 'rpm -qa | grep ^torque | sort' | dshbak -c
    ssh lrms01 'grid-service stop'
    ssh lrms02 'grid-service restart'
    dsh -g WN -g CREAM_CE -g ARC_CE 'grid-service restart'
    ssh lrms01 'grid-service restart'
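
The `rpm -qa` check in the notes above can be turned into an explicit pass/fail test. The snippet below is a minimal sketch, assuming bash and assuming that `dsh` prefixes every output line with `hostname: ` (so it must read the raw output, before `dshbak -c` collapses it). The function name `torque_version_mismatches` is hypothetical.

```shell
# Hypothetical helper: read "host: package" lines on stdin and print every
# line whose torque package is NOT at the expected version.
#   $1 = expected version, e.g. 2.4.17
# Assumes dsh-style output, where field 1 is "host:" and field 2 is the
# package string (name-version-release.arch).
torque_version_mismatches() {
    local expected="$1"
    awk -v ver="$expected" '
        # index() does a literal substring match, so the dots in the
        # version are not treated as regex wildcards.
        $2 ~ /^torque/ && index($2, "-" ver "-") == 0 { print }
    '
}
```

A possible use is to pipe the output of `dsh -g WN -g CREAM_CE -g ARC_CE -g LRMS 'rpm -qa | grep ^torque'` into `torque_version_mismatches 2.4.17`. Empty output then means that every reported torque package is at 2.4.17.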

Kernel upgrade on all Thors DONE

  • Description: The Thors are hitting spontaneous 'soft lockup' BUGs that seem to be kernel-related. We need to upgrade to the latest kernel and hope for the best.
  • Affected nodes: se[30-39]
  • Notes: In the end we decided to apply the latest distro/kernel/security upgrades to all the dCache servers, not just the Thors.
Topic revision: r4 - 2012-07-04 - PabloFernandez