
Scheduled Maintenance on 2013-08-21

On the next first working Wednesday of the month, we will go into Scheduled Downtime. It is planned to last from 9:00 to 18:00, but we will return to operation as soon as the work is finished.

As usual, the CMS and Atlas queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before it.
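
As a worked example of the queue lead times, the sketch below computes when each queue closes. It assumes the 24 and 48 hours are counted back from the 9:00 start on 2013-08-21 and that all times are local site time (the announcement gives no time zone). The VO names and lead times come from the text above; the function name `queue_close_times` is illustrative only.

```python
from datetime import datetime, timedelta

# Start of the downtime window as stated in the announcement.
# Assumption: local site time; no time zone is given, so none is attached.
MAINTENANCE_START = datetime(2013, 8, 21, 9, 0)

# How long before the downtime each VO's queue is closed.
QUEUE_LEAD_TIME = {
    "CMS": timedelta(hours=24),
    "Atlas": timedelta(hours=24),
    "LHCb": timedelta(hours=48),
}


def queue_close_times(start: datetime) -> dict[str, datetime]:
    """Return the time at which each queue should be closed before `start`."""
    return {vo: start - lead for vo, lead in QUEUE_LEAD_TIME.items()}


if __name__ == "__main__":
    for vo, when in queue_close_times(MAINTENANCE_START).items():
        print(f"{vo:6s} queue closes at {when:%Y-%m-%d %H:%M}")
```

Run as a script, it reports the CMS and Atlas queues closing at 2013-08-20 09:00 and the LHCb queue at 2013-08-19 09:00.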

_REMOVE: REMEMBER TO ADD THE DOWNTIME IN GOCDB AND CLOSE THE QUEUES_

Summary of interventions

We will perform the following operations on the cluster:


Put FDR bridge into production

  • Description: The FDR bridge will be put into production for one week as a test.
  • Affected nodes: All publicly accessible nodes in the cluster.
  • Notes:
    • Chris is only available to perform this change between 9:00 and 12:00; it is estimated to take around 20 minutes.
    • Change the MTUs back to the larger values (IB MTU=65520, ETH MTU=9000) and verify them:
      • ping -s 3000 -M do (fragmentation is prohibited, so the ping only succeeds if the larger MTU is in effect)
      • Attempt GridFTP transfers of large files within the cluster.
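
A quick sanity check on the ping test: ping's -s option sets only the ICMP payload, so each echo request is the payload plus 8 bytes of ICMP header and 20 bytes of IPv4 header (no options). A 3000-byte payload therefore produces a 3028-byte packet: too large for the default 1500-byte Ethernet MTU, but well within 9000 and 65520. The sketch below assumes IPv4 and iputils ping, where `-M do` prohibits fragmentation; with `-M dont` the DF bit is not set and the packet may simply be fragmented, so the ping can succeed even if the larger MTU is not in effect. The host name `node01` and the count of three echo requests are hypothetical choices.

```python
import shlex

# Per-packet overhead not included in ping's -s payload (IPv4 without options).
IPV4_HEADER = 20
ICMP_HEADER = 8
DEFAULT_ETH_MTU = 1500  # standard Ethernet MTU, for comparison


def max_payload(mtu: int) -> int:
    """Largest ping -s payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def needs_large_mtu(payload: int) -> bool:
    """True if a ping with this payload cannot pass unfragmented at MTU 1500."""
    return payload > max_payload(DEFAULT_ETH_MTU)


def ping_check(host: str, payload: int = 3000, count: int = 3) -> list[str]:
    """Build a ping command that forbids fragmentation (-M do), so it fails
    unless every hop accepts packets of payload + 28 bytes."""
    return ["ping", "-c", str(count), "-M", "do", "-s", str(payload), host]


if __name__ == "__main__":
    for name, mtu in (("ETH", 9000), ("IB", 65520)):
        print(f"{name} MTU {mtu}: max ping payload {max_payload(mtu)}")
    # "node01" is a hypothetical host name; substitute a real cluster node.
    print(shlex.join(ping_check("node01")))
```

Run as a script, it prints a maximum payload of 8972 bytes for the 9000-byte Ethernet MTU and 65492 bytes for the 65520-byte IB MTU, followed by the command line to run.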

Restart dCache services

  • Description: Due to an error in the logback.xml file, dCache has been creating trace logs under /tmp. Disabling the logging from the dCache CLI has not helped: dCache still holds these files open and keeps writing to them, so they remain reachable under /proc.
  • Affected nodes: storage01, storage02, se01-se14
  • Notes:
    • After the restart, run lsof and confirm that dCache no longer holds these files open.
    • The workaround cron job /etc/cron.weekly/clean_dcache should be removed afterwards.
    • On storage01 and storage02 the files are still present on the filesystem (unlike on the se machines), so those nodes have logrotate scripts for them. These scripts should also be removed.
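
The lsof check can also be scripted. On Linux, each entry in /proc/<pid>/fd is a symlink to an open file, and the kernel appends " (deleted)" to the target when the file has been unlinked but is still held open, which is how the trace logs stay reachable under /proc. The sketch below lists such files for given PIDs; it assumes the trace logs live under /tmp/ and that you supply the PIDs of the dCache JVM(s) yourself (the `pgrep -f dcache` pattern in the comment is a hypothetical way to find them). It must run as root to inspect another user's process. `lsof +L1`, which selects open files whose link count is below one, gives a similar one-line check.

```python
import os
import sys

DELETED_SUFFIX = " (deleted)"


def parse_fd_target(target: str) -> tuple[str, bool]:
    """Split a /proc/<pid>/fd symlink target into (path, deleted).
    The kernel appends ' (deleted)' for files unlinked but still open."""
    if target.endswith(DELETED_SUFFIX):
        return target[: -len(DELETED_SUFFIX)], True
    return target, False


def open_files_under(prefix: str, pid: int) -> list[tuple[int, str, bool]]:
    """List (fd, path, deleted) for files under `prefix` held open by `pid`."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    try:
        fds = os.listdir(fd_dir)
    except (FileNotFoundError, PermissionError):
        return found  # process gone, no /proc, or not permitted (run as root)
    for fd in fds:
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor closed between listdir and readlink
        path, deleted = parse_fd_target(target)
        if path.startswith(prefix):
            found.append((int(fd), path, deleted))
    return found


if __name__ == "__main__":
    # Pass the dCache JVM PID(s), e.g. from a hypothetical `pgrep -f dcache`.
    for pid in map(int, sys.argv[1:]):
        for fd, path, deleted in open_files_under("/tmp/", pid):
            state = "DELETED, still open" if deleted else "open"
            print(f"pid {pid} fd {fd}: {path} [{state}]")
```

After the restart, an empty result for every dCache PID corresponds to the lsof check passing.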
Topic revision: r1 - 2013-08-13 - GeorgeBrown
 