
Scheduled Maintenance on 2013-08-21

On the next first working Wednesday of the month we will go into scheduled downtime. It is scheduled from 9:00 to 18:00, but we will return to operation as soon as we finish.

As usual, the CMS and ATLAS queues will be closed 24 hours before the maintenance, and the LHCb queue will be closed 48 hours before it.

_REMOVE: REMEMBER TO ADD DOWNTIME IN GOCDB AND CLOSE THE QUEUES_

Summary of interventions

We will perform the following operations on the cluster:

Put FDR bridge into production
Restart dCache services

Put FDR bridge into production

Description: The FDR bridge is to be put into production for one week as a test.
Affected nodes: All publicly accessed nodes within the cluster will be affected.
Notes:

Chris is only available to perform this change between 9:00 and 12:00. It is estimated to take around 20 minutes.

Change MTUs back to the larger values (IB MTU=65520 / ETH MTU=9000), then verify:
- ping -s 3000 -M dont
- Attempt GridFTP transfers of large files within the cluster
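
A possible stricter variant of the ping check, as a sketch only. With -M dont the DF flag is not set, so fragmentation is permitted and the test does not by itself show that a full-size packet passes. Using -M do (prohibit fragmentation) with the largest payload that fits in one packet probes the MTU directly. The helper names and the target host are hypothetical, and the arithmetic assumes IPv4 without options (20-byte IP header plus 8-byte ICMP header).

```shell
#!/bin/sh
# Sketch: build a no-fragmentation ping probe for a given MTU.
# Function names are illustrative, not part of any existing tooling.

# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print (do not run) the ping command that probes HOST at MTU.
# -M do prohibits fragmentation, so an undersized hop makes the ping fail.
mtu_probe_cmd() {
    probe_host=$1
    probe_mtu=$2
    echo "ping -c 3 -M do -s $(icmp_payload_for_mtu "$probe_mtu") $probe_host"
}

# Example use (the host name is a placeholder for any peer in the cluster):
#   $(mtu_probe_cmd some-peer-host 9000)     # Ethernet side
#   $(mtu_probe_cmd some-peer-host 65520)    # IPoIB side
```

Printing the command rather than running it keeps the helper safe to source on any node; the operator decides which peers to probe.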

Restart dCache services

Description: Due to an error in the logback.xml file, dCache has been creating trace logs under /tmp. Disabling the logging from within the dCache CLI has not worked: the files are still open, so dCache keeps writing to them, and they remain visible under /proc. A restart is needed to close them.
Affected nodes: storage01, storage02, se01-se14
Notes:

Run lsof after the restart and ensure these files are no longer held open by dCache.

The workaround cron job /etc/cron.weekly/clean_dcache should be removed afterwards.

Storage01 and storage02 have logrotate scripts for these files, as the files are present on the filesystem there, unlike on the se machines. These scripts should also be removed.
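
The lsof check in the notes above could be scripted roughly as follows. This is a sketch under assumptions: the trace log file names are not recorded here, so it matches anything under a given path prefix, and it assumes the dCache processes appear with the command name java. The function name is hypothetical. It parses the machine-readable output of lsof -F (field letters p, c and n for PID, command and file name).

```shell
#!/bin/sh
# Sketch: list files under a path prefix that are still held open,
# optionally restricted to one command name.
# Reads `lsof -Fpcn` output on stdin; prints "PID COMMAND NAME" per match.
open_under_prefix() {
    awk -v prefix="$1" -v want="$2" '
        /^p/ { pid = substr($0, 2); next }   # p<pid> starts a new process
        /^c/ { cmd = substr($0, 2); next }   # c<command name>
        /^n/ {                               # n<file name>, one per open file
            name = substr($0, 2)
            if (index(name, prefix) == 1 && (want == "" || cmd == want))
                print pid, cmd, name
        }
    '
}

# Example use after the restart (run as root so all processes are visible):
#   lsof -Fpcn | open_under_prefix /tmp/ java
# No output means no matching process still has a file open under /tmp/.
```

Files that were deleted while still open show up in lsof with a "(deleted)" suffix on the name, so they are matched by the prefix test as well.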