nodeHealthCheck.sh script works and this time the filesystem didn't break: CSCS 'only' stopped running jobs.
/gridhome with ~5TB of storage
/scratch with the rest (~65TB)
cream04 (not in production) to this new system and test production jobs on the machine. Once we are satisfied, we'd need to establish a downtime and move all the CREAM and ARC-CEs.
cream04 will be in downtime once ServiceGPFS2 is ready and we will use it to test production jobs on the new storage. Also in September, CSCS will most likely upgrade its Ethernet infrastructure, so a site-wide downtime needs to be established. This will affect Phoenix, but hopefully by then we'll be ready with GPFS2 to shift all CEs there and avoid another maintenance.
I | Attachment | History | Action | Size | Date | Who | Comment |
---|---|---|---|---|---|---|---|
png | prod_vojobs-running-week.gif.png | r1 | manage | 13.2 K | 2014-07-03 - 12:11 | MiguelGila | CSCS jobs running over a week |
pdf | IBMhwRAID.pdf | r1 | manage | 3183.4 K | 2014-07-03 - 08:40 | SzymonGadomski | Photos of hardware RAID in IBM x3630 M3 at UNIGE |