Swiss Grid Operations Meeting on 2015-11-10
Site status
CSCS
Systems:
- HP Smart Array issues (config loss and no boot); lost a lot of time with HP support. Found our own workaround: disable the Smart Array and enable legacy mode for the boot disk.
- Extended the IB bridges warranty until spring 2016
- Requested new certificates for argus* with correct DNS AltName
- LHCb jobs are still not running well. We suggested that Vladimir use the right runtime environments (env/proxy and glite), but there has been no change so far.
- CMS is testing multicore jobs
- Working hard to finalize the arc02 Puppet configuration.
- We are planning to decommission cream04
- Planning the upgrade of libnss (Advisory-SVG-2015-CVE-2015-7183); almost all services on the cluster are affected.
- Getting offers for the Phoenix expansion
Storage:
- Scratch - GPFS: Netapp storage firmware upgrade (no service interruption).
- dCache:
- We still have the cleaner problem, mainly with CMS. At the moment the cleaner needs to be run manually, but the situation has stabilized after some big deletions from CMS.
- This week we should finalise the configuration of a pre-production system where we will test the 2.6 -> 2.10 (2.13) upgrade, so that we can upgrade production by the end of this month.
PSI
UNIBE-LHEP
- Operations
- Smooth(-ish) operation on ce02, quite stable at just over 500 cores. Nothing of relevance to report
- Re-deployment of the ce01 cluster under way:
- SLC 6.7 and ARC 5.0.3 (needed a downgrade of openldap* to have a functional resource BDII on the ARC CE)
- about 900 worker-cores installed
- new Lustre (version 2.5.3, 200 disks), Thumpers decommissioned
- moved to SLURM, still cutting my teeth on it
- hope to go online in the next few hours
- Patching against CVE-2015-7183 (nss*, nspr* from slc6-testing)
- ATLAS specific operations
- Implementing the requested monthly dumps of the namespace on the DPM SE.
UNIBE-ID
- Commissioning
- Ordered the first 32 new compute nodes (Broadwell) with a total of 640 cores; delivery in 12/2015
- Another 32 nodes will be ordered early in 2016
- Operations
- Prolonged maintenance downtime due to a painful migration to the new GPFS storage
- Lesson learned (us + IBM techie!): using AFM and additionally doing rsyncs is a huge no-go and leads to a corrupted filesystem when AFM is disabled at the end
- though no data loss
- Since then smooth operation again
- Upgrade of libnss (Advisory-SVG-2015-CVE-2015-7183) will be done tomorrow within the already scheduled maintenance downtime
- ATLAS specific operations
- no problems
- ordered a new SSL certificate for nordugrid.unibe.ch due to the switch to STRICT_RFC2818 by Globus GSI clients
UNIGE
NGI_CH
Other topics
- Daniel is being replaced as CMS contact person
Next meeting date:
A.O.B.
Attendants
- CSCS: Pablo, Dario, Dino, Gianni
- CMS: Fabio Martinelli, Daniel Meister
- ATLAS: Gianfranco
- LHCb: Roland Bernet
- EGI: Gianfranco
Action items
Topic revision: r16 - 2015-11-10 - GianfrancoSciacca