40% of ATLAS/CH WT, but 67% of CPU time in May (all jobs). CSCS shows >60% FAILED WT [1]; most failures are "SIGTERM from the batch system" and "error in copying the file from job workdir to local SE". Will open an RT ticket to follow up on this.
DPM head node migration to SLC6 and ATLAS storage dumps still on hold
HammerCloud report [2]
UNIBE-LHEP online >92% (last month), better than the previous month. Still room for improvement, but the impact is limited since interruptions are not long enough to cause the site to drain.
UNIBE-ID >99%
UNIBE-LHEP_CLOUD* <90% (lost heartbeat from pilot: some intermittent network instabilities)
Accounting numbers (from ATLAS dashboard) from last month (May 2016)
CPU h: 1194137
WC h: 1358408
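For reference, the ratio of CPU hours to wall-clock hours can be derived from the two figures above. This is only a minimal sketch: the function name is hypothetical, and treating CPU h / WC h as the efficiency measure is an assumption, not necessarily how the ATLAS dashboard defines it.

```python
def cpu_efficiency(cpu_hours: float, wallclock_hours: float) -> float:
    """Return CPU time divided by wall-clock time (hypothetical helper)."""
    if wallclock_hours <= 0:
        raise ValueError("wall-clock hours must be positive")
    return cpu_hours / wallclock_hours


# Figures quoted above (ATLAS dashboard, May 2016).
efficiency = cpu_efficiency(1194137, 1358408)
print(f"CPU/WC ratio: {efficiency:.3f}")
```

With these numbers the ratio comes out at roughly 0.88.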
UNIBE-ID
Smooth operation in general; no outages
Mitigation has been set up for the high failure rate of ATLAS jobs (SIGKILL due to h_vmem violation) by increasing the multiplier in submit-job-sge => lower failure rate, but more wasted resources.
Medium-term goal: Move from OG-SGE to Slurm (essentially a matter of user acceptance, not a technical issue)
As previously announced, 2-day downtime next week: IB recabling (8 => 16 spine switches); provisioning of 2160 cores (Broadwell)
Accounting numbers (from scheduler) from last month for ATLAS:
CPU h: 135'276
WC h: 108'001
UNIGE
Xxx
Accounting numbers (from scheduler) from last month
NGI_CH
WLCG plans to retire the requirement for sites to run a site-bdii. EGI sees it differently. Long ongoing discussion, including a WLCG Task Force assigned to this. Stay tuned, but don't hold your breath :-)
Heads up: current funding for the minimal NGI_CH operation layer (10% FTE) will end by the end of the year. A solution needs to be identified. Also open from the end of the year are the EGI fee (hopefully it will go on Swing) and the certificates (~30kCHF including ~10% FTE for operation). Certificates are now used beyond CHIPP alone.
NGI-CH Open Tickets review
120405 for CSCS (LHCb) Red: "very urgent", last update on 2016-05-11. Reply awaited from site.
117899 for UNIBE-LHEP (ATLAS) On hold (ATLAS request: storage dumps)