Swiss Grid Operations Meeting on 2016-11-11 at 14:00
Site status
CSCS
- Xxx
- Accounting numbers (from scheduler) from last month
PSI
UNIBE-LHEP
- Routine operation up to shutdown for CVE-2016-5195.
- Downtime was declared incorrectly (by me), so the site was not taken offline; this had an impact on the measured efficiency (blackhole too).
- Infrastructure intervention during and following the downtime, running at reduced capacity for several days.
- Firewall issue for ce04 (cloud) following the downtime: unavailable for a couple of weeks
- Preparing for campus-wide power cut on 29-30 Nov.
http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteviewhistory?columnid=562#time=custom&start_date=2016-10-01&end_date=2016-10-31&values=false&spline=false&debug=false&resample=false&sites=multiple&clouds=all&site=ANALY_CSCS,ANALY_UNIBE-LHEP,ANALY_UNIBE-LHEP-UBELIX,CSCS-LCG2,CSCS-LCG2_MCORE,UNIBE-LHEP,UNIBE-LHEP_MCORE,UNIBE-LHEP-UBELIX,UNIBE-LHEP-UBELIX_MCORE,UNIBE-LHEP_CLOUD,UNIBE-LHEP_CLOUD_MCORE
- Accounting numbers (from scheduler) from last month (core-hours October 2016):
ATLAS: 933809; T2K: 10227; OPS: 31
- Accounting numbers from ATLAS dashboard from last month (core-hours October 2016) [1],[2]:
CSCS / UNIBE 57% / 43% - 1575861 / 1185039 (reduced capacity at UNIBE after downtime)
- Efficiency WT ok/fail [3]:
CSCS/UNIBE 69.71/53.58 (bad downtime for UNIBE)
CSCS/UNIBE 0.53/0.72 (CSCS recovers following downtime and GPFS fix)
[1]
http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2016-10-01&end=2016-10-31&timeRange=daily&granularity=8%20Hours&generic=0&sortBy=0&series=All&type=ewa
[2]
http://dashb-atlas-job.cern.ch/dashboard/request.py/consumptions_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2016-10-01&end=2016-10-31&timeRange=daily&granularity=8%20Hours&generic=0&sortBy=0&series=All&type=wab
[3]
http://dashb-atlas-job.cern.ch/dashboard/request.py/terminatedjobsstatus_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2016-10-01&end=2016-10-31&timeRange=daily&sortBy=0&granularity=8%20Hours&generic=0&series=All&type=ebwc
[4]
http://dashb-atlas-job.cern.ch/dashboard/request.py/efficiency_individual?sites=CSCS-LCG2&sites=UNIBE-LHEP&sitesCat=All%20Countries&resourcetype=All&sitesSort=2&sitesCatSort=0&start=2016-10-01&end=2016-10-31&timeRange=daily&granularity=8%20Hours&generic=0&sortBy=0&series=All&type=eal
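The CSCS / UNIBE percentage split quoted with the ATLAS dashboard core-hours can be reproduced from the two totals. A minimal sketch of that arithmetic, where the dictionary and function names are illustrative only and not part of any dashboard tooling:

```python
# Core-hours for October 2016, as quoted from the ATLAS dashboard in these notes.
core_hours = {"CSCS": 1575861, "UNIBE": 1185039}


def shares(hours):
    """Return each site's percentage share of the summed core-hours."""
    total = sum(hours.values())
    return {site: 100.0 * value / total for site, value in hours.items()}


result = shares(core_hours)
for site, pct in result.items():
    # Rounded to whole percent, matching the precision used in the notes.
    print(f"{site}: {pct:.0f}%")
```

Rounded to whole percent this gives the 57% / 43% split reported for CSCS / UNIBE.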
UNIBE-ID
UNIGE
- Xxx
- Accounting numbers (from scheduler) from last month
NGI_CH
- Funding for the NGI_CH liaison role (operation manager, security officer, etc.) runs out at the end of the year.
- Possible scenario: 15k/y provided by the CHIPP CB institutes. Bern via LHEP or the Scientific IT Support unit to provide the service (as now).
- Any alternative proposal: please reply to e-mail thread.
- NGI-CH Open Tickets review:
https://ggus.eu/index.php?mode=ticket_search&show_columns_check%5B%5D=TICKET_TYPE&show_columns_check%5B%5D=AFFECTED_VO&show_columns_check%5B%5D=AFFECTED_SITE&show_columns_check%5B%5D=PRIORITY&show_columns_check%5B%5D=RESPONSIBLE_UNIT&show_columns_check%5B%5D=STATUS&show_columns_check%5B%5D=DATE_OF_CHANGE&show_columns_check%5B%5D=SHORT_DESCRIPTION&ticket_id=&supportunit=NGI_CH&su_hierarchy=0&vo=&user=&keyword=&involvedsupporter=&assignedto=&affectedsite=&specattrib=none&status=open&priority=&typeofproblem=&ticket_category=all&mouarea=&date_type=creation+date&tf_radio=1&timeframe=any&from_date=06+May+2014&to_date=07+May+2014&untouched_date=&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO%21
AFS related:
124818 (PSI) in progress,
124815 (UZH) contacted UZH to check whether the site is obsolete -> could deactivate it in GOCDB
ATLAS CSCS:
124719 (squid down) needs a restart on atlas01
ATLAS UNIBE:
124518 (higher than normal failure rate at Ubelix). Main cause of failure fixed; now dealing with some job timeouts
117899 (storage dumps) on hold
CMS CSCS:
124714 (jobs not running) fixed?
Accounting CSCS:
123765 (cream accounting): needs action from CSCS
Accounting UNIBE:
124320 (not publishing): actions carried out; status must be checked again
Other topics
Next meeting date:
A.O.B.
Attendants
- CSCS:
- CMS: Fabio
- ATLAS: Gianfranco: apologies
- LHCb:
- EGI:
Action items