<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Prioritized TODO list for Phoenix until February 2017

To properly schedule the work derived from our last MeetingCHIPPCSCSFaceToFace20160901 meeting together with all the other activities, we need a complete list of all planned activities. Unplanned tasks (such as incidents) will always take precedence. The list will be reviewed every two weeks, taking into account any new incidents that may arise.

Tasks are listed in priority order, from top to bottom, together with the expected effort for 1 person and the proposed deadline.

%EDITTABLE{}%
| *Ticket* | *Task* | *Duration* | *Deadline* | *Done* |
| | Ask for HW offers for SNF funding request + derived analysis | weeks | 25 Oct | Yes |
| | Resolve efficiency problems (with Scratch?) | ? | 10 Oct | Yes |
| | Look at ARC + ATLAS queue configuration to understand what happened in April'16 with Scratch high utilization | hours | 10 Oct | needed? |
| | Cross-check ATLAS and CMS efficiency plots with known issues and understand the differences | days | 10 Oct | Yes |
| | Join all VOs and start familiarizing with their dashboards/logs | days | 31 Oct | Yes |
| | Update the RoadMap wiki page | hours | 25 Oct | |
| | Create full CSCS+VOs monitoring dashboard (incl. Hackathon) | weeks | 31 Oct | |
| | Authentication on Kibana with Grid Certificates | hours | 31 Oct | |
| | Identify relevant A/R metrics (and others) for each VO and track them | days | 31 Oct | identified |
| | _____________________ TASK REVIEW _____________________ | | | |
| 23008 | Discussion and implementation for VOBoxes | weeks | 30 Nov | Yes |
| | Need to understand the impact of not imposing memory limits (e.g. swapping nodes?) | hours | Dec | LHConCRAY TODO |
| | DCACHE Update | weeks | Dec | Yes |
| 23515 | SLURM reports are broken | days | Dec | |
| | Clean Monitoring on Wiki | days | Feb | |
| 22368 | Complete Nagios check with info from the VO and publish in Wiki | days | Feb | |
| | Puppetize cvmfs,argus | days | Feb | Partial |
| | Foreman decommissioning | hours | Feb | |
| | Update Doc on wiki | days | Feb | |
| | Implement HA on Argus | days | Feb | |
| | Puppetize Storage infrastructure | weeks | Feb | Partial |
| | Add per-VO walltime usage to accounting plots | hours | Feb | Yes |
| 24518 | Check EGI accounting | days | Feb | Yes |
| | Implement nodehealthcheck | days | Feb | |
| 23114 | Sudo rights on arc0[1,2,3] + arcbrisi | days | Feb | Yes |
| | Finalize BDII lbcd->keepalive | days | Feb | Yes |

---++ Monitoring Dashboard details

This section contains details about the common monitoring dashboard. For all: please add two metrics that you would like to see in the Dashboard.

ATLAS

   * HC status for each queue: *# curl -sS "http://pandaserver.cern.ch:25085/cache/schedconfig/<PANDA_QUEUE>.pilot.json"|grep status*
      * e.g.: *# curl -sS "http://pandaserver.cern.ch:25085/cache/schedconfig/CSCS-LCG2_MCORE.pilot.json"|grep status*
   * A logical AND over all queues is highly likely to give misleading information. Instead, an alarm from each individual queue should be treated as an incident. We have 3 queues for Phoenix and 2 for Brisi (the Brisi queues should not trigger critical alarms during integration).
   * A time evolution showing the periods during which at least one queue is blacklisted is complementary information that is needed on top of the alarm (http://bigpanda.cern.ch/incidents/).
   * Nr. of cores in each ARC status (available from e.g. gangliarc) vs pledged cores. The nr. of running cores alone gives an incomplete picture when operations are not in an optimal state or are compromised. All values are needed, as a function of time. A single alarm might be triggered on nr. of running vs pledged only (tbd).

PABLO

   * *Real* Availability/Reliability metrics from the VO perspective, since the official ones (http://wlcg-sam.cern.ch/reports/2016/201611/wlcg/) were declared irrelevant by the VO Reps in our previous F2F meeting.
   * Efficiency metrics (CPU/Walltime) within the cluster, over the last few days, for each VO. It would be great to have values from both the VO and the Cluster itself, but having those from the Cluster (from Slurm?) would already be quite something.

<span style="color: #630000; font-size: 20px; background-color: #f6f6f6;">Readers' comments</span>

%COMMENT{type="below"}%
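
The per-queue alarm rule from the ATLAS notes in the Monitoring Dashboard details (one alarm per queue instead of a logical AND, with the Brisi queues downgraded during integration) could be sketched roughly as below. This is only an illustration under stated assumptions: the status values (="online"=, ="test"=, ="offline"=), the severity names, and every queue name other than =CSCS-LCG2_MCORE= are hypothetical, and the statuses are passed in as a plain dictionary rather than fetched from the PanDA server.

```python
from typing import Dict, List, Set, Tuple

# Assumption: only the status string "online" counts as healthy.
HEALTHY_STATUSES = {"online"}


def queue_alarms(
    statuses: Dict[str, str], integration_queues: Set[str]
) -> List[Tuple[str, str, str]]:
    """Return one (queue, status, severity) alarm per unhealthy queue.

    Each queue is evaluated on its own, so a single bad queue is never
    hidden by the others. Queues still under integration only raise a
    warning instead of a critical alarm.
    """
    alarms = []
    for queue, status in sorted(statuses.items()):
        if status in HEALTHY_STATUSES:
            continue
        severity = "warning" if queue in integration_queues else "critical"
        alarms.append((queue, status, severity))
    return alarms


def any_queue_blacklisted(statuses: Dict[str, str]) -> bool:
    """True if at least one queue is not healthy (one point of the time evolution)."""
    return any(status not in HEALTHY_STATUSES for status in statuses.values())


if __name__ == "__main__":
    # Hypothetical snapshot; only CSCS-LCG2_MCORE is a queue name from this page.
    snapshot = {
        "CSCS-LCG2_MCORE": "online",
        "EXAMPLE_PHOENIX_QUEUE": "test",
        "EXAMPLE_BRISI_QUEUE": "offline",
    }
    for alarm in queue_alarms(snapshot, integration_queues={"EXAMPLE_BRISI_QUEUE"}):
        print(alarm)
    print("at least one queue blacklisted:", any_queue_blacklisted(snapshot))
```

Recording the result of =any_queue_blacklisted= at every polling interval would give the time evolution requested on top of the alarms.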
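
For the request to track the nr. of cores in each ARC status against the pledged cores, a possible shape for the dashboard data is sketched below. It keeps all per-state values for every time sample and raises a single alarm on running vs pledged only. The state names, the sample layout and the =min_fraction= threshold are assumptions made for illustration (the page itself marks the alarm condition as tbd), and no gangliarc interface is used here.

```python
from typing import Any, Dict, List


def running_vs_pledged(
    samples: List[Dict[str, Any]],
    pledged_cores: int,
    min_fraction: float = 0.8,  # hypothetical threshold, not agreed anywhere
) -> List[Dict[str, Any]]:
    """Annotate each time sample with the running/pledged fraction and an alarm flag.

    Each sample is expected to look like
    {"time": <timestamp>, "cores": {<state name>: <number of cores>, ...}}.
    All per-state values are kept so they can be plotted as a function of time.
    """
    if pledged_cores <= 0:
        raise ValueError("pledged_cores must be positive")
    report = []
    for sample in samples:
        running = sample["cores"].get("running", 0)
        fraction = running / pledged_cores
        report.append(
            {
                "time": sample["time"],
                "cores": dict(sample["cores"]),
                "running_fraction": fraction,
                "alarm": fraction < min_fraction,
            }
        )
    return report


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    history = [
        {"time": "2016-12-01T10:00", "cores": {"running": 900, "queued": 300}},
        {"time": "2016-12-01T11:00", "cores": {"running": 500, "queued": 900}},
    ]
    for row in running_vs_pledged(history, pledged_cores=1000):
        print(row["time"], row["running_fraction"], row["alarm"])
```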
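
The per-VO efficiency metric (CPU/Walltime) requested for the dashboard could be computed from cluster-side job records along the lines of the sketch below. The record layout is hypothetical, and dividing by walltime multiplied by the number of allocated cores is an assumption made here so that multi-core jobs are not over-counted; how the records would be exported from Slurm is left open.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical job record: (vo, cpu_seconds, wall_seconds, allocated_cores)
JobRecord = Tuple[str, float, float, int]


def efficiency_per_vo(jobs: Iterable[JobRecord]) -> Dict[str, float]:
    """Return total CPU time / (walltime * cores) for each VO.

    Totals are summed over all jobs of a VO before dividing, so long jobs
    weigh more than short ones. VOs without any walltime are omitted.
    """
    cpu_total: Dict[str, float] = defaultdict(float)
    wall_core_total: Dict[str, float] = defaultdict(float)
    for vo, cpu_seconds, wall_seconds, cores in jobs:
        cpu_total[vo] += cpu_seconds
        wall_core_total[vo] += wall_seconds * cores
    return {
        vo: cpu_total[vo] / wall_core_total[vo]
        for vo in cpu_total
        if wall_core_total[vo] > 0
    }


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        ("atlas", 7200.0, 3600.0, 4),
        ("atlas", 3600.0, 3600.0, 1),
        ("cms", 1800.0, 3600.0, 1),
    ]
    for vo, eff in sorted(efficiency_per_vo(records).items()):
        print(f"{vo}: {eff:.2f}")
```

Restricting the input to jobs that ended in the last few days would give the rolling view described in the list.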
Topic revision: r15 - 2016-12-16 - PabloFernandez
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.