<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Swiss WLCG Operations Meeting on 2010-08-12

   * *Date and time*: 2010/08/12 at 9:30
   * *Place*: EVO, password: chipp
   * *External link / EVO*: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=vsvivIeieeIMI9a8aDItas

---++ Agenda
   * Report on unscheduled downtime (FG)
   * Discussion about Experiment Software Area
      * ExperimentSofwareAreaProposal
   * Review [[MeetingSwissWLCGOperations20100729#ActionItems][Action Items]]
      * CMS has to enable SAM tests for !CreamCE
      * ATLAS has to check how !CreamCE behaves and also enable SAM tests
   * AOB

---++ Attendants
   * ATLAS: Gianfranco Sciacca, Marc Goulette, Sigve Haug, Szymon Gadomski
   * CMS: Derek Feichtinger
   * LHCb: Roland Bernet
   * CSCS: Fotis Georgatos, Peter Oettl

---++ Minutes
   * Report on unscheduled downtime (FG)
      * Troublesome situation due to various Lustre instabilities
      * The complexity and size of the experiment software aggravate the Lustre risks
      * VO reps recognized the issue and asked what can be done about it
      * CSCS has placed purchase orders for new controller hardware
      * CSCS recommends verifying AND rethinking the exp-software dirs
      * DF:
         * Probably the longest emergency downtime we have ever experienced
         * VO contacts were not aware that there are 4-5 Lustre failovers per month
         * If only the Lustre shared scratch were affected, we could just wipe and rebuild it. All running jobs would be lost, but the downtime would only be some hours. Rebuilding the SW area can take days and also involves work by central operations people. So we should separate the exp SW from the scratch. (Also, Lustre is not ideal for storing huge amounts of tiny files as found in the exp SW area.)
         * Many sites had similar experiences; they went back to NFS
         * CSCS management (MDL and/or DU) has to push on Sun. The system is severely affecting our operations and consuming excessive admin time to keep stable.
            * Hardware is troublesome
            * Adequate support is not delivered
      * SH:
         * Lustre at Tier-3 since April
         * Experiment software remained on NFS
         * MDS crashes (no failover node)
      * See also ticket [[https://webrt.cscs.ch/Ticket/Display.html?id=7851][#7851]]
   * Discussion about Experiment Software Area
      * In short: go back to !PhaseB implementation; DRBD is well tested
      * Proposal: start from scratch so we have a known state and a clean, reduced software area
      * VOs agree
      * SH: clarify with Andreij if ARC could use the gLite software area
      * VOs asked for more than 1 TB of total disk space
      * Offered solution:
         * Set up CE + WN to start software installation
         * No interruption needed; switch software area from Lustre to NFS after installation is finished
   * Review Action Items:
      * VO reps will check with their contacts what is possible to test
      * RB: LHCb is running fine on CREAM
   * AOB
      * SH: many sites in CH use Lustre; it would be useful to gather experiences/knowledge
      * PO: [[http://www.hpc-ch.org/forums/index.html][HPC Forum]] about Parallel File Systems in October

#ActionItems
---++ Action items
   * CSCS: purchase hardware needed for implementing NFS setup
   * CSCS: open 3 tickets against Sun support; see ticket [[https://webrt.cscs.ch/Ticket/Display.html?id=7851][#7851]]
   * MG: check with VO to test CREAM CE and give status report; check availability of SAM tests for CREAM-CE
   * DF: check availability of SAM tests for CREAM-CE
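The minutes remark that Lustre copes poorly with huge numbers of tiny files, and that the exp-software dirs should be verified before the rebuild. A minimal sketch of how a VO contact might quantify this for a given directory tree is shown below. The function name =summarize_tree=, the 4 KiB "small file" threshold, and the throwaway demo directory are all assumptions for illustration; none of them come from the meeting.

```python
import os
import tempfile


def summarize_tree(root, small_threshold=4096):
    """Count files, small files, and total bytes under root.

    small_threshold is a hypothetical cutoff (in bytes) for what
    counts as a "tiny" file; adjust it to the file system in question.
    """
    total_files = 0
    small_files = 0
    total_bytes = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the size of a symlink itself, not its target.
                size = os.lstat(path).st_size
            except OSError:
                # Skip entries that vanish or are unreadable during the walk.
                continue
            total_files += 1
            total_bytes += size
            if size < small_threshold:
                small_files += 1
    return {"files": total_files, "small_files": small_files, "bytes": total_bytes}


if __name__ == "__main__":
    # Demo on a throwaway directory; replace with the real software area path.
    with tempfile.TemporaryDirectory() as demo_root:
        with open(os.path.join(demo_root, "tiny.txt"), "wb") as handle:
            handle.write(b"x" * 10)
        print(summarize_tree(demo_root))
```

A high ratio of =small_files= to =files= would support keeping the software area off Lustre, in line with the separation of exp SW and scratch discussed above.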
Topic revision: r5 - 2011-01-13 - PabloFernandez
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.