Swiss WLCG Operations Meeting on 2010-08-12
Date and time: 2010/08/12 at 9:30
Place: EVO, password: chipp
External link / EVO: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=vsvivIeieeIMI9a8aDItas
Agenda
Report on unscheduled downtime (FG)
Discussion about Experiment Software Area
ExperimentSofwareAreaProposal
Review Action Items
CMS has to enable SAM tests for CreamCE
ATLAS has to check how CreamCE behaves and also enable SAM tests
AOB
Attendees
ATLAS: Gianfranco Sciacca, Marc Goulette, Sigve Haug, Szymon Gadomski
CMS: Derek Feichtinger
LHCb: Roland Bernet
CSCS: Fotis Georgatos, Peter Oettl
Minutes
Report on unscheduled downtime (FG)
Troublesome situation due to various Lustre instabilities
complexity/size of experiment-software aggravates Lustre risks
VO reps realized the issue and asked what we can do about it
CSCS has placed purchase orders for new controller hardware
CSCS recommends verifying and rethinking the experiment software directories
DF:
probably longest downtime
was not aware that there are 4-5 Lustre failovers per month
if only scratch were affected, starting over after a file system corruption would be easy
many sites had similar experiences; they went back to NFS
CSCS management (MDL and/or DU) has to push on Sun
Hardware is troublesome
Support is not delivered
SH:
Lustre at Tier-3 since April
Experiment software remained on NFS
MDS crashes (no failover node)
See also ticket #7851
Discussion about Experiment Software Area
In short: go back to PhaseB implementation; DRBD is well tested
Proposal: start from scratch so we have a known state and a clean reduced software area
VOs agree
SH: clarify with Andreij if ARC could use gLite software area
VOs asked for more than 1 TB of total disk space
Offered solution:
Set up CE + WN to start software installation
no interruption needed; switch software area from Lustre to NFS after installation is finished
Review Action Items:
VO Reps will check with their contacts what is possible to test
RB: LHCb is running fine on CREAM
AOB
SH: many sites in CH use Lustre; would be useful to gather experiences/knowledge
PO: HPC Forum about Parallel File Systems in October
Action items
CSCS: purchase hardware needed for implementing NFS setup
CSCS: open 3 tickets against Sun support; see ticket #7851
MG: check with VO to test CREAM CE and give status report; check availability of SAM tests for CREAM-CE
DF: check availability of SAM tests for CREAM-CE
Topic revision: r3 - 2010-08-13 - PeterOettl