<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->
---+ Swiss WLCG Operations Meeting on 2010-09-09
   * *Date and time*: 2010/09/09 at 9:30
   * *Place*: EVO, password: chipp
   * *External link / EVO*: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=MDMaM82a28DuDs929lD99D
---++ Agenda
   * Maintenance day report and site status
   * Status of Lustre problems
   * Status of new experiment software area
      * Derek: After fixing two local problems that the central CMS ops team discovered during the installations, the CMS sw area is now ready.
      * Roland: LHCb sw area is ready.
   * Status of Phase D
      * Derek: We had a visit from Bull at PSI. When discussing different storage options, they mentioned that they also use LSI solutions like the one suggested by CSCS in their slides. They seem to be price competitive, and we recently had a rather good experience with Bull in resolving a problem with a DDN system that we had bought through them. Maybe one could get an offer for the HW from them (but maybe CSCS already has an ETH framework contract (Rahmenvertrag) for this kind of equipment).
   * Availability / Reliability values for July (79%) and August (47%). The numbers are too low to be real; we need to check the numbers from the VOs.
      * https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics
      * [[https://gridview.cern.ch/GRIDVIEW/sa/bin/same_graphs.php?XX=&Information=SiteDetail&DefVO=15&TestVO=-1&DurationOption=daily&LComponent=-2&NodeID=-1&TestID=-1&Hour1=0&StartDay=1&StartMonth=8&StartYear=2010&Hour2=23&EndDay=31&EndMonth=8&EndYear=2010&LTier1Site=12&RelOrAvail=Availability&OnlyCritical=ON&SiteFullName=1&Report=0&LTier2Site=12][Leo's link to GridView August]]
      * [[https://gridview.cern.ch/GRIDVIEW/sa/bin/same_graphs.php?XX=&Information=SiteDetail&DefVO=15&TestVO=-1&DurationOption=daily&LComponent=-2&NodeID=-1&TestID=-1&Hour1=0&StartDay=1&StartMonth=7&StartYear=2010&Hour2=23&EndDay=31&EndMonth=7&EndYear=2010&LTier1Site=12&RelOrAvail=Availability&OnlyCritical=ON&SiteFullName=1&Report=0&LTier2Site=12][Leo's link to GridView July]]
   * (Derek): Could CSCS give a short overview of the scheduling policies? Are the 100 (or so) job slots still reserved for each experiment, so that we can guarantee a certain availability?
   * Review [[MeetingSwissWLCGOperations20100812#ActionItems][Action Items]]
      * CSCS: purchase hardware needed for implementing NFS setup
      * CSCS: open 3 tickets against Sun support; see ticket [[https://webrt.cscs.ch/Ticket/Display.html?id=7851][#7851]]
      * MG: check with VO to test CREAM CE and give status report; check availability of SAM tests for CREAM-CE
      * DF: check availability of SAM tests for CREAM-CE
         * [[https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CREAMCE&vo=cms&nodename=cream02.lcg.cscs.ch][classical CMS SAM test]]
         * [[http://dashb-cms-sam.cern.ch/dashboard/request.py/latestresultssmry?siteSelect3=T2&serviceTypeSelect3=all&sites=T2_CH_CSCS&services=CREAMCE&tests=3472&tests=3445&tests=3438&tests=2967&tests=2963&tests=2968&tests=2965&tests=2966&tests=2970&tests=2964&tests=2969&tests=1886&tests=2881&tests=3268&tests=3431&tests=3001&tests=3423&tests=3235&tests=3424&tests=3434&tests=3342&tests=3161&tests=3309&tests=3401&tests=3422&tests=3402&tests=3440&tests=3427&tests=3436&tests=3421&tests=3430&tests=3442&tests=3429&tests=3435&tests=3426&tests=3433&tests=3432&tests=3441&tests=1885&tests=2745&tests=2740&tests=2738&tests=2741&tests=2747&tests=2736&tests=2884&tests=2721&tests=2729&tests=2739&tests=2728&tests=2735&tests=2725&tests=2732&tests=2750&tests=2752&tests=2883&tests=2754&tests=2733&tests=2723&tests=2753&tests=2749&tests=2751&tests=2722&tests=2756&tests=2730&tests=2748&tests=2961&tests=2746&tests=1884&tests=1881&tests=2742&tests=2734&tests=2882&tests=3289&exitStatus=all][CMS dashboard view]]
         * No official directive to abandon lcg-CE in favor of CREAM
   * AOB
---++ Attendants
   * ATLAS: Marc, Gianfranco
   * CMS: Derek, Leo
   * LHCb: Ronald
   * CSCS: Peter, Pablo
---++ Minutes
   * During the maintenance we had a network problem with the SE head nodes that caused pools to go away for some time and some transfers were hanging, so we couldn't bring the site back until that was solved.
   * Lustre is giving trouble with the newest client version (1.8.4), so we decided to go back to 1.8.3 in a rolling downgrade.
   * When ATLAS finishes copying its software to the NFS server (this is ongoing), we will mount it on all WNs and change the environment variables so that new jobs use the new area, while old jobs still finish using the old one. Some newly arriving jobs may not find the software they expect, but there should not be many of them, so this should not be a problem.
   * Phase D. We are going to ask Bull for an offer to compare with the one from IBM before placing the order. Also, we are still working with the numbers from the last CHIPP Computing Board; they will be sent ASAP.
   * Availability / Reliability. It looks like the ARC-CE could be the reason why we have such bad numbers.
      * https://gridview.cern.ch/GRIDVIEW/sa/bin/same_graphs.php?XX=&Information=SiteDetail&DefVO=15&TestVO=-1&DurationOption=daily&LComponent=-2&NodeID=-1&TestID=-1&Hour1=0&StartDay=1&StartMonth=8&StartYear=2010&Hour2=23&EndDay=31&EndMonth=8&EndYear=2010&LTier1Site=12&RelOrAvail=Availability&OnlyCritical=ON&SiteFullName=1&Report=0&LTier2Site[]=12
      * And also http://lxarda16.cern.ch/dashboard/request.py/historicalsiteavailability?siteSelect3=T2&sites=T2_CH_CSCS&timeRange=individual&start=2010-08-01&end=2010-08-30
   * Scheduling policies. We have the same reservations as before: 96 cores for ATLAS, 96 cores for CMS, and 20 for LHCb.
---++ Action items
   * CSCS is to open a GGUS ticket to investigate whether the formula used to calculate availability/reliability has changed; maybe the Arc01 problems made the whole site red.
   * CSCS is also asking Bull for an offer for the Phase D storage.
   * CSCS is going to downgrade the Lustre clients to 1.8.3.
Topic revision: r11 - 2011-01-13 - PabloFernandez
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.