<!-- keep this as a security measure:
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.LCGAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.LCGAdminGroup
#uncomment this if you want the page only be viewable by the internal people
#* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.LCGAdminGroup
-->

---+ Swiss WLCG Operations Meeting on 2010-09-09

   * *Date and time*: 2010/09/09 at 9:30
   * *Place*: EVO, password: chipp
   * *External link / EVO*: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=MDMaM82a28DuDs929lD99D

---++ Agenda
   * Maintenance day report and site status
   * Status of Lustre problems
   * Status of new experiment software area
      * Derek: After fixing two local problems that were discovered during the installations by the central CMS ops team, the CMS sw area is now ready.
      * Roland: LHCb sw area is ready.
   * Status of Phase D
      * Derek: We had a visit from Bull at PSI. When discussing different storage possibilities, they mentioned that they also use LSI solutions like the one suggested by CSCS in their slides. They seem to be price competitive, and we recently had a rather good experience with Bull regarding the resolution of a problem with a DDN system that we had bought through them. Maybe one could get an offer for the HW from them (but maybe CSCS already has an ETH framework contract (Rahmenvertrag) for this kind of equipment).
   * Availability / Reliability values for July (79%) and August (47%). Numbers too low to be real, need to check numbers from VOs.
      * https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics
      * [[https://gridview.cern.ch/GRIDVIEW/sa/bin/same_graphs.php?XX=&Information=SiteDetail&DefVO=15&TestVO=-1&DurationOption=daily&LComponent=-2&NodeID=-1&TestID=-1&Hour1=0&StartDay=1&StartMonth=8&StartYear=2010&Hour2=23&EndDay=31&EndMonth=8&EndYear=2010&LTier1Site=12&RelOrAvail=Availability&OnlyCritical=ON&SiteFullName=1&Report=0&LTier2Site=12][Leo's link to GridView August]]
      * [[https://gridview.cern.ch/GRIDVIEW/sa/bin/same_graphs.php?XX=&Information=SiteDetail&DefVO=15&TestVO=-1&DurationOption=daily&LComponent=-2&NodeID=-1&TestID=-1&Hour1=0&StartDay=1&StartMonth=7&StartYear=2010&Hour2=23&EndDay=31&EndMonth=7&EndYear=2010&LTier1Site=12&RelOrAvail=Availability&OnlyCritical=ON&SiteFullName=1&Report=0&LTier2Site=12][Leo's link to GridView July]]
   * (Derek): Could CSCS give a short overview of the scheduling policies? Are there still the 100 (or so) reserved job slots for each experiment, so that we can guarantee a certain availability?
   * Review [[MeetingSwissWLCGOperations20100812#ActionItems][Action Items]]
      * CSCS: purchase hardware needed for implementing NFS setup
      * CSCS: open 3 tickets against Sun support; see ticket [[https://webrt.cscs.ch/Ticket/Display.html?id=7851][#7851]]
      * MG: check with VO to test CREAM CE and give status report; check availability of SAM tests for CREAM-CE
      * DF: check availability of SAM tests for CREAM-CE
         * [[https://lcg-sam.cern.ch:8443/sam/sam.py?funct=ShowHistory&sensors=CREAMCE&vo=cms&nodename=cream02.lcg.cscs.ch][classical CMS SAM test]]
         * [[http://dashb-cms-sam.cern.ch/dashboard/request.py/latestresultssmry?siteSelect3=T2&serviceTypeSelect3=all&sites=T2_CH_CSCS&services=CREAMCE&tests=3472&tests=3445&tests=3438&tests=2967&tests=2963&tests=2968&tests=2965&tests=2966&tests=2970&tests=2964&tests=2969&tests=1886&tests=2881&tests=3268&tests=3431&tests=3001&tests=3423&tests=3235&tests=3424&tests=3434&tests=3342&tests=3161&tests=3309&tests=3401&tests=3422&tests=3402&tests=3440&tests=3427&tests=3436&tests=3421&tests=3430&tests=3442&tests=3429&tests=3435&tests=3426&tests=3433&tests=3432&tests=3441&tests=1885&tests=2745&tests=2740&tests=2738&tests=2741&tests=2747&tests=2736&tests=2884&tests=2721&tests=2729&tests=2739&tests=2728&tests=2735&tests=2725&tests=2732&tests=2750&tests=2752&tests=2883&tests=2754&tests=2733&tests=2723&tests=2753&tests=2749&tests=2751&tests=2722&tests=2756&tests=2730&tests=2748&tests=2961&tests=2746&tests=1884&tests=1881&tests=2742&tests=2734&tests=2882&tests=3289&exitStatus=all][CMS dashboard view]]
         * No official directive to abandon lcg-CE in favor of CREAM
   * AOB

---++ Attendants
   * ATLAS: Marc, Gianfranco
   * CMS: Derek, Leo
   * LHCb: Ronald
   * CSCS: Peter, Pablo

---++ Minutes
   * During the maintenance we had a network problem with the SE head nodes that caused pools to go away for some time and some transfers were hanging, so we couldn't bring the site back until that was solved.
   * Lustre is giving trouble with the newest client version (1.8.4), so we decided to go back to 1.8.3 in a rolling downgrade.
   * When ATLAS finishes copying the software to the NFS server (it's ongoing), we are going to mount it on all WNs and change the environment variables so that new jobs use the new area, while old jobs will still finish using the old one. As a result, some newly arriving jobs may not find the SW that should be there, but there should not be many of them, so this should not be a problem.
   * Phase D. We are going to ask Bull for an offer to compare with the one from IBM before placing the order. Also, we are still working on the numbers from the last CHIPP Computing Board; they will be sent ASAP.
   * Availability / Reliability. It looks like the ARC-CE could be the reason why we have such bad numbers.
      * https://gridview.cern.ch/GRIDVIEW/sa/bin/same_graphs.php?XX=&Information=SiteDetail&DefVO=15&TestVO=-1&DurationOption=daily&LComponent=-2&NodeID=-1&TestID=-1&Hour1=0&StartDay=1&StartMonth=8&StartYear=2010&Hour2=23&EndDay=31&EndMonth=8&EndYear=2010&LTier1Site=12&RelOrAvail=Availability&OnlyCritical=ON&SiteFullName=1&Report=0&LTier2Site[]=12
      * And also http://lxarda16.cern.ch/dashboard/request.py/historicalsiteavailability?siteSelect3=T2&sites=T2_CH_CSCS&timeRange=individual&start=2010-08-01&end=2010-08-30
   * Scheduling policies. We have the same reservations as before: 96 cores for ATLAS, 96 cores for CMS, and 20 for LHCb.

---++ Action items
   * CSCS is to open a ticket in GGUS to investigate whether the formula used to calculate availability/reliability has changed; maybe the Arc01 problems made the whole site red.
   * CSCS is also asking Bull for an offer for the storage for Phase D.
   * CSCS is going to downgrade the Lustre clients to 1.8.3.
Topic revision: r11 - 2011-01-13 - PabloFernandez