Swiss WLCG Operations Meeting on 2010-08-12

Agenda

  • Report on unscheduled downtime (FG)
  • Discussion about Experiment Software Area
  • Review Action Items
    • CMS has to enable SAM tests for CREAM CE
    • ATLAS has to check how CREAM CE behaves and also enable SAM tests
  • AOB

Attendees

  • ATLAS: Gianfranco Sciacca, Marc Goulette, Sigve Haug, Szymon Gadomski
  • CMS: Derek Feichtinger
  • LHCb: Roland Bernet
  • CSCS: Fotis Georgatos, Peter Oettl

Minutes

  • Report on unscheduled downtime (FG)
    • Troublesome situation due to various Lustre instabilities
    • Complexity/size of the experiment software aggravates Lustre risks
    • VO reps realized the issue and asked what we can do about it
    • CSCS has placed purchase orders for new controller hardware
    • CSCS recommends verifying and rethinking the experiment-software directories
    • DF:
      • probably the longest downtime
      • was not aware that there are 4-5 Lustre failovers / month
      • if it were only scratch, starting all over after a file system corruption would be easy
      • many sites had similar experiences; they went back to NFS
      • CSCS management (MDL and/or DU) has to put pressure on Sun
        • Hardware is troublesome
        • Support is not delivered
    • SH:
      • Lustre at Tier-3 since April
      • Experiment software remained on NFS
      • MDS crashes (no failover node)
    • See also ticket #7851

  • Discussion about Experiment Software Area
    • In short: go back to PhaseB implementation; DRBD is well tested
    • Proposal: start from scratch so we have a known state and a clean reduced software area
      • VOs agree
      • SH: clarify with Andreij if ARC could use gLite software area
    • VOs asked for more than 1 TB of total disk space
    • Offered solution:
      • Set up CE + WN to start software installation
      • No interruption needed; switch software area from Lustre to NFS after installation is finished

  • Review Action Items:
    • VO Reps will check with their contacts what is possible to test
    • RB: LHCb is running fine on CREAM

  • AOB
    • SH: many sites in CH use Lustre; would be useful to gather experiences/knowledge
      • PO: HPC Forum about Parallel File Systems in October

Action items

  • CSCS: purchase hardware needed for implementing NFS setup
  • CSCS: open 3 tickets against Sun support; see ticket #7851
  • MG: check with VO to test CREAM CE and give status report; check availability of SAM tests for CREAM CE
  • DF: check availability of SAM tests for CREAM CE
Topic revision: r3 - 2010-08-13 - PeterOettl
 