
Swiss WLCG Operations Meeting on 2011-01-13

Agenda

  • CSCS Status
    • NFS temporary relocation. We had another problem with NFS and had to move it to two new Thors. This is a temporary solution: we need the space they provide, and we are still looking for alternatives.
  • Other Sites Status
    • T3_CH_PSI
      • After the maintenance shutdown of Jan 8th there is one older Sun Fire X4100 with a broken service processor and yet another defective Thor Seagate disk (oddly enough, the failing disk is one of the unused spare disks)
      • The T3 ran stably over the vacation period. In the first working week we had a dCache failure caused by the PostgreSQL DB getting into a strange state: all connections kept hanging after the initial handshake. Everything was fine again after a restart of the DB.
      • Fabio Martinelli will join the PSI Scientific Computing group on Feb 1st. He will take over the T3 administration as his main responsibility
    • UNIBE-LHEP:
      • 1) migrated to NGI_CH in GOCDB. Waiting to begin the certification process
      • 2) ce.lhep.unibe.ch now tested within the pre-production instance of the German Nagios: https://rocmon-fzk.gridka.de/nagios/cgi-bin/status.cgi?hostgroup=all&style=detail
      • 3) DPM hardware delivered and installation has started. Head node running glite-DPM_mysql and bdii-site, 3 disk servers running glite-DPM_disk, all on SLC5
      • (Derek) We need to keep better accounting for reports to funding agencies (e.g. we should add metrics for monitoring the filled SE space per experiment). Also see ResourceAccounting2010
    • UNIGE-DPNC
      • We are starting to see disk failures. We have 8 Thumpers (i.e. 384 disks) installed between 2007 and 2009. One disk failed in December; there was no data loss, the redundancy worked, and the disk was replaced. Another disk is now getting slow. We are building a stock of spare disks.
      • We have always had an O(1%) failure rate for rfcp (copies from our DPM SE to the batch worker nodes). Since this week we have a suspect: the firewall on the Solaris file servers is "stateful", which can sometimes block legitimate connections. We will test this hypothesis during the next few days (a test sketch follows this agenda list).
      • Regular alerts about security vulnerabilities in SLC come from NDGF. We react to them quickly, checking our machines and closing holes. So far no trouble.
      • All batch and login machines are now SLC5.
      • The hardware delivered in 2010 is still not powered up; it is waiting for people to have time to install it.
      • The documentation wiki pages have been moved. They are still at CERN, but no longer under ATLAS protection. This was necessary because we have non-ATLAS users.
      • Operation is otherwise quiet. We stay in the green league. All alarms so far have been false.
  • New Twiki Index (left bar)
  • Firewall and Thumper migration delay. The firewall setup is giving trouble
  • PhaseD upgrade status. We are still trying to have storage working by the end of January.
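
A minimal sketch of how the stateful-firewall hypothesis for the UNIGE-DPNC rfcp failures could be tested, assuming a test file on the SE and a node with the RFIO client installed; the host name, path and trial count below are hypothetical placeholders, not the site's actual values. Running this before and after relaxing the firewall rules on the Solaris file servers should show whether the O(1%) failure rate changes.

    #!/usr/bin/env python
    # Repeatedly copy a test file from the DPM SE to the local node with rfcp
    # and record the failure rate. All names below are placeholders.
    import subprocess, time

    SOURCE = "se01.example.unige.ch:/dpm/unige.ch/home/atlas/test/1gb-testfile"  # hypothetical
    DEST = "/tmp/rfcp-probe"
    TRIALS = 500

    failures = 0
    for i in range(TRIALS):
        rc = subprocess.call(["rfcp", SOURCE, DEST])
        if rc != 0:
            failures += 1
            print("trial %d failed (rc=%d) at %s" % (i, rc, time.ctime()))
    print("failure rate: %.2f%% over %d trials" % (100.0 * failures / TRIALS, TRIALS))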

Attendants

  • CSCS: Peter, Pablo
  • Atlas: Marc
  • CMS: Derek, Leo
  • LHCb: Roland

Minutes

News from Gianfranco via email:

  • The most interesting news is the request from ATLAS to merge MCDISK into DATADISK with immediate effect. The procedure is as follows: move all free space (keeping a couple of TBs of margin) from MCDISK to DATADISK. The cloud DDM will then move all files from MCDISK to DATADISK and will eventually advise when it is OK to discontinue the MCDISK token and move all its remaining space to DATADISK. The whole procedure is foreseen to be complete by the beginning of February (a small sizing sketch follows this list).
  • Coming up very soon: "Networking Requirements for LHC Data Analysis in Germany": http://indico.desy.de/conferenceDisplay.py?confId=3823. 1 person per tier-2 associated with Germany is formally invited.
    • Time looks very tight to attend, but Switch is sending someone. It would be nice to know who.
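
The sizing part of the MCDISK to DATADISK merge is just arithmetic; the sketch below, with placeholder numbers, shows the intended split (hand over all free MCDISK space minus a couple of TBs of margin). The actual resize is done with each site's own space-token tools (dCache space reservations at CSCS, DPM space tokens at the universities), not by this script.

    # Placeholder figures; replace with the real token sizes at the site.
    TIB = 1024**4
    mcdisk_total   = 120 * TIB  # current MCDISK reservation
    mcdisk_used    =  75 * TIB  # data currently stored on MCDISK
    datadisk_total = 300 * TIB  # current DATADISK reservation
    margin = 2 * TIB            # "a couple of TBs" kept on MCDISK during the move

    # Space that can be handed over to DATADISK immediately.
    transferable = max(mcdisk_total - mcdisk_used - margin, 0)
    new_mcdisk = mcdisk_total - transferable
    new_datadisk = datadisk_total + transferable

    print("shrink MCDISK to  %.1f TiB" % (new_mcdisk / float(TIB)))
    print("grow DATADISK to  %.1f TiB" % (new_datadisk / float(TIB)))
    # Once DDM has emptied MCDISK and given the green light, the remaining
    # MCDISK space is moved to DATADISK as well and the token is retired.
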
Other minutes:
  • Marc told us about the last GridKa Tier1-Tier2 meeting. There are some files attached; the complete slides can be found here: http://indico.cern.ch/conferenceDisplay.py?confId=118083
  • CSCS will wait for a formal request before moving the space from MCDISK to DATADISK, to avoid data still arriving on MCDISK hitting a disk-full error
  • We will wait a week for feedback on the left bar of the Twiki. There is still work to do:
    • Clean up and review the pages visible to unknown visitors, also avoiding private links in the index
    • Fill in the Service cards (and HW cards) with complete and up-to-date procedures
    • We need to agree on a way to mark obsolete pages, maybe with a big red message at the beginning of the page and possibly also in the twiki name (nice for searches)
  • Marc told us Atlas is stopping their services from next Sunday to Tuesday. If we need to do maintenance on Atlas-only services, that is a good window.

Action items

  • Pablo will make the Inventory link on the left bar of the twiki visible only when the user is logged in.
  • Pablo will also set up a Doodle poll to agree on a date for a face-to-face meeting next month.
  • CSCS will introduce a used-space metric in Ganglia for dCache, in addition to the free-space metric that is already there (see the sketch below)
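
A minimal sketch of how the used-space metric could be published to Ganglia, assuming the probe that already feeds the existing free-space metric can provide total and free bytes; get_pool_space() below is a hypothetical placeholder for that probe, and the metric name is an assumption.

    #!/usr/bin/env python
    # Publish dCache used space (total - free) to Ganglia via gmetric.
    import subprocess

    TIB = 1024**4
    GIB = 1024**3

    def get_pool_space():
        # Hypothetical placeholder: reuse whatever query feeds the existing
        # free-space metric and return (total_bytes, free_bytes).
        return 100 * TIB, 40 * TIB

    def publish(name, value_gb):
        # gmetric ships with Ganglia; --name/--value/--type/--units are standard options.
        subprocess.check_call([
            "gmetric",
            "--name", name,
            "--value", str(value_gb),
            "--type", "double",
            "--units", "GB",
        ])

    if __name__ == "__main__":
        total, free = get_pool_space()
        publish("dcache_used_space", (total - free) / float(GIB))
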
Topic attachments

  • CloudStatusProduction.png (PNG, 232.0 K, 2011-01-13, MarcGoulette) - ATLAS-Cloud-Info
  • DDMreport20110112FZKCloudMonthly.pdf (PDF, 238.3 K, 2011-01-13, MarcGoulette) - ATLAS-Cloud-Info
  • T2overview-20110112_CloudMeeting.pdf (PDF, 99.0 K, 2011-01-13, MarcGoulette) - ATLAS-Cloud-Info