
CHIPP Phoenix Cluster Workshop, June 11th 2007, CSCS

Tier2 Status

  • A second 'Schulleitungssitzung' (executive board meeting) is needed, since funding for the cooling infrastructure installation is not yet secured. Our proposal was evaluated positively, so nothing new needs to be submitted from our side.
  • The original (old) cluster and the new cluster are working 'as one'. The nodes are not identical, but the software doesn't care. The nodes are managed by cfengine. The hardware is reasonably reliable. Tom: 'it's not a complicated installation'.
  • ZFS is resilient with the machines we have so far.
  • Personnel at CSCS: August will be difficult; until the new person arrives we have to scrape resources together. There are 2 FTEs available for the Phoenix cluster.

Tier3 Status

Geneva

  • 26 TB, 108 cores in worker nodes and login PCs
  • atlas-ui01..03, grid00 and grid02 have 'double identities': they are also part of the CERN network
  • NorduGrid as scheduler
  • Runs mostly ATLAS MC
  • Higher priority for Swiss ATLAS members
  • The X4500 is currently corrupted following a crash
  • Plan for a new system in autumn: 2 more Thumpers and more cores (176)
  • The system may also be used for other local activities (neutrino experiments)
  • Current assumptions may not hold in future, as data volumes will increase significantly when moving from MC to production (factor 200)

Questions:

  • CERN Link? Why?
    • The idea is to get data directly from CERN more efficiently (UI usage, AFS).
    • The computing model does not foresee this; the standard path has to work as well, of course, but we also want to make use of the physical proximity.

Advice:

  • Run the X4500 with Solaris rather than SLC4; SLC4 crashes consistently with software RAID.

Bern

  • Dedicated cluster, 36 cores, 10TB. SLC4/SuSE
  • UBELIX cluster from informatics department, 506 cores (shared), Gentoo
  • Plans: next year, 50 cores and ~40 TB of space
  • ARC/Torque/NFS has scaling problems

Questions:

  • AFS? No
  • Grow in size after 2008? Not necessarily
  • How do you mount storage? NFS-based or scp-based (ARC supports both kinds)

Tier3 LHCb

  • Zurich HEP cluster
    • openSUSE 32-bit; by the end of the year the size will be 20 cores and 10 TB of disk (more than needed)
  • Matterhorn cluster (university), shared, SuSE 64-bit (512 cores), using 'spare cycles'; the university offers to host purchased hardware and even to double it (matching funds)
  • Lausanne - no info

  • openSUSE is an issue - does not work

Tier3 CMS in ZH

  • Money will be available
  • 60 TB of storage in the first year (3 Thumpers)
  • 100 kSI2k, 15 worker nodes, quad-core
  • To be set up by PSI; ETH pays for the hardware

  • There are currently 14 Intel cores and 40 TB on the Hoenggerberg, running SLC, with CMS installed on it

Experiment Status

ATLAS

  • 2 Clusters in Geneva
  • 1 Cluster in Bern
  • Tier2 Phoenix

  • Geneva
    • Using NorduGrid (NG)
    • .xrsl job descriptions in NorduGrid (see the sketch below)
    • Grid Pilot, Ganga, pathena - not all of them tried yet

  • Memory requirement per job: changed to 2 GB per core (since January)
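
As a reference for the xRSL job-description format mentioned above, here is a minimal sketch of such a description, wrapped in a small Python helper that writes it to a file. The concrete values (script name, file names, CPU-time limit) are illustrative assumptions rather than the actual Geneva configuration, and attribute names and units should be checked against the NorduGrid ARC documentation; the memory attribute reflects the 2 GB-per-core requirement noted above.

# Minimal sketch: write an illustrative xRSL job description for NorduGrid ARC.
# All concrete values (script name, file names, limits) are assumptions for
# illustration only, not the actual ATLAS Geneva setup.
# cpuTime/memory limits: values are illustrative; check units against the ARC manual.

XRSL_JOB = """&
(executable = "run_job.sh")
(arguments = "input.root")
(jobName = "atlas-mc-example")
(stdout = "stdout.txt")
(stderr = "stderr.txt")
(cpuTime = "1440")
(memory = "2000")
(inputFiles = ("run_job.sh" "") ("input.root" ""))
(outputFiles = ("output.root" ""))
"""

def write_job_description(path: str = "job.xrsl") -> None:
    """Write the example xRSL description to disk for submission with an ARC client."""
    with open(path, "w") as handle:
        handle.write(XRSL_JOB)

if __name__ == "__main__":
    write_job_description()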

LHCb

  • DIRAC: some Python issues on 64-bit
  • New CERN certificates do not work with dCache
  • VOMS roles are not yet set correctly on all sites
  • There may be a NorduGrid interface for DIRAC in the future; it is being evaluated

Data Handling

CMS (Derek)

  • Datasets that are made up of several files are managed as a unit
  • Coordinated copying of files through PhEDEx
  • Dataset discovery through a dedicated DB
  • No file catalog database anymore; name mapping is trivial and rule-based (see the sketch below)
  • No fine-grained security needs
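
To illustrate the idea of trivial, rule-based name mapping (a conceptual sketch in Python, not the actual CMS catalog format or any real site configuration): a logical file name is turned into a physical path by applying a short ordered list of pattern rules, so no per-file database lookup is needed. The rules and paths below are invented examples.

# Conceptual sketch of trivial, rule-based LFN -> PFN mapping.
# The rules and storage paths are invented examples, not a real CMS site setup.
import re

# Ordered (pattern, replacement) rules; the first matching rule wins.
LFN_TO_PFN_RULES = [
    (r"^/store/mc/(.*)$", r"/dcache/cms/store/mc/\1"),
    (r"^/store/data/(.*)$", r"/dcache/cms/store/data/\1"),
]

def lfn_to_pfn(lfn: str) -> str:
    """Map a logical file name to a physical path using the first matching rule."""
    for pattern, replacement in LFN_TO_PFN_RULES:
        if re.match(pattern, lfn):
            return re.sub(pattern, replacement, lfn)
    raise ValueError("no mapping rule matches " + lfn)

# Example:
#   lfn_to_pfn("/store/mc/sample/file.root")
#   -> "/dcache/cms/store/mc/sample/file.root"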

ATLAS (Sigve)

  • Don Quijote 2 (DQ2) system used as the client
  • Central DB takes care of datasets (CERN)
  • Each Tier1 'cloud' has its own data catalog (FZK has a file catalog where CSCS data is also stored)

LHCb (Roland)

  • There is a web interface to a bookkeeping catalog
  • 2 copies of the raw data (at CERN and at one other Tier1 for each dataset)
  • The data is then reconstructed at the Tier1
  • If CSCS is to be part of this, it shall receive all the data (150 TB per year)
  • 'Additional stripping' twice a year, so a maximum of 4 stripping versions are produced per year
  • Data storage: keep the latest and next-to-latest set of stripped DSTs on disk

  • LHCb is currently using the Tier1s only for analysis
  • Next step: set up the analysis infrastructure; no hurry, as the LHCb computing group has enough problems with the Tier1s already...

dCache (Alessandro)

Discussions

User interactions

  • Both ATLAS and CMS want little end-user interaction, but have a human layer built in to perform data transfer operations (subscriptions, etc.)

Deletions

  • How to keep the deletion rate at the same level as the incoming data rate?
  • Deletion is also usually done by the support person
  • dCache can delete a whole directory quickly, so CMS's trivial-file-catalog approach works well here
  • LHCb has only single-file delete operations, no bulk delete (see the sketch below)
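
A minimal sketch of why this matters operationally, using a plain local filesystem as a stand-in for the storage namespace (real deletions would of course go through SRM/dCache or the experiment tools): removing a dataset file by file costs one operation per file, whereas a recursive directory removal is a single bulk operation.

# Illustrative sketch: per-file deletion vs. bulk (directory) deletion.
# A local filesystem stands in for the storage namespace; this is not the
# actual dCache or LHCb tooling.
import shutil
from pathlib import Path

def delete_per_file(dataset_dir: Path) -> int:
    """Delete every file individually (one operation per file, as with single-file deletes)."""
    count = 0
    for entry in dataset_dir.rglob("*"):
        if entry.is_file():
            entry.unlink()
            count += 1
    return count

def delete_bulk(dataset_dir: Path) -> None:
    """Remove the whole dataset directory in one bulk operation."""
    shutil.rmtree(dataset_dir)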

Output

  • How to make sure that the results of the Swiss users are properly stored
  • Something at CSCS? Make sure it goes to a safe place?
  • --> This is an open point, need to architect something

Transfer Exercises

Introduction, Setting the Stage

  • There are several layers in the data transfer stack:

    • Experiment layer (PhEDEx, DDM, DIRAC) - asynchronous
    • FTS
    • SRM
    • GridFTP and others - synchronous
    • TCP - inherently asynchronous

Each layer polls the layer beneath (except the sync layer). This makes the whole stack difficult to manage and to debug.
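
As a toy illustration of this polling pattern (generic Python with invented names; not any experiment's actual agent code or the real FTS/SRM APIs): each asynchronous layer repeatedly asks the layer below for the state of a transfer, so a stuck transfer has to be traced down through several such loops.

# Toy illustration of layered polling: an upper (asynchronous) layer repeatedly
# asks the layer below for transfer status until a terminal state is reached.
# Class, method and state names are invented for illustration only.
import time

class LowerLayer:
    """Stand-in for the layer beneath (e.g. what FTS looks like to an experiment agent)."""
    def __init__(self) -> None:
        self._polls = 0

    def status(self, transfer_id: str) -> str:
        self._polls += 1
        return "DONE" if self._polls >= 3 else "ACTIVE"

def poll_until_done(layer: LowerLayer, transfer_id: str, interval_s: float = 1.0) -> str:
    """Poll the lower layer until the transfer reaches a terminal state."""
    while True:
        state = layer.status(transfer_id)
        print(transfer_id, state)
        if state in ("DONE", "FAILED"):
            return state
        time.sleep(interval_s)

# poll_until_done(LowerLayer(), "transfer-42")  # prints ACTIVE, ACTIVE, DONE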

  • TCP buffer sizes are set by default to values suitable for LAN transfers, not for WAN, where the buffers need to be much larger (see the bandwidth-delay-product sketch below).
  • Congestion prevention algorithms ensure that hosts 'behave nicely' on the net, i.e. halve the congestion window (the number of packets in flight) when packets are lost, and increase it again only while packets get through. Without this, congestion occurs more often.
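
A small worked example of why LAN defaults are too small, based on the bandwidth-delay product (buffer >= bandwidth x round-trip time); the link speed and RTT values are illustrative assumptions, not measured numbers for the CSCS links.

# Bandwidth-delay product: the TCP buffer must hold at least one round-trip
# time's worth of data to keep a long-distance link full.
# The bandwidth and RTT values below are illustrative assumptions only.

def required_buffer_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Return the minimum TCP buffer size in bytes needed to fill the link."""
    return bandwidth_bits_per_s * rtt_s / 8.0

# LAN-like case: 1 Gbit/s with 0.5 ms RTT -> ~62.5 kB, close to typical defaults.
print(required_buffer_bytes(1e9, 0.0005))  # 62500.0

# WAN case: 1 Gbit/s with 30 ms RTT -> ~3.75 MB, far above LAN-sized defaults.
print(required_buffer_bytes(1e9, 0.030))   # 3750000.0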

-- PeterKunszt - 11 Jun 2007
