
Working on the PSI CMS Tier-3

The PSI Tier-3 comprises the following services and machines:

  • User Interface: This is a fully operational LCG UI. It enables you to
    • log in from outside, so it is the standard login node
    • run local jobs on the Tier-3 using the local batch system (SGE), with CRAB as a front end to it (this still needs testing)
    • interact with the LCG Grid: submit jobs, contact storage elements, etc.
    • interact with AFS
  • Local batch farm:
    • Allows you to run local jobs on the Tier-3 worker nodes from the User Interface
  • Storage Element: This is a fully equipped LCG storage element running dCache. It allows you to
    • store/read files from local jobs
    • send files to it from your jobs running in the Grid, or have your Grid jobs access files at the PSI SE. This also gives you extra user space beyond what is available through your Tier-2 associated analysis groups.
  • NFS Server
    • Hosts user and software areas
    • Note: The dCache is the main semi-permanent (RAID) large storage space available to users. There are currently no plans for large local disk space for long-term user storage. We can enlarge the NFS, but NFS is limited, and we would need another solution (e.g. a parallel file system) if more were required.
  • PhEDEx: Our Storage will be coupled to the PhEDEx CMS Data Management System, so we can order data sets to our center from the T2 and the T1s.
  • Network connectivity: PSI currently has a 1 Gb/s uplink, which we can increase over time. We expect an inhomogeneous network load with occasional peaks. For large transfers, users should rely on the asynchronous transfer services we will try to provide.
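As a rough sketch of what local and Grid-side access to the storage element could look like: the hostname, port, and path below are purely invented placeholders (not the real PSI SE values), and the exact client tools depend on the installed middleware, so the commands are shown in dry-run form rather than executed.

```shell
#!/bin/sh
# Dry-run sketch: print the commands instead of executing them, since
# they require dCache/LCG client tools and a real storage element.
run() { echo "+ $*"; }

# Hypothetical SE hostname and user path -- NOT the real PSI values.
SE_HOST="t3se.example.psi.ch"
SE_PATH="/pnfs/example/cms/user/someuser"

# Local access via the dCache dccp client:
run dccp myfile.root "dcap://${SE_HOST}${SE_PATH}/myfile.root"

# Grid-side access via the LCG data-management tools (SRM):
run lcg-cp "file://$PWD/myfile.root" \
    "srm://${SE_HOST}:8443${SE_PATH}/myfile.root"
```

Replace `run` with direct invocation once the real SE endpoint and user path are known.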

Note: As originally discussed, there is no computing element, so you cannot submit Grid jobs to the Tier-3; you need to submit them from the local User Interface. This lets us stay more flexible with respect to configuration, since we want the Tier-3 to adapt as well as possible to the users' analysis needs. This is not so easy for a Tier-2, where certain configurations must be guaranteed towards the full collaboration. Having only a minimal set of Grid services also cuts the administration effort significantly.
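Submitting a local job from the User Interface might then look roughly like the following. The queue name and the `#$` directives are generic SGE conventions, not the actual Tier-3 configuration, which may differ.

```shell
#!/bin/sh
# Sketch: write a minimal SGE job script; the queue name "all.q" is a
# hypothetical placeholder for whatever the Tier-3 batch system defines.
cat > myjob.sge <<'EOF'
#!/bin/sh
#$ -N my_analysis            # job name
#$ -q all.q                  # hypothetical queue name
#$ -cwd                      # run in the submission directory
#$ -o myjob.out -e myjob.err # stdout/stderr files
echo "running on $(hostname)"
EOF

# On the UI one would then submit and monitor it with:
#   qsub myjob.sge
#   qstat
cat myjob.sge
```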

Login Accounts: two possibilities

  1. All users will require a PSI AFS login account. Many users already have one, and this is coupled to a PSI AFS area.
  2. Users will only get local accounts, but these accounts will match their CMS hypernews name, which is widely used in CMS as an identifier (e.g. this name appears in the path of users' storage areas on the different SEs)
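To illustrate option 2, the mapping from a hypernews name to a per-user storage path could be sketched as below. The base path is an invented placeholder, not the actual namespace layout of any SE.

```shell
#!/bin/sh
# Sketch: derive a per-user SE directory from the CMS hypernews name.
# The base path "/pnfs/example/cms/user" is hypothetical.
user_store_path() {
    hn_name="$1"
    echo "/pnfs/example/cms/user/${hn_name}"
}

user_store_path jdoe   # → /pnfs/example/cms/user/jdoe
```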


Please put your comments here and provide your name

Comment on local storage

Urs Langenegger: (filled in from email by Derek)
I may have missed a point, but I think it would be extremely useful to have, in addition to the fully equipped LCG storage element running dCache, some large normal disk storage where one can put (large) amounts of ROOT files. Being limited by dCache restrictions could be a nightmare there. If something like this is foreseen, then I am happy. We do need the dCache, of course, because writing directly to the T3 will be extremely valuable.

Derek: Large "normal" disk storage was not really planned. We could at some point (in a next stage) add a parallel file system. NFS does not scale, so we would need to think about GPFS (used at PSI, but now going commercial). Lustre is another possibility, but it is only now becoming a real option; a CERN team is working on it, and there is also a storage manager (StoRM) that can run on top of a file system. dCache, however, has many strong points when it comes to handling large amounts of storage, and all CMS software is written with it as one of the main target storage systems. We do need to make it more accessible on the local scale, though, to make file handling easier. But this we can work on.

We should collect experience from users as soon as they can test the system. Then we can decide how best to enlarge it to suit their working styles. That is also why we begin this year with this moderately sized Tier-3 setup.

We need to improve the dialogue between the users and the administrators for both the T2 and the T3. Often I feel that people build workarounds for problems which people with technical know-how could solve better and faster, if they only knew about them (e.g. when a user does not know that a certain command is available, or that a certain task could be scripted). This will hopefully improve once we have an active local user community. For the Tier-3 we are trying to build a much more functional Twiki structure to help with communication and information retrieval.

-- DerekFeichtinger - 21 Aug 2008

Topic revision: r5 - 2013-09-04 - FabioMartinelli