Information for CMS Users of the PHOENIX cluster

Where can I obtain a Grid certificate?

If you have a CERN NICE account, it is best to get a certificate from CERN by filling out a form on the new CERN CA web site. If not, you can get a certificate from the Swiss Grid CA, which is operated by SWITCH.

For people without CERN accounts: SWITCH acts as the Swiss Grid CA (Certification Authority). The service is currently in a pilot phase. To obtain a Grid certificate, go to this page on their site: https://www.switch.ch/pki/swisssign/user-certificate.html.

Follow all the steps exactly as indicated on this page. Print out the registration document (preferably from the link in the return mail you receive after submitting the web forms, because most entries have already been filled in there).
You have to give your full name exactly as it appears in your passport or ID
Sign the form only at 6a), not at points b) and c)

You have to bring these documents to the responsible security manager of your institute, who will sign the registration document and send it on.

Institute responsible security manager:

For PSI: Tobias Marx (WHGA/U134)

How can I get the certificate out of the browser and into my Grid environment?

In the previous step you obtained your grid certificate, but actually your grid credentials consist of two parts. A private key which was generated on your machine when you made your certificate request, and the certificate you got from the CA based on the public key that was generated together with your private key. Most CAs (also CERN) issue the certificates through a web interface, and so the private key and the certificate end up in your browser's certificate store. The grid tools need the credentials in the form of two separate files in PEM format, so you will need to export them. Look at this page for instructions on how to do this.
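
As a rough sketch of the export step: most browsers can save the certificate and private key together as a PKCS#12 bundle, which openssl can then split into the two PEM files. The bundle name mycert.p12, the target directory ~/.globus and the file names usercert.pem and userkey.pem are assumptions here (they follow the usual Globus convention); follow the linked instructions if they differ.

```shell
# Sketch: split a browser-exported PKCS#12 bundle into two PEM files.
# Assumptions: bundle name (first argument) and target directory
# (second argument, default ~/.globus) are illustrative choices.
export_grid_credentials() {
    p12=$1
    dir=${2:-$HOME/.globus}
    run=${DRY_RUN:+echo}      # with DRY_RUN set, only print the commands
    $run mkdir -p "$dir"
    # certificate only (no private key)
    $run openssl pkcs12 -in "$p12" -clcerts -nokeys -out "$dir/usercert.pem"
    # private key only; openssl asks for a pass phrase to protect it
    $run openssl pkcs12 -in "$p12" -nocerts -out "$dir/userkey.pem"
    # the private key must not be readable by other users
    $run chmod 644 "$dir/usercert.pem"
    $run chmod 400 "$dir/userkey.pem"
}
```

Running `DRY_RUN=1 export_grid_credentials mycert.p12` only prints the commands, which is a safe way to check them before touching your real key.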

How to get access to CMS Grid resources: Authentication/Authorization using VOMS

If you have not done so already, you must first register at VOMRS-CMS (also look at this page). Your certificate must be loaded in the browser when you visit the page, or the server will refuse your connection (it will look as if the server were dead; there will be no helpful error message).

This registration is needed for associating your grid certificate with CMS. Your certificate only confirms your identity, but to get authorized for using a CMS service, your certificate must get registered with the CMS VOMS system. This system also allows you to request certain roles (similar to belonging to different UNIX groups), although for normal users, the defaults will be all right. After registering you can see the roles that have been assigned to you on the VOMRS server or also directly on the CMS VOMS server page.

Every time you work on the grid you will create a derivative certificate (and matching key) from your Grid certificate with a short lifetime of about 12 h: the so-called proxy certificate. If this temporary certificate gets stolen on the grid, a malicious user can impersonate you only for that limited time, and your real certificate remains safe.

Use the VOMS voms-proxy-init command to obtain a Grid proxy certificate. The resulting proxy certificate contains not only authentication information (who you are) but also authorization information (what you are allowed to do, i.e. role information). To check which roles you are allowed to assume, visit the CMS VOMRS server (this requires that your firewall does not block outgoing port 8443).

Currently, you should use the following command to obtain a generic, CMS-valid proxy:

voms-proxy-init -voms cms

To generate a generic proxy certificate with no roles information:

voms-proxy-init

To generate a proxy certificate for the production role:

voms-proxy-init -voms cms:/cms/Role=production

To look at the authorization information of your proxy certificate:

voms-proxy-info -all
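
Because the proxy is short-lived, it can be convenient to renew it only when needed. The following is a minimal sketch, assuming that `voms-proxy-info -timeleft` prints the remaining lifetime in seconds; the function names and the one-hour threshold are illustrative choices, not site policy.

```shell
# Succeeds when a proxy with the given number of seconds left should be renewed.
# The default threshold of 3600 s is an arbitrary choice for this sketch.
proxy_needs_renewal() {
    seconds_left=$1
    threshold=${2:-3600}
    [ "$seconds_left" -lt "$threshold" ]
}

# Create a new CMS proxy only when the current one is missing or about to expire.
ensure_cms_proxy() {
    # voms-proxy-info fails when there is no proxy; treat that as 0 seconds left
    left=$(voms-proxy-info -timeleft 2>/dev/null) || left=0
    if proxy_needs_renewal "${left:-0}"; then
        voms-proxy-init -voms cms
    fi
}
```

Calling `ensure_cms_proxy` at the start of an interactive session then prompts for the pass phrase only when the proxy actually needs to be recreated.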

Where on the SE can I save data produced by my jobs?

Since early 2008 CMS has a regulation on the path to be used for user generated data. This is defined in the CMS namespace policy.

For the user space at T2_CH_CSCS this translates to (hn_username stands for your official CMS hypernews user name)

srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/$hn_username/*

for harvested files residing at a "home" institute. This is custodial data for the home site.

```
srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/temp/user/$hn_username/*
```
for temporary files hosted on any site's SE. This portion of the namespace is treated like /tmp, i.e. it is volatile.

Since CSCS does not provide tape backups of SE space, the files are protected only by a software RAID. If you need your data to be more secure, you should copy it to another location.
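
The two locations described above differ only in one path component, so they can be built by a small helper. This is a sketch only: the function name cscs_user_url and the user name jdoe in the usage note are hypothetical.

```shell
# Base of the CMS namespace at CSCS, as given in the paths above.
SRMBASE="srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms"

# Print the SRM directory for a hypernews user name.
# A second argument "temp" selects the volatile store/temp/user area.
cscs_user_url() {
    hn_username=$1
    case ${2:-user} in
        temp) area=store/temp/user ;;
        *)    area=store/user ;;
    esac
    printf '%s/trivcat/%s/%s\n' "$SRMBASE" "$area" "$hn_username"
}
```

For example, `cscs_user_url jdoe` prints the directory for harvested files and `cscs_user_url jdoe temp` prints the one for temporary files.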

Our earlier convention on CSCS user space in

srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local/$username

is obsolete and should no longer be used.

How can I get a list of data sets available at our site / on the Grid?

Definitions: The CMS Dataset Bookkeeping System (DBS) is a database and user API that indexes event data for the CMS Collaboration. The Data Location Service (DLS) is a database that records which data sets are available at which sites (i.e. it lists the locations of data sets).

How to locate data: There is a discovery web page for both systems at this URL: http://cmsdbs.cern.ch/ A presentation on how to discover data can be found here. Direct use of DLS via a client installation is not recommended for now, but a client is available on lxplus at CERN.

Information on datasets and their location is also stored in the PhEDEx data transfer system, but it is only guaranteed to contain information about recently transported data (i.e. there is no guarantee that information will not be dropped at some point; DLS is the authoritative source for location information). Still, via PhEDEx you can get information on which sets have been ordered by a center and whether they are arriving. Start from this page for our CSCS Tier-2: PhEDEx replica page to locate data.

How can I order official datasets to the CSCS Tier-2?

CMS uses the PhEDEx system for its data management. The site's local CMS administrator can order a data set for you.

Identify the data by its official name (for current naming conventions, see this CERN Wiki entry). You can look up the available data sets via the DBS discovery. Make sure that they exist. If a site has not correctly published the data, it cannot be fetched.
Use the PhEDEx create transfer request page to order the dataset. You need to log in with your hypernews name/password (the login link can be found in the upper right corner of the screen).
- IMPORTANT: Please indicate in the standard YYYY-MM or YYYY-MM-dd format for how long we should host this sample for you. This we require to make management of the space easier. Example:
```
   retention time: 2009-02-12 
```
  The set will not be automatically erased at that date, but you will be prompted at that time.
- In the DBS: area, leave the preselected default (unless you specifically know what you're doing)
- Rules are different depending on where you need the data:
  - If you need data only at PSI, subscribe it both to CSCS and PSI and indicate your phys group. The data at CSCS will be deleted by us after transfer to PSI has been completed
  - If you want data both at CSCS and PSI, please make two distinct subscriptions
  - Furthermore, when subscribing data to CSCS, indicate the "local" group instead of your phys group (this is for central accounting and space usage), unless you are an official physics group data manager
- In the destinations area check the box for the T2_CH_CSCS and/or the T3_CH_PSI center.
- In the data items area post a list of your datasets (one per line) like this:
```
   /Bs2MuMu/CMSSW_1_6_7-CSA07-1193558750/RECO
   /Bc2MuMuMuNu/CMSSW_1_6_7-CSA07-1193556423/RECO
   /Bc2JpsiMuNu/CMSSW_1_6_7-CSA07-1193556369/RECO 
```
- In the Group area you can indicate on behalf of which group you are making this request. Group data managers can order for their groups. If you are not a group data manager, always label your order with the local tag.
- For the other options you normally want to select Replica, Growing, Normal Priority.
The request will be approved/disapproved by the local PhEDEx data manager of the site. You will receive emails about the updates to your request.
Please understand that at some point we will start to refuse requests with undefined group or retention time information, because the lack of this information severely impacts an efficient management of the storage space and will involve all of us in an email ping pong.
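
Before submitting a request, it can help to check the retention time and the dataset names locally. The sketch below only checks the formats described above (YYYY-MM or YYYY-MM-dd, and names with three slash-separated parts as in the example list); the function names are hypothetical, and this is not a check against DBS.

```shell
# Accept YYYY-MM or YYYY-MM-DD (format check only, not a calendar check).
valid_retention() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?$'
}

# Accept names with exactly three non-empty parts, each preceded by a slash,
# like the dataset names in the example list above.
valid_dataset() {
    printf '%s\n' "$1" | grep -Eq '^/[^/ ]+/[^/ ]+/[^/ ]+$'
}
```

For instance, `valid_retention 2009-02-12 && echo ok` prints ok, while a date written as 12.02.2009 is rejected.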

The status of the transfers can be seen on the PhEDEx replica page for our site

How can the data be accessed? What is the local path?

The generic CMS path (usually something starting with /store/...) is not the local path of the file, but it is close to it. CMS uses a trivial file catalog (TFC) implementation to define local filenames at a site, based on a rules file. Mostly, the rules just define a site-specific prefix to be put in front of the generic CMS path, but they can also be more complex. The rules file is located at a standard location on every site, so it can be found automatically by CMS jobs. So, if you are using official CMS tools like CRAB to generate your jobs, you can use the generic CMS names and the software will transparently translate them to site-specific names.

If you need direct access to files on our site, you can obtain the local filenames from the PhEDEx web service, e.g. by using the following command

wget -O- "http://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?node=T2_CH_CSCS&protocol=dcap&lfn=/store/somefile" 2>/dev/null |sed -e "s/.*pfn='\([^']*\).*/\1\n/"

#Example output
dcap://storage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/cms/trivcat/store/somefile

In this way you can query for the correct URL on any CMS grid site and with any of the supported protocols.
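
The query above can be wrapped in a small function so that node and protocol become parameters. This is a sketch under the assumption that the reply carries the result in a pfn='...' attribute, as the sed expression above implies; the function names are hypothetical.

```shell
# Print the value of the pfn='...' attribute found on standard input.
extract_pfn() {
    sed -n "s/.*pfn='\([^']*\)'.*/\1/p"
}

# Query the PhEDEx data service (same URL as above) for one LFN.
# Arguments: LFN, then optionally the node and the protocol.
lfn2pfn() {
    lfn=$1
    node=${2:-T2_CH_CSCS}
    protocol=${3:-dcap}
    wget -q -O- "http://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?node=${node}&protocol=${protocol}&lfn=${lfn}" |
        extract_pfn
}
```

With this, `lfn2pfn /store/somefile` corresponds to the command shown above, and `lfn2pfn /store/somefile T2_CH_CSCS srmv2` would ask for another protocol, provided the site defines a rule for it.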

The following protocols and user tools can be used to access files:

SRMv2 (SRM version 2)

This should be the standard protocol for interactively accessing files. There is an excellent SRM howto guide from OSG

Use srmcp to move data from and to the SE. The accessible part of the namespace for CMS is under srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms.

Example:
For easier readability, we define the variable SRMBASE first

SRMBASE=srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms
srmcp -2 ${SRMBASE}/trivcat/store/CSA06/CSA06-102-os-minbias-0/somefile.root  file:////tmp/mytest.tst
srmcp -2 file:////tmp/mytest.tst  ${SRMBASE}/local_tests/mytestfile

You can list directory contents and details with the srmls command:

srmls -l ${SRMBASE}/trivcat/store

Note that the file sizes in the output of a directory listing may not be correct for files larger than 2 GB. Using the -l flag with the command will produce the correct size:

srmls -l ${SRMBASE}/local_tests/big_file
2467069744 /pnfs/lcg.cscs.ch/cms/local_tests/big_file

For deleting files, use the srmrm command. You can delete directories with srmrmdir which also supports a -recursive flag, but the recursive behavior will only remove empty directories for safety reasons.

srmrm ${SRMBASE}/local_tests/testdir/gaga
srmrmdir ${SRMBASE}/local_tests/testdir

SRMv1 (SRM version 1)

You can use srmcp to move data from and to the SE. The accessible part of the namespace for CMS is under srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=/pnfs/lcg.cscs.ch/cms.

Example:
For easier readability, we define the variable SRMBASE first

SRMBASE=srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=/pnfs/lcg.cscs.ch/cms
srmcp ${SRMBASE}/trivcat/store/CSA06/CSA06-102-os-minbias-0/somefile.root  file:////tmp/mytest.tst
srmcp  file:////tmp/mytest.tst  ${SRMBASE}/local_tests/mytestfile

Note that the file protocol specification needs 4 slashes to refer to the local root file system "/". If you only use 3 slashes the file is staged relative to your current directory.
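
To avoid miscounting slashes, the local file URL can be generated from a path. The helper below is a sketch (the name local_file_url is hypothetical); it always produces the absolute, four-slash form by resolving relative paths against the current directory first.

```shell
# Turn a local path into a file: URL that refers to an absolute path.
# Relative paths are resolved against the current directory.
local_file_url() {
    case $1 in
        /*) path=$1 ;;
        *)  path=$PWD/$1 ;;
    esac
    # "file:///" plus the leading "/" of the path gives the four slashes
    printf 'file:///%s\n' "$path"
}
```

A copy command can then be written as `srmcp ${SRMBASE}/local_tests/mytestfile "$(local_file_url mytest.tst)"`.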

The srm-get-metadata command is very useful in getting information about a file or directory, e.g.

srm-get-metadata ${SRMBASE}/trivcat/store
                     size :0
                     owner :cmssgm
                     group :cms
                     permMode :16893
                     ...

Deleting files: Use the srm-advisory-delete command

srm-advisory-delete ${SRMBASE}/local_tests/myfile.dat

Grid FTP

Example:

GSIFTPBASE=gsiftp://storage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/cms
globus-url-copy ${GSIFTPBASE}/local_tests/testfile.tst file:/tmp/mytest.tst
globus-url-copy ${GSIFTPBASE}/trivcat/store/CSA06/CSA06-102-os-minbias-0/file.root  file:/tmp/mytest.tst

You can use edg-gridftp-ls to look at files and directories, e.g.

edg-gridftp-ls ${GSIFTPBASE}/trivcat/store

dcap

This protocol can be used either from the local UI or by jobs running at the site, but not over the WAN; it is usually limited to on-site machines. There is a command line copy command, dccp, but more frequently this is the protocol of choice for direct access to files through ROOT or CMSSW.

Example:

DCAPBASE=dcap://storage01.lcg.cscs.ch:22125/pnfs/lcg.cscs.ch/cms
dccp $DCAPBASE/local_tests/mytesttree.dat /tmp/somefile

Note: dcap only allows read access to the SE (since this is an authenticationless protocol).

FTS

The FTS (File Transfer Service) implements a queueing system for file transfers. It is useful for moving large data sets around, since it is able to control the bandwidth and other transfer-specific parameters on predefined channels. Usage is complex and we do not currently recommend it for normal users. Contact us if you need to transfer lots of data that is not known to PhEDEx.

Xrootd

We offer read-only access via xrootd (the Extended Root Daemon). This access method can only be used from within the site and currently does not require a grid proxy. To use it in your jobs, follow these steps:

Set up your ROOT environment

export ROOTSYS=$VO_CMS_SW_DIR/../sitelocal/ROOT/ROOT_5.06.00
export PATH=$PATH:$ROOTSYS/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROOTSYS/lib

Convert the gsiftp-PFN into a root daemon PFN by changing the protocol to "root://" and introducing another double slash (!!!) after the host name:

gsiftp://host/dir1/dir2/filename ---> root://host//dir1/dir2/filename
You can use the following sed expression for the conversion (the gsiftp TURL is contained in $name)
echo $name | sed -e 's/gsiftp:\/\/\([^/]*\)/root:\/\/\1\//'

Use the xrdcp command line client to copy whole files. Naturally you can also access files directly from within a root job

xrdcp -s root://storage01.lcg.cscs.ch//storage/cms/local_tests/testfile.tst /tmp/mytest.tst
Note: The -s flag chooses silent mode (no progress bar) since we use the command not in an interactive session
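
The conversion rule from the steps above (change the protocol to root:// and add a second slash after the host name) can also be written with plain shell parameter expansion, which avoids the heavy escaping of the sed version. This is a sketch; the function name gsiftp_to_root is hypothetical.

```shell
# Convert gsiftp://host/dir/file into root://host//dir/file.
gsiftp_to_root() {
    rest=${1#gsiftp://}        # host/dir1/dir2/filename
    host=${rest%%/*}           # everything up to the first slash
    path=${rest#*/}            # everything after the first slash
    printf 'root://%s//%s\n' "$host" "$path"
}
```

In a job script this could be used as `xrdcp -s "$(gsiftp_to_root "$name")" /tmp/mytest.tst`, with $name holding the gsiftp TURL as above.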

What CMS software is available at the site?

CMSSW

The "official" CMSSW software packages are maintained by a central CMS software manager who triggers installation on all sites. The "tags" of the officially available software are entered into the Grid information system used by the LCG software, from where they can be picked up by the framework. On any correctly configured LCG User Interface (UI) you can enter the following command to get information about the installations on all sites.

   lcg-infosites --vo cms tag

Jobs find out about the location of the software on a given site by looking up the $VO_CMS_SW_DIR environment variable.
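
A job script may want to fail early when the variable is missing instead of failing later with an obscure error. A minimal sketch, with a hypothetical function name:

```shell
# Print the CMS software directory, or fail with a message on stderr.
cms_sw_dir() {
    if [ -z "${VO_CMS_SW_DIR:-}" ]; then
        echo "VO_CMS_SW_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$VO_CMS_SW_DIR" ]; then
        echo "VO_CMS_SW_DIR=$VO_CMS_SW_DIR is not a directory" >&2
        return 1
    fi
    printf '%s\n' "$VO_CMS_SW_DIR"
}
```

A job could start with `swdir=$(cms_sw_dir) || exit 1` and then use $swdir to locate the installation.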

CRAB - out of date! needs to be revised!

There is a CRAB (CERN info page) installation on the CSCS UI. So, if you have a UI account at CSCS you can use it to submit job ranges. This can be attractive, since you have more local space there than on CERN AFS. The CRAB installations can be found under /opt/CRAB/CRAB_[version]. In order to use it, you need to source e.g.
. /opt/CRAB/CRAB_2_2_0/crab.sh

Links to external CMS-related information

How to get access to LCG as a CMS user (also has VOMS information)
CMS hyper news replaces the mailing lists and is one of the best sources for up-to-date information
Central CMS twiki
- CMS Workbook Documentation for CMSSW and the computing environment
- Production pages
- List of RB (resource brokers), etc. available for CMS users
CMS computing TDR