Information for CMS Users of the PHOENIX cluster
Where can I obtain a Grid certificate?
If you have a CERN NICE account, it is best to get a certificate from CERN by filling out a form on the new
CERN CA web site. If not, you can get a certificate from the Swiss Grid CA, which is operated by SWITCH.
For people without CERN accounts:
SWITCH acts as the Swiss Grid CA (Certification Authority). They are currently testing this service in a pilot phase.
In order to obtain a Grid certificate, go to this page on their site:
https://www.switch.ch/grid/certificates/obtain/
Follow all the steps exactly as indicated on this page. Print out the registration document (best from the link in the return mail you receive upon
submission of the web forms, because most entries have already been filled in there).
- You have to give your full name exactly as it appears in your passport or ID.
- Sign the form only at 6a), not at points b) and c).
You have to bring these documents to the responsible security manager of your institute, who will sign the registration document
and send it on.
Institute responsible security manager:
- For PSI: Tobias Marx (WHGA/U134)
How can I get the certificate out of the browser and into my Grid environment?
In the previous step you obtained your grid certificate, but your grid credentials actually consist of two parts: a private key, which was generated on your machine when you made your certificate request, and the certificate you got from the CA, based on the public key that was generated together with your private key. Most CAs (including CERN) issue certificates through a web interface, so the private key and the certificate end up in your browser's certificate store. The grid tools need the credentials in the form of two separate files in PEM format, so you will need to export them. Look at this page for instructions on how to do this.
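As a rough illustration of the export step: browsers usually save the credentials as a single PKCS#12 file, which openssl can split into the two PEM files. The following is only a sketch under assumptions. The function name export_grid_credentials, the bundle name mycert.p12, the ~/.globus target directory (the usual Globus convention) and the optional P12_PASS / KEY_PASS environment variables are chosen for illustration and are not prescribed by this page.

```shell
# Split a PKCS#12 bundle exported from the browser into usercert.pem and
# userkey.pem.
# $1: PKCS#12 file (hypothetical example name: mycert.p12)
# $2: target directory (assumed default: ~/.globus)
# If P12_PASS / KEY_PASS are exported, they are used as the import password
# and the new key pass phrase; otherwise openssl prompts for them.
export_grid_credentials() {
    local p12="$1" dir="${2:-$HOME/.globus}"
    mkdir -p "$dir" || return 1
    # Certificate only, without the private key
    openssl pkcs12 -in "$p12" -clcerts -nokeys \
        ${P12_PASS:+-passin env:P12_PASS} \
        -out "$dir/usercert.pem" || return 1
    # Private key only, protected by a pass phrase
    openssl pkcs12 -in "$p12" -nocerts \
        ${P12_PASS:+-passin env:P12_PASS} ${KEY_PASS:+-passout env:KEY_PASS} \
        -out "$dir/userkey.pem" || return 1
    # The private key must not be readable by anyone else
    chmod 644 "$dir/usercert.pem"
    chmod 400 "$dir/userkey.pem"
}

# Usage (interactive, openssl asks for the passwords):
#   export_grid_credentials mycert.p12
```

Keeping the private key readable only by its owner matters, because anyone who can read it and knows the pass phrase can act in your name.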
How to get access to CMS Grid resources: Authentication/Authorization using VOMS
If you have not done so already, you must first register at VOMRS-CMS (also look at this page). Your certificate must be loaded in the browser when you visit the page, or the server will refuse your connection (it will look as if the server were dead; there will be no helpful error message).
This registration associates your grid certificate with CMS. Your certificate only confirms your identity; to be authorized to use a CMS service, your certificate must be registered with the CMS VOMS system. This system also allows you to request certain roles (similar to belonging to different UNIX groups), although for normal users the defaults will be fine. After registering, you can see the roles that have been assigned to you on the VOMRS server or directly on the CMS VOMS server page.
Every time you work on the grid, you create a derivative certificate (and matching key) from your Grid certificate with a short lifetime of about 12 h: the so-called proxy certificate. If this temporary certificate gets stolen on the grid, a malicious user can impersonate you only for that limited time, and your real certificate remains safe.
Use the VOMS voms-proxy-init command to obtain a Grid proxy certificate. This gives you a proxy certificate that contains not only authentication information (who you are) but also authorization information (what you are allowed to do, i.e. role information). To check which roles you are allowed to assume, visit the CMS VOMRS server (requires that your firewall does not block outgoing port 8443).
Currently, you should use the following command to obtain a generic CMS-valid proxy:
voms-proxy-init -voms cms
To generate a generic proxy certificate with no roles information:
voms-proxy-init
To generate a proxy certificate for the production role:
voms-proxy-init -voms cms:/cms/Role=production
To look at the authorization information of your proxy certificate:
voms-proxy-info -all
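Because a proxy is only valid for a limited time, it can be handy to check the remaining lifetime before starting longer work. The following is a minimal sketch, not part of the official tools: the function name proxy_needs_renewal and the default threshold of one hour are assumptions made for illustration. It relies only on voms-proxy-info -timeleft, which prints the remaining lifetime in seconds.

```shell
# Return 0 (true) if the proxy should be renewed, 1 otherwise.
# $1: remaining lifetime in seconds; empty or non-numeric means "no proxy"
# $2: threshold in seconds (hypothetical default: 3600)
proxy_needs_renewal() {
    local timeleft="$1" threshold="${2:-3600}"
    case "$timeleft" in
        ''|*[!0-9]*) return 0 ;;   # no usable proxy found
    esac
    [ "$timeleft" -lt "$threshold" ]
}

# Renew the CMS proxy only when needed (skipped if the VOMS tools are absent)
if command -v voms-proxy-info >/dev/null 2>&1; then
    if proxy_needs_renewal "$(voms-proxy-info -timeleft 2>/dev/null)"; then
        voms-proxy-init -voms cms
    fi
fi
```

The decision logic is kept separate from the call to voms-proxy-info so that it can be reused in job wrappers with a different threshold.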
Where on the SE can I save data produced by my jobs?
Since early 2008, CMS has had a regulation on the path to be used for user-generated data. This is defined in the
CMS namespace policy.
For the user space at T2_CH_CSCS this translates to the following (hn_username stands for your official CMS hypernews user name):
- srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/$hn_username/*
for harvested files residing at a "home" institute. This is custodial data for the home site.
- srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/temp/user/$hn_username/*
for temporary files hosted on any site's SE. This portion of the namespace is treated like /tmp, i.e. it is volatile.
Since CSCS does not provide tape backups of SE space, the files are only protected by a software RAID. You should migrate your data elsewhere if you need it to be more secure.
Our earlier convention on CSCS user space in
srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/local/$username
is obsolete and should no longer be used.
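The two current locations differ only in the path component below /store, so the SRM URL can be derived from the hypernews user name. A small sketch of this; the function name cms_user_surl and the example user name jdoe are made up for illustration:

```shell
# Build the SRM URL of a user's area on the CSCS SE.
# $1: CMS hypernews user name
# $2: "user" (default) for /store/user, "temp" for /store/temp/user
cms_user_surl() {
    local hn_username="$1" area="${2:-user}"
    local base="srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store"
    case "$area" in
        user) printf '%s/user/%s\n' "$base" "$hn_username" ;;
        temp) printf '%s/temp/user/%s\n' "$base" "$hn_username" ;;
        *)    echo "unknown area: $area" >&2; return 1 ;;
    esac
}

# Usage with a hypothetical user name:
#   cms_user_surl jdoe        # custodial user area
#   cms_user_surl jdoe temp   # volatile area
```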
How can I get a list of data sets available at our site / on the Grid?
Definitions: The CMS Dataset Bookkeeping System (DBS) is a database and user API that indexes event data for the CMS Collaboration.
The Data Location Service (DLS) is a database that records which data sets are available at which sites (i.e. it lists the locations of data sets).
How to locate data: There is a discovery web page for both systems at this URL:
https://cmsweb.cern.ch/das/
Information on datasets and their location is also stored in the PhEDEx data transfer system, but it is only guaranteed to contain information about recently transferred data (i.e. there is no guarantee that information will not be dropped at some point; DLS is the authoritative source for location information).
Still, via PhEDEx you can find out which data sets have been ordered by a center and whether they are arriving. Start from this page for our CSCS Tier-2:
PhEDEx replica page to locate data.
How can I order official datasets to the CSCS Tier-2?
CMS uses the PhEDEx system for its data management. The site's local CMS administrator can order a data set for you.
- Identify the data by its official name (for current naming conventions, see this CERN Wiki entry). You can look up the available data sets via the DAS. Make sure that they exist. If a site has not correctly published the data, it cannot be fetched.
- Use the PhEDEx create transfer request page to order the dataset. You need to log in with your hypernews name/password (the login link can be found in the upper right corner of the screen).
- The request will be approved/disapproved by the local PhEDEx data manager of the site. You will receive emails about the updates to your request.
- Please understand that at some point we will start to refuse requests with undefined group or retention time information, because the lack of this information severely hampers efficient management of the storage space and involves all of us in email ping-pong.
- The status of the transfers can be seen on the PhEDEx replica page for our site.
How can the data be accessed? What is the local path?
The generic CMS path (usually something starting with /store/...) is not the local path of the file, but almost.
CMS uses a trivial file catalog (TFC) implementation to define local filenames at a site, based on a rules file. Mostly, the rules just define a site-specific prefix to be put in front of the generic CMS path, but they can also be more complex. The rules file is located at a standard location on every site, so it can be found automatically by CMS jobs. So, if you are using official CMS tools like CRAB to generate your jobs, you can use the generic CMS names and the software will transparently translate them to site-specific names.
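To make the simple prefix case concrete, here is a sketch of such a translation. It is not the real TFC machinery: the function name lfn_to_pfn is made up, and the prefix is only inferred from the dcap example on this page. The authoritative rules are those in the site's rules file.

```shell
# Apply a simple prefix rule: local name = site prefix + generic CMS path.
# $1: site-specific prefix (assumption, see above)
# $2: generic CMS path starting with /store/
lfn_to_pfn() {
    local prefix="$1" lfn="$2"
    case "$lfn" in
        /store/*) printf '%s%s\n' "$prefix" "$lfn" ;;
        *)        echo "not a /store/... path: $lfn" >&2; return 1 ;;
    esac
}

lfn_to_pfn "dcap://storage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/cms/trivcat" /store/somefile
# prints dcap://storage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/cms/trivcat/store/somefile
```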
If you need direct access to files on our site, you can obtain the local filenames from the PhEDEx web service, e.g. by using the following command
wget -O- "http://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?node=T2_CH_CSCS&protocol=dcap&lfn=/store/somefile" 2>/dev/null |sed -e "s/.*pfn='\([^']*\).*/\1\n/"
#Example output
dcap://storage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/cms/trivcat/store/somefile
In this way you can query for the correct URL on any CMS grid site and with any of the supported protocols.
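If you need this lookup for several sites or protocols, the query can be wrapped in two small helpers. This is only a sketch: the function names lfn2pfn_url and extract_pfn are invented here, and the extraction assumes, like the sed expression above, that the reply carries the result in a pfn='...' attribute.

```shell
# Build the lfn2pfn query URL.
# $1: PhEDEx node name, $2: protocol, $3: generic CMS path
lfn2pfn_url() {
    printf 'http://cmsweb.cern.ch/phedex/datasvc/xml/prod/lfn2pfn?node=%s&protocol=%s&lfn=%s\n' "$1" "$2" "$3"
}

# Read the XML reply on stdin and print the value of the pfn attribute.
extract_pfn() {
    sed -n "s/.*pfn='\([^']*\)'.*/\1/p"
}

# Usage (needs network access):
#   wget -q -O- "$(lfn2pfn_url T2_CH_CSCS dcap /store/somefile)" | extract_pfn
```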
The following protocols and user tools can be used to access files:
SRMv2 (SRM version 2)
This should be the standard protocol for interactively accessing files.
There is an excellent SRM howto guide from OSG.
Use srmcp to move data from and to the SE. The accessible part of the namespace for CMS is under
srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms.
Example:
For easier readability, we define the variable SRMBASE first
SRMBASE=srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms
srmcp -2 ${SRMBASE}/trivcat/store/CSA06/CSA06-102-os-minbias-0/somefile.root file:////tmp/mytest.tst
srmcp -2 file:////tmp/mytest.tst ${SRMBASE}/local_tests/mytestfile
You can list directory contents and details with the srmls command:
srmls -l ${SRMBASE}/trivcat/store
Note that the file sizes in the output of a directory listing may not be correct for files larger than 2 GB. Using the -l flag with the command will produce the correct size:
srmls -l ${SRMBASE}/local_tests/big_file
2467069744 /pnfs/lcg.cscs.ch/cms/local_tests/big_file
For deleting files, use the srmrm command. You can delete directories with srmrmdir, which also supports a -recursive flag, but for safety reasons the recursive mode will only remove empty directories.
srmrm ${SRMBASE}/local_tests/testdir/gaga
srmrmdir ${SRMBASE}/local_tests/testdir
SRMv1 (SRM version 1)
You can use srmcp to move data from and to the SE. The accessible part of the namespace for CMS is under
srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=/pnfs/lcg.cscs.ch/cms.
Example:
For easier readability, we define the variable SRMBASE first
SRMBASE=srm://storage01.lcg.cscs.ch:8443/srm/managerv1?SFN=/pnfs/lcg.cscs.ch/cms
srmcp ${SRMBASE}/trivcat/store/CSA06/CSA06-102-os-minbias-0/somefile.root file:////tmp/mytest.tst
srmcp file:////tmp/mytest.tst ${SRMBASE}/local_tests/mytestfile
Note that the file protocol specification needs 4 slashes to refer to the local root file system "/". If you only use 3 slashes, the file is staged relative to your current directory.
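To avoid miscounting slashes in scripts, the local URL can be generated from a path. A minimal sketch; the helper name local_file_url is an assumption, and relative paths are made absolute first so that the result always uses the four-slash form:

```shell
# Turn a local path into a file: URL with four slashes for use with srmcp.
# $1: absolute or relative local path
local_file_url() {
    local path="$1"
    case "$path" in
        /*) ;;                      # already absolute
        *)  path="$PWD/$path" ;;    # make relative paths absolute
    esac
    printf 'file:///%s\n' "$path"
}

local_file_url /tmp/mytest.tst
# prints file:////tmp/mytest.tst
```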
The srm-get-metadata command is very useful for getting information about a file or directory, e.g.
srm-get-metadata ${SRMBASE}/trivcat/store
size :0
owner :cmssgm
group :cms
permMode :16893
...
Deleting files: Use the srm-advisory-delete command
srm-advisory-delete ${SRMBASE}/local_tests/myfile.dat
Grid FTP
Example:
GSIFTPBASE=gsiftp://storage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/cms
globus-url-copy ${GSIFTPBASE}/local_tests/testfile.tst file:/tmp/mytest.tst
globus-url-copy ${GSIFTPBASE}/trivcat/store/CSA06/CSA06-102-os-minbias-0/file.root file:/tmp/mytest.tst
You can use edg-gridftp-ls to look at files and directories, e.g.
edg-gridftp-ls ${GSIFTPBASE}/trivcat/store
dcap
This protocol can be used either from the local UI or by jobs running at the site, but not over the WAN: it is usually limited to on-site machines. There is a command line copy command, dccp, but more frequently this is the protocol of choice for direct access to files through ROOT or CMSSW.
Example:
DCAPBASE=dcap://storage01.lcg.cscs.ch:22125/pnfs/lcg.cscs.ch/cms
dccp $DCAPBASE/local_tests/mytesttree.dat /tmp/somefile
Note: dcap only allows read access to the SE (since this is an authentication-less protocol).
FTS
The FTS (File Transfer Service) implements a queueing system for file transfers. It is useful for moving large data sets around, since it is able to control the bandwidth and other transfer-specific parameters on predefined channels. Usage is complex and we do not currently recommend it for normal users. Contact us if you need to transfer lots of data that is not known to PhEDEx.
Xrootd
We offer read-only access via xrootd (the Extended Root Daemon). This access method can only be used from within the site and currently does not require a grid proxy. To use it in your jobs, follow these steps:
- Set up your ROOT environment
export ROOTSYS=$VO_CMS_SW_DIR/../sitelocal/ROOT/ROOT_5.06.00
export PATH=$PATH:$ROOTSYS/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ROOTSYS/lib
- Convert the gsiftp-PFN into a root daemon PFN by changing the protocol to "root://" and introducing another double slash (!!!) after the host name:
gsiftp://host/dir1/dir2/filename ---> root://host//dir1/dir2/filename
You can use the following sed expression for the conversion (the gsiftp TURL is contained in $name):
echo "$name" | sed -e 's/gsiftp:\/\/\([^/]*\)/root:\/\/\1\//'
- Use the xrdcp command line client to copy whole files. Naturally, you can also access files directly from within a ROOT job.
xrdcp -s root://storage01.lcg.cscs.ch//storage/cms/local_tests/testfile.tst /tmp/mytest.tst
Note: The -s flag selects silent mode (no progress bar), since the command is not used in an interactive session.
What CMS software is available at the site?
CMSSW
The "official" CMSSW software packages are maintained by a central CMS software manager who triggers installation on all sites. The "tags" of the officially available software are entered into the Grid information system used by the LCG software, from where they can be picked up by the framework. On any correctly configured LCG User Interface (UI) you can enter the following command to get information about the installations on all sites.
lcg-infosites --vo cms tag
Jobs find out about the location of the software on a given site by looking up the $VO_CMS_SW_DIR environment variable.
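A job script can check this variable before trying to set up CMSSW, so that a misconfigured worker node is reported clearly. The following is a sketch only; the function name cms_sw_area and the error messages are made up for illustration:

```shell
# Print the CMS software area if it is usable, otherwise report why not.
cms_sw_area() {
    if [ -z "${VO_CMS_SW_DIR:-}" ]; then
        echo "VO_CMS_SW_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$VO_CMS_SW_DIR" ]; then
        echo "VO_CMS_SW_DIR=$VO_CMS_SW_DIR is not a directory" >&2
        return 1
    fi
    printf '%s\n' "$VO_CMS_SW_DIR"
}

# Usage in a job script:
#   sw_dir=$(cms_sw_area) || exit 1
```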
CRAB - out of date! needs to be revised!
There is a CRAB (CERN info page) installation on the CSCS UI. So, if you have a UI account at CSCS, you can use it to submit job ranges. This can be attractive, since you have more local space there than on CERN AFS.
The CRAB installations can be found under /opt/CRAB/CRAB_[version]. In order to use one, you need to source its setup script, e.g.
. /opt/CRAB/CRAB_2_2_0/crab.sh
Links to external CMS-related information
How to troubleshoot Grid/CRAB jobs
--
DerekFeichtinger - 21 Jan 2008