PhEDEx

PhEDEx is the data transfer service used by CMS to transfer data sets between sites.

Central Monitoring

Service startup, stop and status

We run multiple instances of PhEDEx. One is for Production transfers, and another for Debugging. Currently, these instances are active:

Debug Prod

Note that PhEDEx is run by the phedex user and not by root! I (Derek) wrote some custom init scripts which make starting and stopping much simpler than in the original:

  • startup scripts at /home/phedex/config/T3_CH_PSI/PhEDEx/tools/init.d/phedex_[instance] (start|status|stop). The init script will check for a valid service certificate before startup! Example:
    /home/phedex/init.d/phedex_Prod start
    /home/phedex/init.d/phedex_Debug start
    • Note that startup can take many seconds because reconnecting to the DB is slow. Be patient and watch the output: the agents should come up one by one.
  • also check status via the central web site http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Components::Status?view=global
  • make sure that there is always a valid user proxy available to PhEDEx (see the Auth/Authz section below)

[phedex@t3cmsvobox01 ~]$ /home/phedex/config/T3_CH_PSI/PhEDEx/tools/init.d/phedex_Prod status
blockverify (22325)     [UP]
download-remove (22392)         [UP]
download-t1 (22460)     [UP]
download-t2 (22522)     [UP]
exp-pfn (22812)         [UP]
Watchdog (22997)        [UP]
WatchdogLite (23022)    [UP]
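
The status output above is easy to scan by eye, but a small filter helps when it is checked from a script. The following is only a sketch: the function name agents_not_up is made up for this page, and it assumes that every running agent is reported with a trailing [UP], so any other state is treated as "not running".

```shell
# Print the name of every agent whose status line does not end in "[UP]".
# Reads the output of "phedex_<instance> status" on stdin.
# Assumption: any state other than [UP] means the agent needs attention.
agents_not_up() {
    awk 'NF > 0 && $NF != "[UP]" { print $1 }'
}

# Usage (init script path as shown above):
#   /home/phedex/config/T3_CH_PSI/PhEDEx/tools/init.d/phedex_Prod status | agents_not_up
```

An empty result means that all agents reported [UP].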

PhEDEx Agent Auth/Authz by grid certificates and how to renew them

Explanation

The PhEDEx daemons require a valid grid proxy of a CMS site Data Manager in /home/phedex/gridcert/proxy.cert to transfer files. This short-lived proxy certificate is renewed every few hours by a cron job that runs the myproxy-logon command (/etc/cron.d/cron_proxy.sh, which invokes our version-controlled script /home/phedex/config/T3_CH_PSI/PhEDEx/tools/cron/cron_proxy.sh). The renewal is based on a long-lived proxy certificate of the CMS site Data Manager that is stored on the CERN myproxy server myproxy.cern.ch.

So the local CMS site Data Manager must always keep a long-lived proxy of their own on the myproxy server and make sure that it has enough lifetime left. The phedex user (through the cron job) renews the local certificate by

  • presenting the old but still valid proxy certificate /home/phedex/gridcert/proxy.cert of the local CMS Data Manager. This is the proxy that needs to get renewed.
  • presenting its own valid proxy certificate, which is produced like a normal user proxy from the certificate in ~/.globus/usercert.pem and ~/.globus/userkey.pem. By convention, this certificate is the hostcert.pem and hostkey.pem belonging to the phedex host.

So whenever the host cert changes, you must also change the certs of the phedex user.

Note: The myproxy server only allows registered entities (= distinguished names) to renew another user's proxy certificate (here, the host cert is allowed to renew a user proxy). The DN of the host cert must therefore be entered into the myproxy server's configuration, i.e. our vobox host DN subject has to be registered with the myproxy service. If the hostname of the cmsvobox changes, you need to contact the admins responsible for the myproxy.cern.ch server: write a mail to Helpdesk@cern.ch

Checking service grid-proxy lifetime

On the VO-BOX

[phedex@t3cmsvobox01 ~]$ openssl x509 -subject -dates -noout -in /home/phedex/gridcert/proxy.cert 
subject= /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger/CN=proxy/CN=proxy/CN=proxy/CN=proxy
notBefore=Jan 30 13:00:05 2019 GMT
notAfter=Jan 31 00:59:05 2019 GMT
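
To turn the notAfter line into a number that a monitoring script can compare, something like the following may help. It is a sketch, not part of our cron setup: the function name proxy_seconds_left is invented here, and it relies on GNU date being able to parse the date format that openssl prints.

```shell
# Print the seconds between a reference time and the notAfter date that
# "openssl x509 -dates -noout" writes to stdin. Requires GNU date.
#   $1: reference time in epoch seconds (optional, defaults to now)
proxy_seconds_left() {
    local now end end_s
    now=${1:-$(date +%s)}
    # Keep only the text after "notAfter=".
    end=$(sed -n 's/^notAfter=//p')
    [ -n "$end" ] || return 1
    end_s=$(date -d "$end" +%s) || return 1
    echo $(( end_s - now ))
}

# Usage:
#   openssl x509 -dates -noout -in /home/phedex/gridcert/proxy.cert | proxy_seconds_left
```

A negative result means the proxy has already expired. For a plain yes/no check, openssl x509 also has a -checkend option that takes a number of seconds.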

Commands for placing a long-lived certificate on the myproxy server

On a user interface!

[feichtinger@t3ui01 ~]$ source /swshare/psit3/etc/profile.d/cms_ui_env.sh
[feichtinger@t3ui01 ~]$ voms-proxy-init -voms cms -valid 192:00
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.


Created proxy in /t3home/feichtinger/.x509up_u3896.

Your proxy is valid until Thu Feb 07 14:19:21 CET 2019

Now create the long-lived proxy certificate on the myproxy server:

[feichtinger@t3ui01 ~]$ myproxy-init -t 168 -R t3cmsvobox.psi.ch -l psi_phedex_derek -x -k renewable -s myproxy.cern.ch -c 1500
Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
Enter GRID pass phrase for this identity:
Creating proxy .............................................. Done
Proxy Verify OK
Your proxy is valid until: Wed Apr  3 03:20:31 2019
A proxy valid for 1500 hours (62.5 days) for user psi_phedex_derek now exists on myproxy.cern.ch.
[feichtinger@t3ui01 ~]$ myproxy-info -s myproxy.cern.ch -l psi_phedex_derek
username: psi_phedex_derek
owner: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
  name: renewable
  renewal policy: */CN=t3cmsvobox.psi.ch
  timeleft: 1499:59:17  (62.5 days)
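
Because the long-lived proxy on the myproxy server must never run out, it can be worth checking the timeleft value from a script. The filter below is a sketch under the assumption that myproxy-info keeps printing the timeleft line in the HHHH:MM:SS form shown above; the function name and the 14-day threshold in the usage comment are made up for illustration.

```shell
# Print the whole hours left on the stored credential, taken from the
# first "timeleft: HHHH:MM:SS" line of "myproxy-info" output on stdin.
# Returns non-zero if no timeleft line is found.
myproxy_hours_left() {
    awk '$1 == "timeleft:" { split($2, t, ":"); print t[1] + 0; found = 1; exit }
         END { if (!found) exit 1 }'
}

# Usage (threshold of 336 hours = 14 days is an arbitrary example):
#   hours=$(myproxy-info -s myproxy.cern.ch -l psi_phedex_derek | myproxy_hours_left)
#   [ "$hours" -lt 336 ] && echo "renew the long-lived proxy soon"
```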

Testing certificate renewal on the vobox

On the VO-BOX!

Create a valid proxy for the phedex user (based on the host cert)

[phedex@t3cmsvobox01 ~]$ voms-proxy-init

Created proxy in /tmp/x509up_u205.

Your proxy is valid until Thu Jan 31 02:29:10 CET 2019
[phedex@t3cmsvobox01 ~]$ voms-proxy-info
subject   : /DC=EU/DC=EGI/C=CH/O=Hosts/O=Paul-Scherrer-Institut (PSI)/CN=t3cmsvobox.psi.ch/CN=2047398272
issuer    : /DC=EU/DC=EGI/C=CH/O=Hosts/O=Paul-Scherrer-Institut (PSI)/CN=t3cmsvobox.psi.ch
identity  : /DC=EU/DC=EGI/C=CH/O=Hosts/O=Paul-Scherrer-Institut (PSI)/CN=t3cmsvobox.psi.ch
type      : RFC3820 compliant impersonation proxy
strength  : 1024
path      : /tmp/x509up_u205
timeleft  : 11:59:49
key usage : Digital Signature, Key Encipherment, Data Encipherment

Testing whether the delegated renewal works on the vobox (remember that the proxy.cert to be renewed still needs to have some lifetime left):

Renew the user cert via the myproxy server

[phedex@t3cmsvobox01 ~]$  myproxy-logon -s myproxy.cern.ch -v -m cms -l psi_phedex_derek -a /home/phedex/gridcert/proxy.cert -o /tmp/testproxy -k renewable
MyProxy v6.1 Jul 2015 PAM SASL KRB5 LDAP VOMS OCSP
Attempting to connect to 188.184.67.101:7512 
Successfully connected to myproxy.cern.ch:7512 
using trusted certificates directory /etc/grid-security/certificates
Using Proxy file (/tmp/x509up_u205)
server name: /DC=ch/DC=cern/OU=computers/CN=px503.cern.ch
checking that server name is acceptable...
server name matches ""
authenticated server name is acceptable
running: voms-proxy-init -valid 11:59 -vomslife 11:59 -voms cms -cert /tmp/testproxy -key /tmp/testproxy -out /tmp/testproxy -bits 2048 -noregen -proxyver=2
Contacting lcg-voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.


Created proxy in /tmp/testproxy.

Your proxy is valid until Thu Jan 31 02:31:00 CET 2019
A credential has been received for user psi_phedex_derek in /tmp/testproxy.

Changing the user running the PhEDEx service

As explained above, PhEDEx performs its tasks by authenticating with the proxy certificate belonging to the local site Data Manager. For writing to our local Storage Element we have an additional authorization requirement in place, since we do not want normal users to write by mistake to the directories dedicated to central data management. These central directories therefore belong to the current data manager, and switching the service to another user's proxy cert involves a chown -R of these directories!

At the time of writing these instructions, the relevant directories are owned by feichtinger:

[root@t3dcachedb03 ~]# ls -l /pnfs/psi.ch/cms/trivcat/store/
total 6
drwxr-xr-x   3 feichtinger cms  512 Jul 31  2016 backfill
drwxr-xr-x  32 feichtinger cms  512 May 23  2018 data
drwxr-xr-x  41 feichtinger cms  512 May 23  2018 mc
drwxr-xr-x   2 root        root 512 Sep 13  2017 outdated-user
drwxr-xr-x   5 feichtinger cms  512 Oct  2  2009 PhEDEx_LoadTest07
drwxr-xr-x   2 feichtinger cms  512 Apr 16  2015 PhEDEx_LoadTest_SingleSource
drwxr-xr-x  20 feichtinger cms  512 Mar  7  2017 relval
drwxr-xr-x  12 root        cms  512 Nov  8  2013 t3groups
drwxr-xr-x   3 root        root 512 Jul 31  2018 test
drwxr-x---   3 root        cms  512 Oct 23  2013 unmerged
dr-xr-xr-x 108 root        cms  512 Jan 25 13:49 user
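
Before running a chown -R it helps to see exactly which top-level directories belong to the current data manager. The following sketch filters a listing like the one above; the function name entries_owned_by is invented here, and it assumes directory names without spaces.

```shell
# Print the names of all entries in an "ls -l" listing (stdin) that are
# owned by the user given as $1. The "total" line is skipped because it
# has fewer than nine fields.
entries_owned_by() {
    awk -v user="$1" 'NF >= 9 && $3 == user { print $9 }'
}

# Usage:
#   ls -l /pnfs/psi.ch/cms/trivcat/store/ | entries_owned_by feichtinger
```

The resulting names are the candidates for the ownership change; directories owned by root stay as they are.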

Configuration

The PhEDEx configuration can be found in ~phedex/config:

  • DBParam.PSI: Passwords needed for accessing the central PhEDEx database. We receive these encrypted from cms-phedex-admins@cern.ch. The file contains one section for every PhEDEx instance (Prod, Debug, ...)
  • T3_CH_PSI/PhEDEx/Config*: Configuration definitions for the PhEDEx instances (including load tests)
  • T3_CH_PSI/PhEDEx/storage.xml: defines the trivial file catalog mappings
  • T3_CH_PSI/PhEDEx/FileDownload*: site specific scripts called by the download agent
  • T3_CH_PSI/PhEDEx/fts.map: mapping of SRM endpoints to FTS servers (q.v. CERN Twiki)

The SITECONF area is a checkout from the central CERN/CMS Gitlab repository (https://gitlab.cern.ch/SITECONF/T3_CH_PSI/). All changes must be committed and pushed to the central repository: central operations requires this and runs tests on it.

PhEDEx SW Installation

Nowadays the PhEDEx SW is distributed via CVMFS. Local installations are no longer needed.

There is a symbolic link /home/phedex/PHEDEX which points to the active PhEDEx distribution, so that the configuration files need not be changed with every update (though the link needs to be reset):

PHEDEX -> /cvmfs/cms.cern.ch/phedex/slc6_amd64_gcc493/cms/PHEDEX/4.2.1
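
Resetting the link after an update can be done with ln. The two helpers below are a sketch with made-up function names; they assume GNU ln, where -n keeps ln from following the existing link into the old release directory.

```shell
# Print the PhEDEx version a link points to, i.e. the last path
# component of the link target.
phedex_link_version() {
    basename "$(readlink "$1")"
}

# Point the link $1 at the release directory $2, replacing the old link.
set_phedex_link() {
    ln -sfn "$2" "$1"
}

# Usage (target path as shown above):
#   set_phedex_link /home/phedex/PHEDEX /cvmfs/cms.cern.ch/phedex/slc6_amd64_gcc493/cms/PHEDEX/4.2.1
#   phedex_link_version /home/phedex/PHEDEX
```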

DB access for the agents

PhEDEx relies on a central Oracle database at CERN. The passwords for accessing it are stored in ~/config/DBParam.PSI (q.v. Configuration section above).

Interactive access to the DB through sqlplus

[phedex@t3cmsvobox01 ~]$ sqlplus $(/home/phedex/PHEDEX/Utilities/OracleConnectId -db /home/phedex/config/DBParam.PSI:Prod/PSI)                                                  

SQL*Plus: Release 11.2.0.4.0 Production on Wed Jan 30 13:54:04 2019

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> select id,name from t_adm_node where name like '%CSCS%' or name like '%PSI%' ;

        ID NAME
---------- --------------------
        27 T2_CH_CSCS
       821 T3_CH_PSI

A complex query to list data sets


SQL> select distinct r.id, r.created_by, r.time_create, r.comments reqcomid,
       rds.dataset_id, rds.name, rd.decided_by, rd.time_decided, rd.comments accomid
     from t_req_request r
     join t_req_type rt on rt.id = r.type
     join t_req_node rn on rn.request = r.id
     left join t_req_decision rd on rd.request = r.id and rd.node = rn.node
     join t_req_dataset rds on rds.request = r.id
     where rn.node = 821 and rt.name = 'xfer' and rd.decision = 'y'
       and dataset_id in (select distinct b.dataset
                          from t_dps_block b
                          join t_dps_block_replica br on b.id = br.block
                          join t_dps_dataset d on d.id = b.dataset
                          where node = 821)
     order by r.time_create desc ;


        ID CREATED_BY TIME_CREATE   REQCOMID DATASET_ID NAME DECIDED_BY TIME_DECIDED ACCOMID
---------- ---------- ----------- ---------- ---------- ---- ---------- ------------ -------
   1524693    2154904  1542380808    1314498    1052091 /ZeroBias/Run2017C-v1/RAW 2154907 1542381023
   1498082    2119927  1540539778    1289032    1155914 /ZprimeToTauTau_M-2000_TuneCP5_13TeV-pythia8-tauola/RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/NANOAODSIM 2120264 1540563865
...

Storage Management for PhEDEx Datasets

Many data sets are nowadays placed by CMS central services (mostly by the Dynamo Service), and central CMS should then also be responsible for issuing the deletion requests for these sets. But data sets can also be ordered by users, and for those someone needs to check every few months whether old sets should be removed.

I prefer to get the lists by tools which I (Derek) developed for PhEDEx and which are still part of its distribution. The web displays can be slow and in the end we would like to copy or sort through the names:

  • /home/phedex/config/T3_CH_PSI/PhEDEx/tools/DB-query-tools/ListSiteDataInfo.pl: Lists data sets by directly querying the DB

I placed two scripts in /home/phedex/dataset-cleaning on the VO-Box. They will produce files for PSI and CSCS with the list of resident data sets and some meta information.

  • get-datasets-psi.sh
  • get-datasets-cscs.sh

PhEDEx deletion requests are made by filling out the web form of the central service. The requests MUST be approved by a Data Manager, who can be one of our local site data managers. The user running the phedex service should always be a data manager, so requests from that user are approved automatically.

Old Documentation

This is a new and adapted documentation based on

-- DerekFeichtinger - 2019-01-30

Topic revision: r2 - 2019-01-30 - DerekFeichtinger
 