PhEDEx
PhEDEx is the data transfer service used by CMS to transfer data sets between sites.
Central Monitoring
Service startup, stop and status
We run multiple instances of PhEDEx. One is for Production transfers, and another for Debugging. Currently, these instances are active:
Note that PhEDEx is run by the phedex user and not by root! I (Derek) wrote some custom init scripts which make starting and stopping much simpler than in the original:
[phedex@t3cmsvobox01 ~]$ /home/phedex/config/T3_CH_PSI/PhEDEx/tools/init.d/phedex_Prod status
blockverify (22325) [UP]
download-remove (22392) [UP]
download-t1 (22460) [UP]
download-t2 (22522) [UP]
exp-pfn (22812) [UP]
Watchdog (22997) [UP]
WatchdogLite (23022) [UP]
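The status listing above can also be checked automatically. The following is a minimal sketch, assuming that a stopped agent shows something other than [UP] in the last column (the listing only shows the [UP] case). The function name agents_not_up is made up for this example.

```shell
# Print the name of every agent whose state is not [UP].
# Reads the init script's status output on stdin (format as in the listing above).
# Assumption: a stopped agent shows something other than [UP] in the last column.
agents_not_up() {
    awk 'NF >= 2 && $NF != "[UP]" { print $1 }'
}

# Usage on the VO-BOX (script path taken from the listing above):
#   /home/phedex/config/T3_CH_PSI/PhEDEx/tools/init.d/phedex_Prod status | agents_not_up
```

An empty result means that all agents reported [UP].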
PhEDEx Agent Auth/Authz by grid certificates and how to renew them
Explanation
The PhEDEx daemons require a valid grid proxy of a CMS site Data Manager in /home/phedex/gridcert/proxy.cert to transfer files. This short-lived proxy certificate is renewed every few hours through a cron job running the myproxy-logon command (/etc/cron.d/cron_proxy.sh, which invokes our version-controlled script /home/phedex/config/T3_CH_PSI/PhEDEx/tools/cron/cron_proxy.sh). The renewal is based on a long-lived proxy certificate of the CMS site Data Manager that is stored on the CERN myproxy server myproxy.cern.ch.
So, the local site CMS Data Manager must always keep a long-lived proxy of their own deposited on the myproxy server and make sure that it has enough lifetime left. The phedex user (through the cron job) renews the local certificate by
- presenting the old but still valid proxy certificate /home/phedex/gridcert/proxy.cert of the local CMS Data Manager. This is the proxy that needs to get renewed.
- presenting its own valid proxy certificate, which is produced like a normal user proxy from the certificate in ~/.globus/usercert.pem and ~/.globus/userkey.pem. By convention, this certificate is the hostcert.pem and hostkey.pem belonging to the phedex host. So, whenever the host cert changes, you must also update the certs of the phedex user.
The myproxy server only allows registered entities (= distinguished names) to renew another user's proxy certificate (here, the host cert is allowed to renew a user proxy). The DN of the host cert must therefore be entered into the myproxy server's configuration, i.e. our vobox host DN subject has to be registered with the myproxy service. This means that you need to contact the admins responsible for the myproxy.cern.ch server if the hostname of the cmsvobox changes! Write a mail to Helpdesk@cern.ch.
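The renewal step described above can be sketched as a small shell function. This is not the content of cron_proxy.sh, which is not shown on this page. It only wraps the myproxy-logon call from the manual test further down, with the same flags. The function name renew_proxy, the temporary file name and the overridable MYPROXY_LOGON variable are made up for this example. It assumes that a valid host-cert based proxy for the phedex user already exists.

```shell
# Sketch of a delegated proxy renewal. This is NOT the site's cron_proxy.sh,
# whose contents are not shown on this page.
PROXY=${PROXY:-/home/phedex/gridcert/proxy.cert}   # proxy used by the PhEDEx agents
MYPROXY_USER=${MYPROXY_USER:-psi_phedex_derek}     # credential name on the myproxy server
MYPROXY_LOGON=${MYPROXY_LOGON:-myproxy-logon}      # overridable, e.g. for a dry run

renew_proxy() {
    new_proxy="${PROXY}.new"    # hypothetical temporary file name
    # -a: authorize the renewal with the old but still valid proxy
    # -m cms: add CMS VOMS attributes, -k renewable: name of the stored credential
    if "$MYPROXY_LOGON" -s myproxy.cern.ch -m cms -l "$MYPROXY_USER" \
            -a "$PROXY" -o "$new_proxy" -k renewable; then
        mv "$new_proxy" "$PROXY"    # replace the proxy only after a successful renewal
    else
        echo "proxy renewal failed, keeping the old proxy" >&2
        rm -f "$new_proxy"
        return 1
    fi
}
```

Writing to a temporary file first means that a failed renewal never overwrites the proxy the agents are still using.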
Checking service grid-proxy lifetime
On the VO-BOX
[phedex@t3cmsvobox01 ~]$ openssl x509 -subject -dates -noout -in /home/phedex/gridcert/proxy.cert
subject= /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger/CN=proxy/CN=proxy/CN=proxy/CN=proxy
notBefore=Jan 30 13:00:05 2019 GMT
notAfter=Jan 31 00:59:05 2019 GMT
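To turn the notAfter line into a number that a script can compare, something like the following sketch could be used. It assumes GNU date (for the -d option), and the function name hours_left is made up for this example.

```shell
# Print the whole hours between a reference time and a "notAfter=..." line
# as printed by `openssl x509 -enddate`. Requires GNU date (for -d).
#   $1: the notAfter line
#   $2: reference time in Unix epoch seconds (default: now)
hours_left() {
    end_epoch=$(date -u -d "${1#notAfter=}" +%s) || return 1
    now_epoch=${2:-$(date +%s)}
    echo $(( (end_epoch - now_epoch) / 3600 ))
}

# Usage on the VO-BOX:
#   hours_left "$(openssl x509 -enddate -noout -in /home/phedex/gridcert/proxy.cert)"
```

For a plain yes/no answer, openssl x509 -checkend SECONDS -noout -in FILE is an alternative: it returns a non-zero exit status if the certificate expires within the given number of seconds.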
Command Line commands for placing a long lived certificate on the myproxy server
On a user interface!
[feichtinger@t3ui01 ~]$ source /swshare/psit3/etc/profile.d/cms_ui_env.sh
[feichtinger@t3ui01 ~]$ voms-proxy-init -voms cms -valid 192:00
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.
Created proxy in /t3home/feichtinger/.x509up_u3896.
Your proxy is valid until Thu Feb 07 14:19:21 CET 2019
Now create the long-lived proxy on the myproxy server
[feichtinger@t3ui01 ~]$ myproxy-init -t 168 -R t3cmsvobox.psi.ch -l psi_phedex_derek -x -k renewable -s myproxy.cern.ch -c 1500
Your identity: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
Enter GRID pass phrase for this identity:
Creating proxy .............................................. Done
Proxy Verify OK
Your proxy is valid until: Wed Apr 3 03:20:31 2019
A proxy valid for 1500 hours (62.5 days) for user psi_phedex_derek now exists on myproxy.cern.ch.
[feichtinger@t3ui01 ~]$ myproxy-info -s myproxy.cern.ch -l psi_phedex_derek
username: psi_phedex_derek
owner: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=dfeich/CN=613756/CN=Derek Feichtinger
name: renewable
renewal policy: */CN=t3cmsvobox.psi.ch
timeleft: 1499:59:17 (62.5 days)
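Since the long-lived proxy must never run out, the timeleft line of the myproxy-info output can be checked from a script. The following is a minimal sketch that parses the output format shown above. The function names and the default threshold of 336 hours (14 days) are arbitrary choices for this example.

```shell
# Read `myproxy-info` output on stdin and print the whole hours left,
# taken from the "timeleft: HHHH:MM:SS" line.
myproxy_hours_left() {
    awk '$1 == "timeleft:" { split($2, t, ":"); print t[1] + 0; exit }'
}

# Succeed if at least $1 hours are left (default: 336 h = 14 days, an arbitrary threshold).
myproxy_ok() {
    left=$(myproxy_hours_left)
    [ -n "$left" ] && [ "$left" -ge "${1:-336}" ]
}

# Usage on a user interface:
#   myproxy-info -s myproxy.cern.ch -l psi_phedex_derek | myproxy_ok || echo "renew the myproxy credential"
```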
Testing certificate renewal on the vobox
On the VO-BOX!
Create a valid proxy for the phedex user (based on the host cert)
[phedex@t3cmsvobox01 ~]$ voms-proxy-init
Created proxy in /tmp/x509up_u205.
Your proxy is valid until Thu Jan 31 02:29:10 CET 2019
[phedex@t3cmsvobox01 ~]$ voms-proxy-info
subject : /DC=EU/DC=EGI/C=CH/O=Hosts/O=Paul-Scherrer-Institut (PSI)/CN=t3cmsvobox.psi.ch/CN=2047398272
issuer : /DC=EU/DC=EGI/C=CH/O=Hosts/O=Paul-Scherrer-Institut (PSI)/CN=t3cmsvobox.psi.ch
identity : /DC=EU/DC=EGI/C=CH/O=Hosts/O=Paul-Scherrer-Institut (PSI)/CN=t3cmsvobox.psi.ch
type : RFC3820 compliant impersonation proxy
strength : 1024
path : /tmp/x509up_u205
timeleft : 11:59:49
key usage : Digital Signature, Key Encipherment, Data Encipherment
Testing whether the delegated renewal works on the vobox (remember that the proxy.cert to be renewed still needs to have some lifetime left):
Renew the user cert via the myproxy server
[phedex@t3cmsvobox01 ~]$ myproxy-logon -s myproxy.cern.ch -v -m cms -l psi_phedex_derek -a /home/phedex/gridcert/proxy.cert -o /tmp/testproxy -k renewable
MyProxy v6.1 Jul 2015 PAM SASL KRB5 LDAP VOMS OCSP
Attempting to connect to 188.184.67.101:7512
Successfully connected to myproxy.cern.ch:7512
using trusted certificates directory /etc/grid-security/certificates
Using Proxy file (/tmp/x509up_u205)
server name: /DC=ch/DC=cern/OU=computers/CN=px503.cern.ch
checking that server name is acceptable...
server name matches ""
authenticated server name is acceptable
running: voms-proxy-init -valid 11:59 -vomslife 11:59 -voms cms -cert /tmp/testproxy -key /tmp/testproxy -out /tmp/testproxy -bits 2048 -noregen -proxyver=2
Contacting lcg-voms2.cern.ch:15002 [/DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch] "cms"...
Remote VOMS server contacted succesfully.
Created proxy in /tmp/testproxy.
Your proxy is valid until Thu Jan 31 02:31:00 CET 2019
A credential has been received for user psi_phedex_derek in /tmp/testproxy.
Changing the user running the PhEDEx service
As explained above, PhEDEx performs its tasks by authenticating with the proxy certificate belonging to the local site Data Manager. For writing to our local Storage Element we have an additional authorization requirement in place, since we do not want normal users to write by mistake to the directories dedicated to central data management. These central directories therefore belong to the current data manager, and switching the service to another user's proxy cert involves a chown -R of these directories!
At the time of writing these instructions, the relevant directories are owned by feichtinger:
[root@t3dcachedb03 ~]# ls -l /pnfs/psi.ch/cms/trivcat/store/
total 6
drwxr-xr-x 3 feichtinger cms 512 Jul 31 2016 backfill
drwxr-xr-x 32 feichtinger cms 512 May 23 2018 data
drwxr-xr-x 41 feichtinger cms 512 May 23 2018 mc
drwxr-xr-x 2 root root 512 Sep 13 2017 outdated-user
drwxr-xr-x 5 feichtinger cms 512 Oct 2 2009 PhEDEx_LoadTest07
drwxr-xr-x 2 feichtinger cms 512 Apr 16 2015 PhEDEx_LoadTest_SingleSource
drwxr-xr-x 20 feichtinger cms 512 Mar 7 2017 relval
drwxr-xr-x 12 root cms 512 Nov 8 2013 t3groups
drwxr-xr-x 3 root root 512 Jul 31 2018 test
drwxr-x--- 3 root cms 512 Oct 23 2013 unmerged
dr-xr-xr-x 108 root cms 512 Jan 25 13:49 user
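Before changing the ownership, it helps to see which chown commands would be needed. The following sketch only prints them as a dry run, based on ls -l output like the listing above. The function name print_chown_cmds and the new owner name in the usage line are hypothetical, and entries with spaces in their names are not handled.

```shell
# Read `ls -l` output on stdin and print one chown command for every entry
# owned by the old data manager. Nothing is executed (dry run).
#   $1: current owner   $2: new owner   $3: parent directory of the entries
print_chown_cmds() {
    awk -v old="$1" -v new="$2" -v dir="$3" \
        '$3 == old { printf "chown -R %s %s/%s\n", new, dir, $NF }'
}

# Usage as root on the dCache node ("newmanager" is a placeholder):
#   ls -l /pnfs/psi.ch/cms/trivcat/store/ | print_chown_cmds feichtinger newmanager /pnfs/psi.ch/cms/trivcat/store
```

Entries owned by root, such as user or t3groups in the listing, are left out because only the third column of each line is compared.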
Configuration
The PhEDEx configuration can be found in ~phedex/config:
- DBParam.PSI: Passwords needed for accessing the central PhEDEx database. We receive these encrypted from cms-phedex-admins@cern.ch. The file contains one section for every PhEDEx instance (Prod, Debug, ...)
- T3_CH_PSI/PhEDEx/Config*: Configuration definitions for the PhEDEx instances (including load tests)
- T3_CH_PSI/PhEDEx/storage.xml: defines the trivial file catalog mappings
- T3_CH_PSI/PhEDEx/FileDownload*: site specific scripts called by the download agent
- T3_CH_PSI/PhEDEx/fts.map: mapping of SRM endpoints to FTS servers (q.v. CERN Twiki)
The SITECONF area is a checkout from the central CERN/CMS Gitlab repository (https://gitlab.cern.ch/SITECONF/T3_CH_PSI/). Any change must always be committed and pushed to the central repository; central operations requires this and runs tests on it.
PhEDEx SW Installation
Nowadays the PhEDEx SW is distributed via CVMFS. There is no longer any need for local installations.
There is a symbolic link /home/phedex/PHEDEX which points to the active PhEDEx distribution, so that the configuration files need not be changed with every update (though the link needs to be reset):
PHEDEX -> /cvmfs/cms.cern.ch/phedex/slc6_amd64_gcc493/cms/PHEDEX/4.2.1
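Resetting the link after an update can be sketched as follows. The function name switch_phedex_link is made up for this example, and the architecture and version in the usage line are placeholders to be filled in with the release actually present on CVMFS.

```shell
# Point the symlink $2 at the directory $1, refusing if the target is missing.
switch_phedex_link() {
    if [ ! -d "$1" ]; then
        echo "target $1 does not exist" >&2
        return 1
    fi
    ln -sfn "$1" "$2"    # -n: replace the link itself instead of following it
}

# Usage as the phedex user (<arch> and <version> are placeholders):
#   switch_phedex_link /cvmfs/cms.cern.ch/phedex/<arch>/cms/PHEDEX/<version> /home/phedex/PHEDEX
```

Checking the target first avoids leaving the agents with a dangling link if the version string was mistyped.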
DB access for the agents
PhEDEx relies on a central Oracle database at CERN. The passwords for accessing it are stored in ~/config/DBParam.PSI (q.v. configuration section above).
Interactive access to the DB through sqlplus
[phedex@t3cmsvobox01 ~]$ sqlplus $(/home/phedex/PHEDEX/Utilities/OracleConnectId -db /home/phedex/config/DBParam.PSI:Prod/PSI)
SQL*Plus: Release 11.2.0.4.0 Production on Wed Jan 30 13:54:04 2019
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> select id,name from t_adm_node where name like '%CSCS%' or name like '%PSI%' ;
ID NAME
---------- --------------------
27 T2_CH_CSCS
821 T3_CH_PSI
A complex query to list data sets
SQL> select distinct r.id, r.created_by, r.time_create, r.comments reqcomid,
            rds.dataset_id, rds.name, rd.decided_by, rd.time_decided, rd.comments accomid
     from t_req_request r
     join t_req_type rt on rt.id = r.type
     join t_req_node rn on rn.request = r.id
     left join t_req_decision rd on rd.request = r.id and rd.node = rn.node
     join t_req_dataset rds on rds.request = r.id
     where rn.node = 821 and rt.name = 'xfer' and rd.decision = 'y'
       and dataset_id in (select distinct b.dataset from t_dps_block b
                          join t_dps_block_replica br on b.id = br.block
                          join t_dps_dataset d on d.id = b.dataset
                          where node = 821)
     order by r.time_create desc ;
ID CREATED_BY TIME_CREATE REQCOMID DATASET_ID NAME DECIDED_BY TIME_DECIDED ACCOMID
---------- ---------- ----------- -----
1524693 2154904 1542380808 1314498 1052091 /ZeroBias/Run2017C-v1/RAW 2154907 1542381023
1498082 2119927 1540539778 1289032 1155914 /ZprimeToTauTau_M-2000_TuneCP5_13TeV-pythia8-tauola/RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/NANOAODSIM 2120264 1540563865
...
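The TIME_CREATE and TIME_DECIDED values in this listing look like Unix epoch seconds. Assuming that is what they are, they can be made readable with GNU date. The function name epoch_to_utc is made up for this example.

```shell
# Convert Unix epoch seconds to a UTC timestamp (GNU date syntax).
# Assumption: the TIME_* columns of the query result hold epoch seconds.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc 1542380808   # TIME_CREATE of the first row in the listing above
```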
Storage Management for PhEDEx Datasets
Many data sets are nowadays placed by CMS central services (mostly by the Dynamo Service), and central CMS should then also be responsible for issuing the deletion requests for these sets. But data sets can also be ordered by users, and for these it is necessary to check every few months whether old sets should be removed.
I prefer to get the lists with tools which I (Derek) developed for PhEDEx and which are still part of its distribution. The web displays can be slow, and in the end we would like to copy or sort through the names:
- /home/phedex/config/T3_CH_PSI/PhEDEx/tools/DB-query-tools/ListSiteDataInfo.pl: Lists data sets by directly querying the DB
I placed two scripts in
/home/phedex/dataset-cleaning
on the VO-Box. They will produce files for PSI and CSCS with the list of resident data sets and some meta information.
- get-datasets-psi.sh
- get-datasets-cscs.sh
PhEDEx deletion requests are made by filling out the web form of the central service. The requests MUST get approved by a Data Manager; this can be one of our local site data managers. The user running the phedex service should always be a data manager, so that their requests are approved automatically.
Old Documentation
This is a new and adapted documentation based on
-- DerekFeichtinger - 2019-01-30