Firewall requirements
Installation
Official documentation (rather chaotic):
https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhedexAdminDocsInstallation
Similar documentation for CSCS: refer to the description on LCGTier2/CmsVObox.
There is one important difference between CSCS and PSI:
while we use FTS channels for the transfers to CSCS, we use the SRM backend for the transfers to PSI, because we do not have an FTS channel for PSI. This is linked to registering PSI as a regular grid site, which until recently was not possible since we only provide a Grid SE, not a CE.
Thus there is no fts.map file in the configuration area of the PhEDEx services.
Installation by Puppet
Installations are done by Fabio at PSI; usually nobody else needs to care about this task.
Installation is described by the Puppet files tier3-baseclasses.pp and SL6_vobox.pp, both saved in the dir puppetdirnodes, where puppetdirnodes is one of the aliases defined in the following list:
alias kscustom64='cd /afs/psi.ch/software/linux/dist/scientific/64/custom'
alias ksdir='cd /afs/psi.ch/software/linux/kickstart/configs'
alias puppetdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias puppetdirnodes='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes'
alias puppetdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias puppetdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
alias yumdir6='cd /afs/psi.ch/software/linux/dist/scientific/6/scripts'
local X509
A still valid local X509 host certificate is essential to regularly renew, from myproxy.cern.ch, the Joosep Pata or Fabio Martinelli proxy saved in /home/phedex/gridcert/proxy.cert; the phedex user's certificate is a link to the host certificate:
# ll /home/phedex/.globus/
total 4
lrwxrwxrwx 1 phedex phedex 31 Apr 13 18:44 usercert.pem -> /etc/grid-security/hostcert.pem
-r-------- 1 phedex phedex 1679 Apr 13 18:44 userkey.pem
[root@t3cmsvobox01 ~]# grid-cert-info --file /etc/grid-security/hostcert.pem
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 131 (0x83)
Signature Algorithm: sha256WithRSAEncryption
Issuer: DC=ORG, DC=SEE-GRID, CN=SEE-GRID CA 2013
Validity
Not Before: Feb 3 12:05:29 2016 GMT
Not After : Feb 2 12:05:29 2017 GMT
Subject: DC=EU, DC=EGI, C=CH,
...
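A minimal sketch for checking how many days the host certificate above has left, assuming bash and GNU date (which parses the date format printed by openssl x509 -enddate directly). cert_days_left is a hypothetical helper, not an existing tool; its optional second argument (a reference time in epoch seconds) exists only to make it testable.

```shell
#!/bin/bash
# cert_days_left NOT_AFTER [NOW_EPOCH]
# Prints the whole days left until an X509 "notAfter" date.
#   NOT_AFTER: date as printed by openssl, e.g. "Feb  2 12:05:29 2017 GMT"
#   NOW_EPOCH: optional reference time in epoch seconds (default: now)
cert_days_left() {
  local not_after=$1
  local now=${2:-$(date -u +%s)}
  local end
  # GNU date accepts the openssl notAfter format as-is
  end=$(date -u -d "$not_after" +%s) || return 2
  echo $(( (end - now) / 86400 ))
}

# Usage on the vobox, with the host certificate path shown above:
# cert_days_left "$(openssl x509 -in /etc/grid-security/hostcert.pem -noout -enddate | cut -d= -f2)"
```

If the number gets down to a few weeks, request a new host certificate before the proxy renewal below starts failing.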
This is the cron file that regularly (every 4 hours, at minute 22) renews the proxy /home/phedex/gridcert/proxy.cert used by PhEDEx for its data transfers; every 15 minutes it also writes PhEDEx statistics to /shome/phedex/phedex-statistics.txt:
-bash-4.1$ cat /etc/cron.d/phedex
PHEDEXVER=4.1.7
#22 */4 * * * phedex source /home/phedex/PHEDEX/$PHEDEXVER/etc/profile.d/env.sh ; unset X509_USER_PROXY ; /usr/bin/voms-proxy-init ; /usr/bin/myproxy-get-delegation -s myproxy.cern.ch -k renewable -v -l psi_phedex_2016_fabio -a /home/phedex/gridcert/proxy.cert -o /home/phedex/gridcert/proxy.cert; export X509_USER_PROXY=/home/phedex/gridcert/proxy.cert; /usr/bin/voms-proxy-init -noregen -voms cms
# Feb '16
# https://cern.service-now.com/service-portal/view-incident.do?n=INC0956110
22 */4 * * * phedex source /home/phedex/PHEDEX/$PHEDEXVER/etc/profile.d/env.sh ; unset X509_USER_PROXY ; /usr/bin/voms-proxy-init ; /usr/bin/myproxy-get-delegation -s myproxy.cern.ch -k renewable -v -l psi_t3cmsvobox_phedex_joosep_2016 -a /home/phedex/gridcert/proxy.cert -o /home/phedex/gridcert/proxy.cert; export X509_USER_PROXY=/home/phedex/gridcert/proxy.cert; /usr/bin/voms-proxy-init -noregen -voms cms
# logrotate (the config file was generated by /home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/CreateLogrotConf.pl)
05 0 * * * phedex /usr/sbin/logrotate -s /home/phedex/state/logrotate.state /home/phedex/config/logrotate.conf
# log parsing and writing of results to the shared home area for consumption by the external web server
SUMMARYFILE=/shome/phedex/phedex-statistics.txt
*/15 * * * * phedex source /home/phedex/PHEDEX/$PHEDEXVER/etc/profile.d/env.sh; echo -e generated on `date` "\n------------------------" > $SUMMARYFILE; echo "Prod:" >> $SUMMARYFILE;/home/phedex/init.d/phedex_Prod status >> $SUMMARYFILE; echo "Debug:" >> $SUMMARYFILE;/home/phedex/init.d/phedex_Debug status >> $SUMMARYFILE; /home/phedex/PHEDEX/$PHEDEXVER/Utilities/InspectPhedexLog -c 300 -es "-12 hours" -d /home/phedex/log/Prod/download /home/phedex/log/Debug/download >> $SUMMARYFILE 2>/dev/null
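The */15 line truncates $SUMMARYFILE and then appends to it, so the external web server may occasionally read a half-written file. A possible alternative, sketched below under the assumption that the phedex user can create files in /shome/phedex and that the logic is moved into a bash script called from cron, writes the output to a temporary file in the same directory and then renames it over the target. write_atomically is a hypothetical helper; the commented usage mirrors the commands of the cron line.

```shell
#!/bin/bash
# write_atomically TARGET CMD [ARGS...]
# Runs CMD with stdout redirected to a temp file next to TARGET and renames
# it over TARGET only if CMD succeeded; otherwise the previous TARGET is kept.
write_atomically() {
  local target=$1; shift
  local tmp
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  if "$@" > "$tmp"; then
    chmod 644 "$tmp"          # mktemp creates 0600; the web server must read it
    mv -f "$tmp" "$target"    # rename within the same filesystem
  else
    rm -f "$tmp"
    return 1
  fi
}

# Hypothetical usage mirroring the cron line above (PHEDEXVER as in the cron file):
# summary() {
#   echo -e "generated on $(date)\n------------------------"
#   echo "Prod:";  /home/phedex/init.d/phedex_Prod status
#   echo "Debug:"; /home/phedex/init.d/phedex_Debug status
#   /home/phedex/PHEDEX/$PHEDEXVER/Utilities/InspectPhedexLog -c 300 -es "-12 hours" \
#     -d /home/phedex/log/Prod/download /home/phedex/log/Debug/download 2>/dev/null
# }
# write_atomically /shome/phedex/phedex-statistics.txt summary
```

Note that the old file is kept when the last command of summary fails; append "|| true" to it if a partial report is preferable to a stale one.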
/cvmfs
Read the CVMFS page.
Be aware of https://twiki.cern.ch/twiki/bin/view/CMSPublic/CernVMFS4cms and of the local /cvmfs/cms.cern.ch automount point, since /cvmfs is nowadays used by our PhEDEx configuration:
[root@t3cmsvobox01 git]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 5.7G 4.3G 1.2G 79% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 477M 32M 420M 7% /boot
/dev/sda5 2.9G 640M 2.1G 24% /home
/dev/sdb1 20G 9.1G 11G 46% /opt/cvmfs_local <-- local /cvmfs cache
/dev/sda6 969M 1.7M 917M 1% /tmp
/dev/sda7 5.7G 874M 4.6G 16% /var
/dev/sdc1 9.9G 102M 9.3G 2% /var/cache/openafs
t3fs06:/shome 6.7T 5.0T 1.8T 75% /shome
t3fs05:/swshare 1.8T 562G 1.3T 31% /swshare
AFS 2.0T 0 2.0T 0% /afs
cvmfs2 14G 9.0G 4.7G 66% /cvmfs/cms.cern.ch
This is because of /cvmfs/cms.cern.ch/SITECONF/local/PhEDEx/storage.xml, which in turn is linked from the PhEDEx configuration area:
# ll /home/phedex/config/COMP/SITECONF/T3_CH_PSI/PhEDEx/storage.xml
lrwxrwxrwx 1 phedex phedex 52 Apr 13 18:45 /home/phedex/config/COMP/SITECONF/T3_CH_PSI/PhEDEx/storage.xml -> /cvmfs/cms.cern.ch/SITECONF/local/PhEDEx/storage.xml
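Since storage.xml is only a symlink into /cvmfs, the PhEDEx configuration is broken whenever /cvmfs/cms.cern.ch is not mounted. A minimal check, assuming bash and GNU readlink; check_storage_xml is a hypothetical name:

```shell
#!/bin/bash
# check_storage_xml LINK
# Succeeds only if LINK resolves to a readable regular file; prints a message
# on stderr and fails if, for example, /cvmfs is not mounted and LINK dangles.
check_storage_xml() {
  local link=$1 target
  target=$(readlink -f -- "$link") || { echo "cannot resolve $link" >&2; return 1; }
  if [ -f "$target" ] && [ -r "$target" ]; then
    echo "$link -> $target OK"
  else
    echo "$link -> $target is missing or unreadable (is /cvmfs mounted?)" >&2
    return 1
  fi
}

# check_storage_xml /home/phedex/config/COMP/SITECONF/T3_CH_PSI/PhEDEx/storage.xml
```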
Pitfalls in dcache-srmclient-2.10.7-1 (currently the latest dcache-srmclient)
Strangely, PhEDEx has a strong dependency on dcache-srmclient; by strong we mean that you can't use equivalent SRM tools like lcg-cp or gfal-copy instead. In its latest version, Fabio noticed that:
srmcp in dcache-srmclient-2.2.4-2.el6.x86_64 had, by default, -delegate=true
srmcp in dcache-srmclient-2.10.7-1.noarch now has, by default, -delegate=false
Paul Millar (one of the main dCache developers) commented:
srmcp tries to avoid the wall-clock time and CPU overhead of delegation if that delegation isn't necessary.
Unfortunately, there is a bug: the copyjobfile ( used by PhEDEx ) option is not consulted when determining
whether third-party transfers are involved. The consequence is that all such transfers are considered
second-party and no delegation is done.
This bug badly affects PhEDEx: because of it, a working PhEDEx/dcache-srmclient-2.2.4-2 configuration stops working simply by migrating to PhEDEx/dcache-srmclient-2.10.7-1.noarch, and you get (cryptic) errors like:
21 Apr 2015 07:11:13 (SRM-t3se01) [192.33.123.205:52205 VI8:439841:srm2:copy:-2098574001]
failed to connect to srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/mc/RunIIWinter15GS/RSGravToWW_kMpl01_M-2000_TuneCUETP8M1_13TeV-pythia8/GEN-SIM/MCRUN2_71_V1-v1/30000/AACEC97E-11B0-E411-9245-001E68862A32.root
credential remaining lifetime is less then a minute
Fabio fixed this by explicitly requesting -delegate=true, which bypasses the current copyjobfile bug:
[root@t3cmsvobox01 PhEDEx]# grep -Hn srmcp /home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/ConfigPart* | grep -v \#
/home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/ConfigPart.DebugServices:13: -command srmcp,-delegate=true,-pushmode=true,-debug=true,-retry_num=2,-protocols=gsiftp,-srm_protocol_version=2,-streams_num=1,-globus_tcp_port_range=20000:25000
/home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/ConfigPart.Standard:13: -command srmcp,-delegate=true,-pushmode=true,-debug=true,-retry_num=2,-protocols=gsiftp,-srm_protocol_version=2,-streams_num=1,-globus_tcp_port_range=20000:25000
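After a reinstall or an upgrade it is easy to lose the -delegate=true workaround. A small sketch (hypothetical helper missing_delegate, bash and GNU grep assumed) that lists every uncommented "-command srmcp" line of the given files that lacks -delegate=true:

```shell
#!/bin/bash
# missing_delegate FILE...
# Prints file:line:text for every uncommented "-command srmcp" line that does
# not explicitly set -delegate=true; returns 1 if at least one is found.
missing_delegate() {
  local bad
  bad=$(grep -Hn -- '-command srmcp' "$@" \
        | grep -v '^[^:]*:[0-9]*:[[:space:]]*#' \
        | grep -v -- '-delegate=true') || true
  [ -z "$bad" ] && return 0
  printf '%s\n' "$bad"
  return 1
}

# missing_delegate /home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/ConfigPart*
```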
Fabio noticed another bug in dcache-srmclient-2.10.7-1: the default proxy location /tmp/x509up_u`id -u` is considered even if the option -x509_user_proxy is explicitly given to use a different path:
Dear Paul and dCache colleagues, I believe I've found another bug in dcache-srmclient-2.10.7-1.noarch
$ srmls -debug=false -x509_user_proxy=/home/phedex/gridcert/proxy.cert -retry_num=0 'srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/mc/RunIIWinter15GS/RSGravToWWToLNQQ_kMpl01_M-4000_TuneCUETP8M1_13TeV-pythia8/GEN-SIM/MCRUN2_71_V1-v1/10000/2898A22B-62B0-E411-B1D4-002590D600EE.root'
srm client error:
java.lang.IllegalArgumentException: Multiple entries with same key:
x509_user_proxy=/home/phedex/gridcert/proxy.cert and
x509_user_proxy=/tmp/x509up_u205
Fabio worked around it by exporting X509_USER_PROXY, instead of passing -x509_user_proxy, in the following PhEDEx scripts:
[root@t3cmsvobox01 PhEDEx]# grep -Hn export /home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/FileDownload* --color
/home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/FileDownloadDelete:14: export X509_USER_PROXY=/home/phedex/gridcert/proxy.cert && srmrm -retry_num=0 "$pfn";
/home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/FileDownloadSRMVerify:31: *managerv2* ) echo $(export X509_USER_PROXY=/home/phedex/gridcert/proxy.cert && srmls -debug=false -retry_num=0 "$path" 2>/dev/null| grep $file | cut -d\  -f3);;
/home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/FileDownloadSRMVerify:44: fields=($(export X509_USER_PROXY=/home/phedex/gridcert/proxy.cert && srmls -l -debug=false -retry_num=0 "$pfn" 2>/dev/null| grep Checksum))
/home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/FileDownloadSRMVerify:116: *managerv2*) export X509_USER_PROXY=/home/phedex/gridcert/proxy.cert && srmrm -retry_num=0 "$pfn";;
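The same workaround can be written without exporting the variable into the rest of the script: prefixing a command with the assignment sets X509_USER_PROXY only in that command's environment. A sketch in bash; with_phedex_proxy and PHEDEX_PROXY are hypothetical names:

```shell
#!/bin/bash
# with_phedex_proxy CMD [ARGS...]
# Runs a single command with X509_USER_PROXY pointing at the PhEDEx proxy,
# leaving the caller's environment untouched.
# PHEDEX_PROXY (hypothetical) overrides the default path used above.
with_phedex_proxy() {
  X509_USER_PROXY=${PHEDEX_PROXY:-/home/phedex/gridcert/proxy.cert} "$@"
}

# e.g. the checksum lookup of FileDownloadSRMVerify could become:
# fields=($(with_phedex_proxy srmls -l -debug=false -retry_num=0 "$pfn" 2>/dev/null | grep Checksum))
```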
PhEDEx git repo cloned as a reference
To follow the progress of the PhEDEx code, keep the local git repo up to date:
[root@t3cmsvobox01 git]# su - phedex
-bash-4.1$ cd git
-bash-4.1$ cd PHEDEX/
-bash-4.1$ git pull
remote: Counting objects: 14, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 14 (delta 2), reused 0 (delta 0), pack-reused 6
Unpacking objects: 100% (14/14), done.
From https://github.com/dmwm/PHEDEX
7768ae7..66c984f master -> origin/master
Updating 7768ae7..66c984f
Fast-forward
Contrib/subscription_info.py | 126 ++++++++++++++++++++++++++++++++++++++++++
Utilities/testSpace/testAuth | 27 +++++++++
2 files changed, 153 insertions(+), 0 deletions(-)
create mode 100755 Contrib/subscription_info.py
create mode 100644 Utilities/testSpace/testAuth
How to connect to the PhEDEx DBs
PhEDEx itself connects to the CERN Oracle DBs, and you can inspect them directly with sqlplus. In another shell you can watch your sqlplus connections with netstat -tp | grep sqlplus and, if sqlplus hangs, kill them with killall sqlplus. In real life you'll seldom need to connect with sqlplus, but it's important to be aware of this option:
[root@t3cmsvobox01 phedex]# su - phedex
-bash-4.1$ source /home/phedex/PHEDEX/etc/profile.d/env.sh
-bash-4.1$ which sqlplus
~/sw/slc6_amd64_gcc461/external/oracle/11.2.0.3.0__10.2.0.4.0/bin/sqlplus
-bash-4.1$ sqlplus $(/home/phedex/PHEDEX/Utilities/OracleConnectId -db /home/phedex/config/DBParam.PSI:Prod/PSI)
SQL*Plus: Release 11.2.0.3.0 Production on Wed May 27 14:16:11 2015
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
SQL> select id,name from t_adm_node where name like '%CSCS%' or name like '%PSI%' ;
ID NAME
---------- --------------------
27 T2_CH_CSCS
821 T3_CH_PSI
SQL> select distinct r.id, r.created_by, r.time_create,r.comments reqcomid, rds.dataset_id, rds.name, rd.decided_by, rd.time_decided, rd.comments accomid from t_req_request r join t_req_type rt on rt.id = r.type join t_req_node rn on rn.request = r.id left join t_req_decision rd on rd.request = r.id and rd.node = rn.node join t_req_dataset rds on rds.request = r.id where rn.node = 821 and rt.name = 'xfer' and rd.decision = 'y' and dataset_id in (select distinct b.dataset from t_dps_block b join t_dps_block_replica br on b.id = br.block join t_dps_dataset d on d.id = b.dataset where node = 821 ) order by r.time_create desc ;
ID CREATED_BY TIME_CREATE REQCOMID DATASET_ID NAME DECIDED_BY TIME_DECIDED ACCOMID
---------- ---------- ----------- ---------- ---------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ------------ ----------
441651 786542 1429196738 303750 674704 /RSGravToWW_kMpl01_M-1800_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM 786664 1429287626 303779
441651 786542 1429196738 303750 674709 /RSGravToWW_kMpl01_M-2500_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM
...
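Instead of killall sqlplus, which kills every sqlplus process on the box, the hanging sessions can be picked from the netstat -tp output by program name, which leaves the PhEDEx agents' own DB connections alone. A sketch (hypothetical helper sqlplus_pids; awk and GNU xargs assumed) that prints the PIDs to kill:

```shell
#!/bin/bash
# sqlplus_pids
# Reads "netstat -tp" output on stdin and prints, once each, the PIDs of the
# sqlplus processes that own a TCP connection (last column "PID/sqlplus").
sqlplus_pids() {
  awk '$NF ~ /^[0-9]+\/sqlplus$/ { split($NF, p, "/"); print p[1] }' | sort -un
}

# netstat -tp 2>/dev/null | sqlplus_pids | xargs -r kill
```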
Regular Maintenance work
Keeping the CMS GIT SITECONF up to date
If you modify the local PhEDEx configuration, you have to publish the changes as described at
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SiteConfInGitlab
Nagios checks
https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?host=t3cmsvobox01
Checking the recent transfer errors
https://cmsweb.cern.ch/phedex/prod/Activity::ErrorInfo?tofilter=T3_CH_PSI&fromfilter=&report_code=.*&xfer_code=.*&to_pfn=.*&from_pfn=.*&log_detail=.*&log_validate=.*&.submit=Update#
Dataset cleaning
This task must be done regularly (once every 2 months, for example), both for CSCS and PSI.
Getting the datasets list
ssh root@t3cmsvobox.psi.ch
su - phedex
cd svn-sandbox/phedex/DB-query-tools/
source /home/phedex/PHEDEX/4.1.7/etc/profile.d/env.sh # <-- change that 4.1.7 if newer
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep "eleted"
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep -vE "Dutta|Fanfani|Kress|Magini|Wuerthwein|Belforte|Spinoso|Ajit|DataOps|eleted|StoreResults|Argiro|Klute|Cremonesi|Jean-Roch Vlimant|vocms[0-9]+|cmsgwms-submit[0-9]+|IntelROCCS|retention time: 2016|Retention date: 2016"  # <-- adapt that 2016
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%PSI%" | grep -Ev "retention time: 2016|Retention date: 2016"  # <-- adapt that 2016
The first Perl command creates a list of datasets that can safely be deleted from CSCS, as they were only requested to support transfers to PSI (check that the transfer to PSI actually completed).
The second command creates a list that excludes central requests and the datasets already covered by the first list.
The third command produces the list for PSI.
The datasets proposed for deletion are all those with an expired retention time.
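To avoid hand-editing the year in the grep filters above, it can be passed as a parameter. A sketch in bash, assuming the year to filter on is the current one unless given; drop_retained is a hypothetical helper and, like the original grep, it only knows the two spellings "retention time:" and "Retention date:". Passing several years also keeps datasets retained into later years.

```shell
#!/bin/bash
# drop_retained [YEAR...]
# Reads ListSiteDataInfo.pl output on stdin and drops the lines whose
# retention comment mentions one of the given years (default: current year).
drop_retained() {
  local years=("$@")
  [ ${#years[@]} -eq 0 ] && years=("$(date +%Y)")
  local alt
  alt=$(IFS='|'; echo "${years[*]}")      # e.g. "2016|2017"
  grep -Ev "(retention time|Retention date): (${alt})"
}

# ./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%PSI%" | drop_retained
```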
Publishing the list and notifying users
The due date for feedback is usually one week later. The lists must be published in DataSetCleaningQuery (previous lists must be deleted). To get the total size proposed for deletion, paste the list from the twiki into a temporary text file and run:
cat tmp.list | awk 'BEGIN{sum=0}{sum+=$4}END{print sum/1024.}'
This will give the total size in TB.
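The awk one-liner relies on the size landing in the 4th whitespace-separated column of the pasted text. If you paste the TWiki markup instead (the pipe-separated format printed by ListSiteDataInfoWS.py, see "Dataset cleaning - 2nd version"), splitting on "|" is more robust: *Size(GB)* is the 4th table column, i.e. awk field 5 because each row starts with "|". A sketch under that assumption; sum_size_tb is a hypothetical helper:

```shell
#!/bin/bash
# sum_size_tb [FILE...]
# Sums the *Size(GB)* column of a TWiki table in the ListSiteDataInfoWS.py
# format and prints the total in TB (GB / 1024, as in the one-liner above).
sum_size_tb() {
  awk -F'|' '
    { size = $5; gsub(/[ \t]/, "", size) }          # 5th field = 4th table column
    size ~ /^[0-9]+(\.[0-9]+)?$/ { total += size }   # skips header and other lines
    END { printf "%.3f\n", total / 1024 }
  ' "$@"
}

# sum_size_tb tmp.list
```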
An email like the following must be sent to the cms-tier3-users@lists.psi.ch mailing list:
Subject: Dataset deletion proposal and request for User Data cleaning - Due date: 28 Oct 2011, 9:0
Dear all,
a new cleaning campaign is needed, both at CSCS and PSI. You can find the list and the instructions on how to request to keep the data here:
https://wiki.chipp.ch/twiki/bin/view/CmsTier3/DataSetCleaningQuery
The data contained in the lists amount to 47TB / 44TB for CSCS / PSI.
If you need to store a dataset both at CSCS and at PSI please also reply to this email explaining why.
Please remember to clean up your user folder at CSCS regularly; a usage overview can be found at [1] and [2]
Thanks,
Daniel
[1] http://ganglia.lcg.cscs.ch/ganglia/cms_sespace.txt
[2] http://ganglia.lcg.cscs.ch/ganglia/files_cms.html
Dataset cleaning - 2nd version
Derek also wrote this less cryptic Python tool (you don't need to know the Oracle DB tables and columns, nor Perl):
[root@t3cmsvobox01 DB-query-tools]# ./ListSiteDataInfoWS.py --site T3_CH_PSI
Getting the data from the data service...
| *keep?*| *ID*| *Dataset*|*Size(GB)*| *Group*|*Requested on*|*Requested by*|*Comments*|*Comments2*|
| | 225527|/GluGluToHToWWTo2L2Nu_M-160_7TeV-powheg-pythia6/Winter10-E7TeV_ProbDist_2011Flat_BX156_START39_V8-v1/AODSIM|25.5| b-tagging|2011-02-18 13:35:49|Wolfram Erdmann|retention time April 2011|to be deleted from CSCS|
| | 269087|/BdToMuMu_2MuPtFilter_7TeV-pythia6-evtgen/Summer11-PU_S4_START42_V11-v1/GEN-SIM-RECO|58.6| b-physics|2011-06-08 12:34:25|Christoph Naegeli|retention-time: 2011-10-31| |
| | 320266|/RelValProdTTbar/SAM-MC_42_V12_SAM-v1/GEN-SIM-RECO|3.1| FacOps|2011-09-13 09:58:51|Andrea Sciaba| |Centrally approved (Nicolo)|
...
Renewing the myproxy certificate in myproxy.cern.ch (seldom, roughly every 11 months)
Nagios checks daily the lifetime of the VOMS proxy used by PhEDEx; this proxy is either Fabio's or Joosep's CMS proxy, and because of that all the PhEDEx files uploaded in /pnfs/psi.ch/cms/ belong to one of these 2 accounts (but not randomly to both). If you change that proxy, you have to change the ownership of ALL the related files/dirs in /pnfs/psi.ch/cms; specifically you'll want to change the owner of /pnfs/psi.ch/cms/trivcat/store/data, otherwise each PhEDEx file transfer will fail with permission denied.
The following shows how to upload a long-lived proxy into myproxy.cern.ch:
$ myproxy-init -t 168 -R 't3cmsvobox.psi.ch' -l psi_phedex_fabio -x -k renewable -s myproxy.cern.ch -c 8700
Your identity: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=users/C=CH/O=Paul-Scherrer-Institut (PSI)/CN=Fabio Martinelli
Enter GRID pass phrase for this identity:
Creating proxy .......................................................................................................................................... Done
Proxy Verify OK
Warning: your certificate and proxy will expire Thu Dec 10 01:00:00 2015
which is within the requested lifetime of the proxy
A proxy valid for 8700 hours (362.5 days) for user psi_phedex_fabio now exists on myproxy.cern.ch.
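# That 362.5 days is wrong! As the warning above says, the proxy cannot outlive the user certificate; myproxy-info shows the real remaining lifetime: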
$ myproxy-info -s myproxy.cern.ch -l psi_phedex_fabio
username: psi_phedex_fabio
owner: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=users/C=CH/O=Paul-Scherrer-Institut (PSI)/CN=Fabio Martinelli
name: renewable
renewal policy: */CN=t3cmsvobox.psi.ch
timeleft: 6249:20:19 (260.4 days)
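Because of this gap between the requested and the real lifetime, it is safer to check the timeleft value than to trust the message of myproxy-init. A sketch in bash (hypothetical helpers myproxy_hours_left and check_myproxy_days) that parses the myproxy-info output shown above and fails when fewer than a given number of days are left; only the first timeleft line is read.

```shell
#!/bin/bash
# myproxy_hours_left
# Reads "myproxy-info" output on stdin and prints the whole hours left
# (from the first "timeleft: H:MM:SS" line); fails if there is no such line.
myproxy_hours_left() {
  awk '$1 == "timeleft:" { split($2, t, ":"); print t[1] + 0; found = 1; exit }
       END { exit !found }'
}

# check_myproxy_days MIN_DAYS
# Returns 1 if fewer than MIN_DAYS days are left, 2 if the output is unusable.
check_myproxy_days() {
  local min_days=$1 hours
  hours=$(myproxy_hours_left) || { echo "no timeleft line found" >&2; return 2; }
  if [ "$hours" -lt $(( min_days * 24 )) ]; then
    echo "myproxy credential expires in about $(( hours / 24 )) days (< $min_days)" >&2
    return 1
  fi
}

# myproxy-info -s myproxy.cern.ch -l psi_phedex_fabio | check_myproxy_days 30 || echo "upload a new proxy"
```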
The present myproxy servers have problems with host certificates for PSI from SWITCH, because they contain a "(PSI)" substring, and the parentheses are not correctly escaped in the regexp matching of the myproxy code. Therefore, the renewer DN (-R argument to myproxy-init below) and the
allowed renewers policy on the myproxy server need to be defined with wildcards to enable the matching to succeed.
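The regexp problem can be reproduced in plain bash. This is only an illustration of why an unescaped "(PSI)" fails, not the myproxy server code: as an extended regular expression, "(PSI)" is a group matching the bare text PSI, so the full DN does not match itself, while the wildcard glob used below and a properly escaped regexp both match. The helper names are hypothetical.

```shell
#!/bin/bash
# Host DN as used below (contains the problematic "(PSI)")
dn='/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Aargau/L=Villigen/O=Paul-Scherrer-Institut (PSI)/OU=AIT/CN=t3cmsvobox.psi.ch'

dn_matches_as_regex() {  # $1 = DN, $2 = pattern used as an extended regexp
  local re=$2
  [[ $1 =~ $re ]]
}

dn_matches_as_glob() {   # $1 = DN, $2 = shell glob pattern
  [[ $1 == $2 ]]
}

# 1) unescaped DN as a regexp: "(PSI)" becomes a group, so no match
dn_matches_as_regex "$dn" "$dn" && echo "regex: match" || echo "regex: NO match"
# 2) wildcard form, like the '*/CN=t3cmsvobox.psi.ch' renewer policy
dn_matches_as_glob "$dn" '*/CN=t3cmsvobox.psi.ch' && echo "glob: match" || echo "glob: NO match"
# 3) escaping the regexp metacharacters fixes the regexp match
dn_escaped=$(printf '%s' "$dn" | sed 's/[][().*^$+?{}|\\]/\\&/g')
dn_matches_as_regex "$dn" "$dn_escaped" && echo "escaped regex: match" || echo "escaped regex: NO match"
```

Run as is, it prints "regex: NO match", "glob: match" and "escaped regex: match".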
voms-proxy-init -voms cms
myproxyserver=myproxy.cern.ch
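servicecert="/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Aargau/L=Villigen/O=Paul-Scherrer-Institut (PSI)/OU=AIT/CN=t3cmsvobox.psi.ch"   # full host DN: does NOT match on myproxy because of "(PSI)"
servicecert='*/CN=t3cmsvobox.psi.ch'   # wildcard form actually used (overrides the line above)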
myproxy-init -s $myproxyserver -l psi_phedex -x -R "$servicecert" -c 720
scp ~/.x509up_u$(id -u) phedex@t3ui01:gridcert/proxy.cert
# for testing, you can try
myproxy-info -s $myproxyserver -l psi_phedex
As the phedex user, run:
chmod 600 ~/gridcert/proxy.cert
You should test whether the renewal of the certificate works for the phedex user:
unset X509_USER_PROXY # make sure that the service credentials from ~/.globus are used!
voms-proxy-init # initializes the service proxy cert that is allowed to retrieve the user cert
myproxyserver=myproxy.cern.ch
myproxy-get-delegation -s $myproxyserver -v -l psi_phedex -a /home/phedex/gridcert/proxy.cert -o /tmp/gagatest
export X509_USER_PROXY=/tmp/gagatest
srm-get-metadata srm://t3se01.psi.ch:8443/srm/managerv1?SFN=/pnfs/psi.ch/cms
rm /tmp/gagatest
Storage Consistency Checks
From time to time the transfer team asks for input for their storage consistency check (so far only for T2); the last CSCS check was in Feb 2014. To perform a 'Storage Consistency Check' we need to complete the following steps:
sed -e 's#/pnfs/lcg.cscs.ch/cms/trivcat/store/\(mc\|data\|generator\|results\|hidata\|himc\|lumi\|relval\)/#/store/\1/#' \
-e '/<entry name="\/pnfs\/lcg.cscs.ch\/cms\/.*<\/entry>/d' \
-e 's#<dCache:location>.*</dCache:location>##' \
outfile.xml | uniq > storagedump.xml
- compress, store on AFS, and send path to transfer team
- take the file you get back from the transfer team with the LFNs to be deleted
for LFN in $(cat SCC_Nov2012_CSCS_LFNsToBeRemoved.txt); do lcg-del -b -D srmv2 -l "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/$LFN"; done
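Before deleting, it helps to review exactly which SURLs the loop will touch. A dry-run sketch in bash (hypothetical helpers lfn_to_cscs_surl and delete_lfns) that by default only prints the lcg-del commands, skips lines not starting with /store/, and drops the leading slash of the LFN to avoid the double slash produced by trivcat/$LFN:

```shell
#!/bin/bash
# lfn_to_cscs_surl LFN: build the CSCS SRM URL used in the deletion loop above.
lfn_to_cscs_surl() {
  local lfn=$1
  printf 'srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/%s\n' "${lfn#/}"
}

# delete_lfns FILE [--really]
# Prints the lcg-del command for every /store/... LFN in FILE; the commands
# are executed only when --really is given.
delete_lfns() {
  local file=$1 really=${2:-} lfn surl
  while IFS= read -r lfn; do
    case $lfn in /store/*) ;; *) continue ;; esac
    surl=$(lfn_to_cscs_surl "$lfn")
    if [ "$really" = "--really" ]; then
      lcg-del -b -D srmv2 -l "$surl"
    else
      echo lcg-del -b -D srmv2 -l "$surl"
    fi
  done < "$file"
}

# delete_lfns SCC_Nov2012_CSCS_LFNsToBeRemoved.txt | less      # review first
# delete_lfns SCC_Nov2012_CSCS_LFNsToBeRemoved.txt --really   # then delete
```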
Emergency Measures
Services
/home/phedex/phedex_start.sh
To be invoked manually after a server restart!
/home/phedex/phedex_stop.sh
/home/phedex/phedex_status.sh
Current PhEDEx status (process list of the phedex user; the long lines are truncated):
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileDownload -state /home/phedex/state/Debug/incoming/download/ -log /home/phedex/log/Debug/download -verbose -db/home/phed
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileExport -state /home/phedex/state/Debug/incoming/fileexport/ -log /home/phedex/log/Debug/fileexport -db/home/phedex/conf
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileRemove -state /home/phedex/state/Debug/incoming/fileremove/ -log /home/phedex/log/Debug/fileremove -node T3_CH_PSI -db/
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Verify/BlockDownloadVerify -state /home/phedex/state/Debug/incoming/blockverify/ -log /home/phedex/log/Debug/blockverify --db/home/p
perl /home/phedex/PHEDEX/4.1.7/Utilities/AgentFactory.pl -state /home/phedex/state/Debug/incoming/watchdog/ -log /home/phedex/log/Debug/watchdog -db/home/phedex/config/DBP
perl /home/phedex/PHEDEX/4.1.7/Utilities/AgentFactoryLite.pl -state /home/phedex/state/Debug/incoming/WatchdogLite/ -log /home/phedex/log/Debug/WatchdogLite -nodeT3_CH_PSI
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileDownload -state /home/phedex/state/Dev/incoming/download/ -log /home/phedex/log/Dev/download -verbose -db/home/phedex/c
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileExport -state /home/phedex/state/Dev/incoming/fileexport/ -log /home/phedex/log/Dev/fileexport -db/home/phedex/config/D
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileRemove -state /home/phedex/state/Dev/incoming/fileremove/ -log /home/phedex/log/Dev/fileremove -node T3_CH_PSI -db/home
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Verify/BlockDownloadVerify -state /home/phedex/state/Dev/incoming/blockverify/ -log /home/phedex/log/Dev/blockverify --db/home/phede
perl /home/phedex/PHEDEX/4.1.7/Utilities/AgentFactory.pl -state /home/phedex/state/Dev/incoming/watchdog/ -log /home/phedex/log/Dev/watchdog -db/home/phedex/config/DBParam
perl /home/phedex/PHEDEX/4.1.7/Utilities/AgentFactoryLite.pl -state /home/phedex/state/Dev/incoming/WatchdogLite/ -log /home/phedex/log/Dev/WatchdogLite -node T3_CH_PSI-ag
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileDownload -state /home/phedex/state/Prod/incoming/download/ -log /home/phedex/log/Prod/download -verbose -db/home/phedex
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileExport -state /home/phedex/state/Prod/incoming/fileexport/ -log /home/phedex/log/Prod/fileexport -db/home/phedex/config
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Transfer/FileRemove -state /home/phedex/state/Prod/incoming/fileremove/ -log /home/phedex/log/Prod/fileremove -node T3_CH_PSI -db/ho
perl /home/phedex/PHEDEX/4.1.7/Toolkit/Verify/BlockDownloadVerify -state /home/phedex/state/Prod/incoming/blockverify/ -log /home/phedex/log/Prod/blockverify --db/home/phe
perl /home/phedex/PHEDEX/4.1.7/Utilities/AgentFactory.pl -state /home/phedex/state/Prod/incoming/watchdog/ -log /home/phedex/log/Prod/watchdog -db/home/phedex/config/DBPar
perl /home/phedex/PHEDEX/4.1.7/Utilities/AgentFactoryLite.pl -state /home/phedex/state/Prod/incoming/WatchdogLite/ -log /home/phedex/log/Prod/WatchdogLite -node T3_CH_PSI-
(process tree printed by: pstree -uh phedex -la)
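To spot a missing agent without reading the whole tree, a sketch in bash (hypothetical helper missing_agents) that reads a process listing and reports which of the six state directories seen above (download, fileexport, fileremove, blockverify, watchdog, WatchdogLite) has no process for a given instance. The agent list is taken from the -state paths above and must be adapted if the configuration changes.

```shell
#!/bin/bash
# missing_agents INSTANCE
# Reads a process listing (e.g. "ps -ww -u phedex -o args=" or the pstree
# output above) on stdin and prints every agent state directory of INSTANCE
# (Prod, Debug or Dev) that no process refers to; returns 1 if any is missing.
missing_agents() {
  local instance=$1 listing agent rc=0
  listing=$(cat)
  # agent state directories as they appear in the -state options above
  for agent in download fileexport fileremove blockverify watchdog WatchdogLite; do
    if ! grep -q -- "-state /home/phedex/state/${instance}/incoming/${agent}/" <<<"$listing"; then
      echo "$agent"
      rc=1
    fi
  done
  return "$rc"
}

# ps -ww -u phedex -o args= | missing_agents Prod || echo "some Prod agents are down"
```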
TRANSFER LOGs
=====================================================================
source /home/phedex/PHEDEX/4.1.7/etc/profile.d/env.sh
/home/phedex/PHEDEX/4.1.7/Utilities/InspectPhedexLog -es "-1 hours" -d /home/phedex/log/Prod/download /home/phedex/log/Debug/download
/home/phedex/PHEDEX/4.1.7/Utilities/InspectPhedexLog -es "-1 days" -d /home/phedex/log/Prod/download /home/phedex/log/Debug/download
How to update
=============
https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhedexAdminDocsInstallation#Updating_Software
How to test /home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/storage.xml
=====================================================================
source /home/phedex/PHEDEX/4.1.7/etc/profile.d/env.sh
/home/phedex/PHEDEX/4.1.7/Utilities/TestCatalogue -c /home/phedex/config/SITECONF/T3_CH_PSI/PhEDEx/storage.xml -p srmv2 -L /store/data/file
StorageConsistencyCheck example
===============================
source /home/phedex/PHEDEX/4.1.7/etc/profile.d/env.sh
/home/phedex/PHEDEX/4.1.7/Utilities/StorageConsistencyCheck -db /home/phedex/config/DBParam.PSI:Prod/PSI -lfnlist /home/phedex/PSI.lfnlist.txt -node T3_CH_PSI
/home/phedex/PHEDEX/4.1.7/Utilities/StorageConsistencyCheck -db /home/phedex/config/DBParam.PSI:Prod/PSI -lfnlist /home/phedex/CSCS.lfnlist.txt -node T2_CH_CSCS
[root@t3dcachedb03 ~]# psql -U nagios -d chimera -c " select path from v_pnfs where path like '/pnfs/psi.ch/cms%' ; " -t -q -o ./PSI.txt <-------- to get the PNFS paths, from which the LFN list is derived
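The -lfnlist option of StorageConsistencyCheck suggests it expects LFNs, while the psql query returns PNFS paths. A sketch in bash (hypothetical helper pnfs_to_lfn) that trims the psql padding, strips the trivcat prefix and keeps only /store/... entries; the PSI prefix /pnfs/psi.ch/cms/trivcat is taken from the paths shown earlier on this page. Directories returned by the view, if any, are not filtered out.

```shell
#!/bin/bash
# pnfs_to_lfn [PREFIX]
# Reads PNFS paths on stdin (e.g. the psql -t output above) and prints the
# corresponding LFNs, i.e. the /store/... part after PREFIX.
# For CSCS the prefix would be /pnfs/lcg.cscs.ch/cms/trivcat.
pnfs_to_lfn() {
  local prefix=${1:-/pnfs/psi.ch/cms/trivcat}
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    awk -v p="$prefix" 'index($0, p "/store/") == 1 { print substr($0, length(p) + 1) }'
}

# pnfs_to_lfn < ./PSI.txt > /home/phedex/PSI.lfnlist.txt
```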
PSI agents as perceived from the CERN DBs
=========================================
source /home/phedex/PHEDEX/4.1.7/etc/profile.d/env.sh
/home/phedex/PHEDEX/4.1.7/Utilities/ShowAgents -db /home/phedex/config/DBParam.PSI:Prod/PSI -node T3_CH_PSI
https://cmsweb.cern.ch/phedex/datasvc/xml/prod/agents?node=T3_CH_PSI
https://cmsweb.cern.ch/phedex/datasvc/json/prod/agents?node=T3_CH_PSI
PSI Prod Datasets
=========================================
https://cmsweb.cern.ch/phedex/prod/Data::Replicas?view=global&rcolumn=Name&nvalue=Node+files&rows=interesting&dbs=21&dbs=1&dbs=41&node=821&filter=.*
PSI Prod Errors
=========================================
https://cmsweb.cern.ch/phedex/prod/Activity::ErrorInfo?from_pfn=.%2A;fromfilter=.%2A;log_detail=.%2A;log_validate=.%2A;xfer_code=.%2A;tofilter=T3_CH_PSI;to_pfn=.%2A;report_code=.%2A
https://cmsweb.cern.ch/phedex/datasvc/xml/prod/errorlog?to=T3_CH_PSI
HOW TO IDENTIFY THE DATASETS TO BE DELETED AT PSI AND CSCS
==========================================================
https://wiki.chipp.ch/twiki/bin/view/CmsTier3/CmsVoBox#Dataset_cleaning <-- HowTo Doc
source /home/phedex/PHEDEX/4.1.7/etc/profile.d/env.sh
cd svn-sandbox/phedex/DB-query-tools/
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep "eleted"
echo
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep -vE "Dutta|Fanfani|Kress|Magini|Wuerthwein|Belforte|Spinoso|Ajit|DataOps|eleted|StoreResults|Argiro|Klute|Cremonesi|Jean-Roch Vlimant|vocms[0-9]+|cmsgwms-submit[0-9]+|IntelROCCS|retention time: 2016|Retention date: 2016"
echo
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%PSI%" | grep -Ev "retention time: 2016|Retention date: 2016"
https://wiki.chipp.ch/twiki/bin/view/CmsTier3/DataSetCleaningQuery <-- where to publish the ListSiteDataInfo.pl outputs
Be aware that if a dataset appeared, disappeared and appeared again, ListSiteDataInfo.pl as currently written reports the first (old) occurrence, i.e. the old requester and the old retention time.
netstat -tp
[root@t3cmsvobox01 git]# netstat -tp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 t3cmsvobox01.psi.ch:5666 t3nagios.psi.ch:42759 TIME_WAIT -
tcp 0 0 t3cmsvobox01.psi.ch:44718 t3service01.p:fujitsu-dtcns ESTABLISHED 1154/syslog-ng
tcp 0 0 t3cmsvobox01.psi.ch:33474 t3ldap01.psi.ch:ldaps ESTABLISHED 21373/nslcd
tcp 0 0 t3cmsvobox01.psi.ch:36140 itrac50078-v.cern.ch:10121 TIME_WAIT -
tcp 0 0 t3cmsvobox01.psi.ch:49704 t3dcachedb03.psi.ch:rootd ESTABLISHED 2084/xrdcp
tcp 0 0 t3cmsvobox01.psi.ch:49657 t3dcachedb03.psi.ch:rootd ESTABLISHED 1810/xrdcp
tcp 0 0 t3cmsvobox01.psi.ch:vsinet t3fs06.psi.ch:nfs ESTABLISHED -
tcp 0 0 t3cmsvobox01.psi.ch:36185 itrac50078-v.cern.ch:10121 ESTABLISHED 30705/perl
tcp 0 0 t3cmsvobox01.psi.ch:35926 itrac50078-v.cern.ch:10121 ESTABLISHED 31158/perl
tcp 0 0 t3cmsvobox01.psi.ch:36344 itrac50078-v.cern.ch:10121 ESTABLISHED 29777/perl
tcp 0 0 t3cmsvobox01.psi.ch:39625 t3se01.psi.ch:pcsync-https TIME_WAIT -
tcp 1 0 t3cmsvobox01.psi.ch:57092 t3frontier01.psi.ch:squid CLOSE_WAIT 29416/cvmfs2
tcp 0 0 t3cmsvobox01.psi.ch:36198 itrac50078-v.cern.ch:10121 ESTABLISHED 30198/perl
tcp 0 0 t3cmsvobox01.psi.ch:36191 itrac50078-v.cern.ch:10121 ESTABLISHED 29695/perl
tcp 1 0 t3cmsvobox01.psi.ch:5666 t3nagios.psi.ch:42436 CLOSE_WAIT 2057/nrpe
tcp 0 0 t3cmsvobox01.psi.ch:44651 topbdii04.cern.ch:eyetv TIME_WAIT -
tcp 0 0 t3cmsvobox01.psi.ch:40025 t3admin01.psi.ch:4505 ESTABLISHED 21525/python2.6
tcp 0 0 t3cmsvobox01.psi.ch:36196 itrac50078-v.cern.ch:10121 ESTABLISHED 29898/perl
tcp 0 0 t3cmsvobox01.psi.ch:57901 topbdii03.cern.ch:eyetv TIME_WAIT -
tcp 1 0 t3cmsvobox01.psi.ch:56951 t3frontier01.psi.ch:squid CLOSE_WAIT 29416/cvmfs2
tcp 0 0 t3cmsvobox01.psi.ch:33462 t3ldap01.psi.ch:ldaps ESTABLISHED 21373/nslcd
tcp 1 0 t3cmsvobox01.psi.ch:5666 t3nagios.psi.ch:41657 CLOSE_WAIT 1783/nrpe
tcp 0 0 t3cmsvobox01.psi.ch:ssh t3admin01.psi.ch:56676 ESTABLISHED 1571/sshd
tcp 0 0 t3cmsvobox01.psi.ch:33448 t3ldap01.psi.ch:ldaps ESTABLISHED 21373/nslcd
tcp 0 0 t3cmsvobox01.psi.ch:36194 itrac50078-v.cern.ch:10121 ESTABLISHED 30280/perl
tcp 0 0 t3cmsvobox01.psi.ch:36075 itrac50078-v.cern.ch:10121 ESTABLISHED 30401/perl
tcp 0 0 t3cmsvobox01.psi.ch:54268 topbdii05.cern.ch:eyetv TIME_WAIT -
tcp 0 0 t3cmsvobox01.psi.ch:36182 itrac50078-v.cern.ch:10121 ESTABLISHED 30992/perl
tcp 0 0 t3cmsvobox01.psi.ch:36098 itrac50078-v.cern.ch:10121 ESTABLISHED 30540/perl
tcp 0 0 t3cmsvobox01.psi.ch:33453 t3ldap01.psi.ch:ldaps ESTABLISHED 21373/nslcd
tcp 0 0 t3cmsvobox01.psi.ch:33461 t3ldap01.psi.ch:ldaps ESTABLISHED 21373/nslcd
tcp 0 0 t3cmsvobox01.psi.ch:56928 topbdii02.cern.ch:eyetv TIME_WAIT -
tcp 0 0 t3cmsvobox01.psi.ch:39623 t3se01.psi.ch:pcsync-https TIME_WAIT -
tcp 1 0 t3cmsvobox01.psi.ch:56952 t3frontier01.psi.ch:squid CLOSE_WAIT 29416/cvmfs2
tcp 0 0 t3cmsvobox01.psi.ch:56555 t3fs13.psi.ch:gsiftp TIME_WAIT -
tcp 0 0 t3cmsvobox01.psi.ch:52096 t3admin01.psi.ch:4506 ESTABLISHED 21525/python2.6
Checking each CMS pool by Nagios through both the t3se01:SRM and t3dcachedb:Xrootd dCache doors
From t3cmsvobox, which is in turn contacted by t3nagios, we retrieve a file from each CMS pool through both t3se01:SRM and t3dcachedb:Xrootd:
https://t3nagios.psi.ch/check_mk/index.py?start_url=%2Fcheck_mk%2Fview.py%3Fview_name%3Dhost%26host%3Dt3cmsvobox01%26site%3D
In both cases the retrieved test files are:
[martinelli_f@t3ui12 ~]$ find /pnfs/psi.ch/cms/t3-nagios/ | grep M | sort
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs02_cms
...
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_9
The related dCache files obviously have to be placed on the right CMS pool, otherwise the Nagios tests are meaningless! To easily check where they are really placed, run this SQL query (in this example some test files are erroneously available on the wrong pool! That was due to a bad migration cache command):
[root@t3dcachedb03 ~]# psql -U nagios -d chimera -c " select path,ipnfsid,pools from v_pnfs where path like '%1MB-test-file_pool_%' ; "
path | ipnfsid | pools
-------------------------------------------------------------+--------------------------------------+------------------------------------
/pnfs/psi.ch/dteam/t3-nagios/1MB-test-file_pool_t3fs09_ops | 0000BCDA4B329DA94D64AAAFE7C0C7501E5C | t3fs09_ops
/pnfs/psi.ch/dteam/t3-nagios/1MB-test-file_pool_t3fs08_ops | 0000358B14867ED5402184C2C22F81EFC861 | t3fs08_ops
/pnfs/psi.ch/dteam/t3-nagios/1MB-test-file_pool_t3fs07_ops | 0000409BB804C95944A38DBE8220B416A8A3 | t3fs07_ops
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_9 | 0000B58A7FA17778439F8F6F47C5CBBED5E7 | t3fs03_cms t3fs11_cms t3fs14_cms_9
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_8 | 00001A2FD52D31DB4CCAB99C8B8336522339 | t3fs09_cms t3fs11_cms t3fs14_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_7 | 000018AA61C1E30F43709F0D9FE3B9CD65D1 | t3fs03_cms t3fs14_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_6 | 0000E88C6CBB2D5A4365B11BE2EDD1554366 | t3fs02_cms t3fs14_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_5 | 000200000000000006300738 | t3fs10_cms t3fs14_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_4 | 0002000000000000052EF198 | t3fs03_cms t3fs14_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_3 | 0002000000000000052EF168 | t3fs03_cms t3fs14_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_2 | 0002000000000000052EF138 | t3fs07_cms t3fs14_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_11 | 00003616229002194F439925DA3C7F1CFA02 | t3fs10_cms t3fs14_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_10 | 0000B3D6A96EF961473AACB05F80CF9D6892 | t3fs07_cms t3fs14_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_1 | 0002000000000000052EF108 | t3fs02_cms t3fs11_cms t3fs14_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_0 | 0000A6470E0458354BD99D6C2DD27B196DCC | t3fs08_cms t3fs14_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms | 0002000000000000052EF0D8 | t3fs03_cms t3fs04_cms t3fs14_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_9 | 00004783F9158A5941B284342FF4A8EDE126 | t3fs08_cms t3fs13_cms_9
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_8 | 0000132841305C27434891574015FD2CF923 | t3fs09_cms t3fs13_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_7 | 00003FC27733ACBA4A809677419256FE22F9 | t3fs02_cms t3fs11_cms t3fs13_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_6 | 0002000000000000072F8630 | t3fs07_cms t3fs11_cms t3fs13_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_5 | 0002000000000000052EF0A8 | t3fs03_cms t3fs13_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_4 | 0002000000000000052EF078 | t3fs10_cms t3fs11_cms t3fs13_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_3 | 0002000000000000052EF048 | t3fs10_cms t3fs13_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_2 | 0002000000000000052EF018 | t3fs02_cms t3fs13_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_11 | 00000DB49D5B69EB4C568834BD162C3DA8E7 | t3fs09_cms t3fs13_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_10 | 0000073FF4F754BB4AB1B4599F412811BDA2 | t3fs10_cms t3fs13_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_1 | 00000CB9E97140F940CD973C319045B43FDA | t3fs04_cms t3fs11_cms t3fs13_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_0 | 00005560491A76DE49DBA142D3BE3CFE38D5 | t3fs02_cms t3fs11_cms t3fs13_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms | 0002000000000000052EEFB8 | t3fs07_cms t3fs11_cms t3fs13_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs11_cms | 00009E4A9774085C4799B5C9C827DA03406F | t3fs11_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs10_cms | 000005D1DD24CA14448694E5C46A8AA8E91F | t3fs10_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs09_cms | 0000479ED8FDDC374BC68827AEDF1C146686 | t3fs09_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs08_cms | 00003A989AB6D1074D738594B1D01E2D03DE | t3fs08_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs07_cms | 0000119DDCFD0C5F42B89769BC9C104A997F | t3fs07_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms_1 | 0002000000000000063D8C68 | t3fs04_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms | 00020000000000000395B300 | t3fs04_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs03_cms | 000200000000000006391F88 | t3fs03_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs02_cms | 00020000000000000330BF10 | t3fs02_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms | 00020000000000000330BF90 | t3fs01_cms
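Scanning this table by eye is error-prone. A minimal sketch for flagging the misplaced files automatically, assuming (as in the listing above) that each test file name ends with the intended pool after `1MB-test-file_pool_`, and that the query is run with psql's unaligned, tuples-only options (`-A -t`, whose default field separator is `|`). The function name `find_misplaced_test_files` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report Nagios test files whose replicas are not exactly
# the pool encoded in their name ("..._pool_<poolname>").
# ASSUMES input rows "path|ipnfsid|pools", e.g. from
#   psql -U nagios -d chimera -A -t -c "select path,ipnfsid,pools from v_pnfs where path like '%1MB-test-file_pool_%';"
find_misplaced_test_files() {
  awk -F'|' 'NF >= 3 {
    path = $1;  gsub(/^ +| +$/, "", path)     # trim padding, if any
    pools = $3; gsub(/^ +| +$/, "", pools)
    expected = path; sub(/.*1MB-test-file_pool_/, "", expected)
    if (pools != expected)
      print path ": expected " expected ", found " pools
  }'
}

# Usage on t3dcachedb03 (empty output means every test file is correctly placed):
#   psql -U nagios -d chimera -A -t -c "select path,ipnfsid,pools from v_pnfs where path like '%1MB-test-file_pool_%';" | find_misplaced_test_files
```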
Backups
OS snapshots are taken nightly by the PSI VMware team (contact: Peter Huesser); in addition, we have
LinuxBackupsByLegato to recover single files.