Node Type: CmsVoBox

Firewall requirements

local port | open to | reason


Installation

Installation is described by the Puppet files tier3-baseclasses.pp and SL6_vobox.pp, both saved in the directory reached by the puppetdirnodes alias; the alias is defined in the following list:

alias kscustom57='cd /afs/psi.ch/software/linux/dist/scientific/57/custom'
alias kscustom64='cd /afs/psi.ch/software/linux/dist/scientific/64/custom'
alias ksdir='cd /afs/psi.ch/software/linux/kickstart/configs'
alias puppetdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias puppetdirnodes='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes'
alias puppetdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias puppetdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
alias yumdir5='cd /afs/psi.ch/software/linux/dist/scientific/57/scripts'
alias yumdir6='cd /afs/psi.ch/software/linux/dist/scientific/6/scripts'
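
For example, once these aliases are loaded in the current shell, the two manifests can be reached with:

 puppetdirnodes                              # cd to the nodes manifests directory
 ls -l tier3-baseclasses.pp SL6_vobox.pp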

Regular Maintenance work

Note that there are many checks performed by t3nagios

How to connect to the PhEDEx DBs

If you want to inspect the PhEDEx DBs directly you can use sqlplus; from another shell you can watch your sqlplus connections with netstat -tp | grep sqlplus and, if sqlplus hangs, kill them with killall sqlplus:

[root@t3cmsvobox01 phedex]# su - phedex
-bash-4.1$ source /home/phedex/PHEDEX/etc/profile.d/env.sh
-bash-4.1$ which sqlplus
~/sw/slc6_amd64_gcc461/external/oracle/11.2.0.3.0__10.2.0.4.0/bin/sqlplus
-bash-4.1$ sqlplus $(/home/phedex/PHEDEX/Utilities/OracleConnectId -db /home/phedex/config/DBParam.PSI:Prod/PSI)

SQL*Plus: Release 11.2.0.3.0 Production on Wed May 27 14:16:11 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> select id,name from t_adm_node where name like '%CSCS%' or name like '%PSI%' ; 

   ID NAME
---------- --------------------
   27 T2_CH_CSCS
       821 T3_CH_PSI

SQL> select distinct r.id, r.created_by, r.time_create,r.comments reqcomid, rds.dataset_id, rds.name, rd.decided_by, rd.time_decided, rd.comments accomid  from t_req_request r join t_req_type rt on rt.id = r.type join t_req_node rn on rn.request = r.id left join t_req_decision rd on rd.request = r.id and rd.node = rn.node join t_req_dataset rds on rds.request = r.id where rn.node = 821 and rt.name = 'xfer' and rd.decision = 'y' and dataset_id in (select distinct b.dataset  from t_dps_block b join t_dps_block_replica br on b.id = br.block join t_dps_dataset d on d.id = b.dataset where node = 821 ) order by r.time_create desc ; 

        ID CREATED_BY TIME_CREATE    REQCOMID DATASET_ID NAME                                                                                          DECIDED_BY TIME_DECIDED    ACCOMID
---------- ---------- ----------- ---------- ---------- --------------------------------------------------------------------------------------------- ---------- ------------ ----------
    441651     786542  1429196738     303750     674696 /RSGravToWW_kMpl01_M-800_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM        786664   1429287626     303779
    441651     786542  1429196738     303750     674704 /RSGravToWW_kMpl01_M-1800_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM       786664   1429287626     303779
    441651     786542  1429196738     303750     674709 /RSGravToWW_kMpl01_M-2500_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM
...
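
If sqlplus hangs, the commands mentioned above can be run from a second shell to find and kill the stuck connections (a minimal sketch):

 # list the open sqlplus connections
 netstat -tp | grep sqlplus
 # kill them if they hang
 killall sqlplus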

Dataset cleaning

This task must be done regularly (once every 2 months, for example), both for CSCS and PSI.

Getting the datasets list

Connect to t3cmsvobox as root and:

 su - phedex
 cd svn-sandbox/phedex/DB-query-tools/
 source /home/phedex/PHEDEX/etc/profile.d/env.sh
 ./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep  "eleted"
 ./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep -vE "Dutta|Fanfani|Kress|Magini|Wuerthwein|Belforte|Spinoso|Ajit|DataOps|eleted|StoreResults|Argiro|Klute"
 ./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%PSI%"

The first Perl command creates a list of datasets that can safely be deleted from CSCS, since they were requested only to support transfers to PSI (check that the transfers completed successfully).
The second command creates the CSCS list while excluding central requests and the datasets already covered by the first command.
The third command produces the list for PSI.

All datasets whose retention time has expired are proposed for deletion.

Publishing the list and notifying users

The due date for feedback is usually one week later. The lists must be published in DataSetCleaningQuery (previous lists must be deleted first). To get the total size proposed for deletion, you can create a temporary text file with the list pasted from the TWiki and then run:

cat tmp.list | awk 'BEGIN{sum=0}{sum+=$4}END{print sum/1024.}'

This will give the total size in TB.
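
For instance, with a hypothetical tmp.list whose 4th column holds the dataset sizes in GB:

 $ cat tmp.list
 /Example/DatasetA/GEN-SIM    user1   2015-01-01   1200
 /Example/DatasetB/AODSIM     user2   2015-02-01    850
 $ cat tmp.list | awk 'BEGIN{sum=0}{sum+=$4}END{print sum/1024.}'
 2.00195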

An email like this must be sent to the cms-tier3-users@lists.psi.ch mailing list:

Subject: Dataset deletion proposal and request for User Data cleaning - Due date: 28 Oct 2011, 9:00
 
Dear all,
a new cleaning campaign is needed, both at CSCS and PSI. You can find the list and the instructions on how to request to keep the data here:
https://twiki.cscs.ch/twiki/bin/view/CmsTier3/DataSetCleaningQuery
The data contained in the lists amount to 47TB / 44TB for CSCS / PSI.
If you need to store a dataset both at CSCS and at PSI please also reply to this email explaining why.
Please remember to clean up your user folder at CSCS regularly; a usage overview can be found at [1] and [2]

Thanks, 
Daniel

[1] http://ganglia.lcg.cscs.ch/ganglia/cms_sespace.txt
[2] http://ganglia.lcg.cscs.ch/ganglia/files_cms.html 

Renew myproxy certificate for PhEDEx transfers (once every 11 months)

Nagios checks daily the lifetime of the VOMS proxy used by PhEDEx; this proxy is Fabio's CMS proxy and, because of that, all the PhEDEx files uploaded to /pnfs/psi.ch/cms/ belong to his account. If you change that proxy you also have to change the ownership of the related files/dirs in /pnfs/psi.ch/cms; specifically you will want to change the owner of /pnfs/psi.ch/cms/trivcat/store/data, otherwise you will get a lot of "permission denied" errors.
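
The exact way to change the ownership depends on how the pnfs namespace is mounted; a minimal sketch, assuming it can be done directly with chown on a node where /pnfs is mounted read-write:

 # example only: replace 'newphedexowner' with the account mapped to the new proxy DN
 chown -R newphedexowner /pnfs/psi.ch/cms/trivcat/store/data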

Here is how to upload a long-lived proxy to myproxy.cern.ch:

$ myproxy-init -t 168 -R 't3cmsvobox.psi.ch' -l psi_phedex_fabio -x -k renewable -s myproxy.cern.ch -c 8700
Your identity: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=users/C=CH/O=Paul-Scherrer-Institut (PSI)/CN=Fabio Martinelli
Enter GRID pass phrase for this identity:
Creating proxy .......................................................................................................................................... Done
Proxy Verify OK

Warning: your certificate and proxy will expire Thu Dec 10 01:00:00 2015
 which is within the requested lifetime of the proxy
A proxy valid for 8700 hours (362.5 days) for user psi_phedex_fabio now exists on myproxy.cern.ch.

# That 362.5 days is wrong !
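
The myproxy lifetime can never exceed the remaining lifetime of the underlying user certificate (hence the warning above); the real expiry date of the certificate can be checked with standard openssl, assuming it sits in the usual ~/.globus location:

 openssl x509 -in ~/.globus/usercert.pem -noout -enddate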

$ myproxy-info -s myproxy.cern.ch  -l psi_phedex_fabio
username: psi_phedex_fabio
owner: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=users/C=CH/O=Paul-Scherrer-Institut (PSI)/CN=Fabio Martinelli
  name: renewable
  renewal policy: */CN=t3cmsvobox.psi.ch
  timeleft: 6249:20:19  (260.4 days)

The present myproxy servers have problems with PSI host certificates from SWITCH because they contain a "(PSI)" substring, and the parentheses are not correctly escaped in the regexp matching of the myproxy code. Therefore the renewer DN (the -R argument to myproxy-init below) and the allowed-renewers policy on the myproxy server need to be defined with wildcards for the matching to succeed.

 
voms-proxy-init -voms cms
myproxyserver=myproxy.cern.ch
servicecert="/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Aargau/L=Villigen/O=Paul-Scherrer-Institut (PSI)/OU=AIT/CN=t3cmsvobox.psi.ch"
servicecert='*/CN=t3cmsvobox.psi.ch'
myproxy-init -s $myproxyserver -l psi_phedex -x       -R "$servicecert" -c 720
scp ~/.x509up_u$(id -u) phedex@t3ui01:gridcert/proxy.cert
#  for testing, you can try
myproxy-info -s $myproxyserver -l psi_phedex

As the phedex user, do:

chmod 600 ~/gridcert/proxy.cert

You should test whether the renewal of the certificate works for the phedex user:

unset X509_USER_PROXY   # make sure that the service credentials from ~/.globus are used!

voms-proxy-init  # initializes the service proxy cert that is allowed to retrieve the user cert
myproxyserver=myproxy.cern.ch
myproxy-get-delegation -s $myproxyserver -v -l psi_phedex              -a /home/phedex/gridcert/proxy.cert -o /tmp/gagatest

export X509_USER_PROXY=/tmp/gagatest
srm-get-metadata srm://t3se01.psi.ch:8443/srm/managerv1?SFN=/pnfs/psi.ch/cms
rm /tmp/gagatest

Storage Consistency Checks

From time to time the transfer team will ask for input for their storage consistency check (so far only for T2); the last CSCS check was in Feb 2014. To perform a 'Storage Consistency Check' we need to complete the following steps:

  • make sure PhEDEx is updated to the latest version and its config is committed in GIT
  • ask CSCS admins for a storage dump
    python chimera-dump.py -s /pnfs/lcg.cscs.ch/cms -c fulldump -g -o /tmp/outfile
  • convert the file using:
sed -e 's#/pnfs/lcg.cscs.ch/cms/trivcat/store/\(mc\|data\|generator\|results\|hidata\|himc\|lumi\|relval\)/#/store/\1/#' \
    -e '/<entry name="\/pnfs\/lcg.cscs.ch\/cms\/.*<\/entry>/d' \
    -e 's#<dCache:location>.*</dCache:location>##' \
  outfile.xml | uniq > storagedump.xml
  • compress, store on AFS, and send path to transfer team
  • take the file you get back from the transfer team with the LFNs to be deleted and remove them with the loop below (a dry-run variant is sketched after this list)
for LFN in $(cat SCC_Nov2012_CSCS_LFNsToBeRemoved.txt); do lcg-del -b -D srmv2 -l srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/$LFN; done
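
Before running the bulk deletion it can be prudent to do a dry run that only prints the commands (a sketch, reusing the example file name from above):

 for LFN in $(cat SCC_Nov2012_CSCS_LFNsToBeRemoved.txt); do
   echo lcg-del -b -D srmv2 -l "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/$LFN"
 done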

Emergency Measures

Installation

Look at the Puppet file /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes/SL6_vobox.pp

Add the following package to run our custom "accounting" scripts:

yum install perl-XML-Twig 

Services

PhEDEx

Refer to the description on the Tier-2 VOBox.

There is one important difference: while we use FTS channels for the transfers to the Tier-2, we use the SRM backend for transfers to the Tier-3, because we do not have an FTS channel for PSI. This is linked to registering PSI as a regular grid site, which until recently was not possible, since we only support a Grid SE but not a CE.

Thus there is no fts.map file in the configuration area for the PhEDEx services.
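
As an illustration only (a sketch, not the exact production configuration), the download agent in the PhEDEx Config would be run with the SRM backend instead of FTS, roughly along these lines:

 ### AGENT LABEL=download PROGRAM=Toolkit/Transfer/FileDownload DEFAULT=off
  -db              ${PHEDEX_DBPARAM}
  -nodes           ${PHEDEX_NODE}
  -backend         SRM
  -protocols       srmv2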

Nagios checks of each T3 pool via both the t3se01 SRM and the t3dcachedb xrootd protocols

From t3cmsvobox we check each T3 pool twice, once via t3se01 SRM and once via t3dcachedb xrootd; see https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?navbarsearch=1&host=t3cmsvobox

In both cases the files retrieved by the tests are:

[martinelli_f@t3ui02 ~]$ find /pnfs/psi.ch/cms/t3-nagios/ | grep M | sort
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs02_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs03_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs07_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs08_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs09_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs10_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs11_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_9
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_9
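
To reproduce one of these checks by hand, one can read a single test file through each door (a sketch; the door host names and ports are assumptions, check the Nagios command definitions for the exact ones):

 # via the xrootd door
 xrdcp -f root://t3dcachedb.psi.ch//pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms /dev/null
 # via the SRM door on t3se01
 lcg-ls -b -D srmv2 -l "srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms"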

The related dCache files have to be placed on the right pool, otherwise the Nagios tests will obviously be wrong! You can easily check on which pools the dCache files are stored with:

$ find /pnfs/psi.ch/cms/t3-nagios/ | grep M | sort | dc_get_ID_from_pnfsnamelist.sh | dc_get_cacheinfo_from_IDlist.sh | xargs -iG echo [\'G\'],| sed s/' '/\',\'/| xargs -iG echo G \\

[00020000000000000330BF90,t3fs01_cms], 
[00020000000000000330BF10,t3fs02_cms], 
[000200000000000006391F88,t3fs03_cms], 
[00020000000000000395B300,t3fs04_cms], 
[0002000000000000063D8C68,t3fs04_cms_1], 
[0000119DDCFD0C5F42B89769BC9C104A997F,t3fs07_cms], 
[00003A989AB6D1074D738594B1D01E2D03DE,t3fs08_cms], 
[0000479ED8FDDC374BC68827AEDF1C146686,t3fs09_cms], 
[000005D1DD24CA14448694E5C46A8AA8E91F,t3fs10_cms], 
[00009E4A9774085C4799B5C9C827DA03406F,t3fs11_cms], 
[0002000000000000052EEFB8,t3fs13_cms], 
[00005560491A76DE49DBA142D3BE3CFE38D5,t3fs13_cms_0], 
[00000CB9E97140F940CD973C319045B43FDA,t3fs13_cms_1], 
[0000073FF4F754BB4AB1B4599F412811BDA2,t3fs13_cms_10], 
[00000DB49D5B69EB4C568834BD162C3DA8E7,t3fs13_cms_11], 
[0002000000000000052EF018,t3fs13_cms_2], 
[0002000000000000052EF048,t3fs13_cms_3], 
[0002000000000000052EF078,t3fs13_cms_4], 
[0002000000000000052EF0A8,t3fs13_cms_5], 
[0002000000000000072F8630,t3fs13_cms_6], 
[00003FC27733ACBA4A809677419256FE22F9,t3fs13_cms_7], 
[0000132841305C27434891574015FD2CF923,t3fs13_cms_8], 
[00004783F9158A5941B284342FF4A8EDE126,t3fs13_cms_9], 
[0002000000000000052EF0D8,t3fs14_cms], 
[0000A6470E0458354BD99D6C2DD27B196DCC,t3fs14_cms_0], 
[0002000000000000052EF108,t3fs14_cms_1], 
[0000B3D6A96EF961473AACB05F80CF9D6892,t3fs14_cms_10], 
[00003616229002194F439925DA3C7F1CFA02,t3fs14_cms_11], 
[0002000000000000052EF138,t3fs14_cms_2], 
[0002000000000000052EF168,t3fs14_cms_3], 
[0002000000000000052EF198,t3fs14_cms_4], 
[000200000000000006300738,t3fs14_cms_5], 
[0000E88C6CBB2D5A4365B11BE2EDD1554366,t3fs14_cms_6], 
[000018AA61C1E30F43709F0D9FE3B9CD65D1,t3fs14_cms_7], 
[00001A2FD52D31DB4CCAB99C8B8336522339,t3fs14_cms_8], 
[0000B58A7FA17778439F8F6F47C5CBBED5E7,t3fs14_cms_9], 

Backups

OS snapshots are taken nightly by the PSI VMware team (contact Peter Huesser); in addition we have LinuxBackupsByLegato to recover single files.

NodeTypeForm
Hostnames: t3cmsvobox (t3cmsvobox01)
Services: PhEDEx 4.1.3
Hardware: PSI DMZ VMware cluster
Install Profile: vobox
Guarantee/maintenance until: VMware PSI Cluster