Regular Maintenance work
Note that there are many checks performed by t3nagios.
Dataset cleaning
This task must be done regularly (once every 2 months, for example), both for CSCS and PSI.
Getting the dataset list
Connect to t3cmsvobox as root and:
su - phedex
cd svn-sandbox/phedex/DB-query-tools/
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep "eleted"
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep -vE "Dutta|Fanfani|Kress|Magini|Wuerthwein|Belforte|Spinoso|Ajit|DataOps|eleted|StoreResults|Argiro|Klute"
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%PSI%"
The first Perl command creates a list of datasets that can safely be deleted from CSCS, as they are just support requests for transfers to PSI (check that the transfers actually completed). The second command creates a list that excludes the central requests and the datasets already covered by the first command. The third command produces the corresponding list for PSI.
The datasets proposed for deletion are all the datasets whose retention time has expired.
Publishing the list and notifying users
The due date for feedback is usually one week later. The lists must be published on DataSetCleaningQuery (previous lists must be deleted).
To get the total size proposed for deletion, create a temporary text file with the list pasted from the twiki and then run:
cat tmp.list | awk 'BEGIN{sum=0}{sum+=$4}END{print sum/1024.}'
This will give the total size in TB.
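If you also want the number of datasets in the list, a small variant of the same one-liner works (this assumes, as above, that the size in GB is in column 4):
awk '{sum+=$4; n++} END{printf "%d datasets, %.1f TB\n", n, sum/1024}' tmp.list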
An email like the following must be sent to the cms-tier3-users@lists.psi.ch mailing list:
Subject: Dataset deletion proposal and request for User Data cleaning - Due date: 28 Oct 2011, 9:00
Dear all,
a new cleaning campaign is needed, both at CSCS and PSI. You can find the list and the instructions on how to request that data be kept here:
https://twiki.cscs.ch/twiki/bin/view/CmsTier3/DataSetCleaningQuery
The data contained in the lists amount to 47TB / 44TB for CSCS / PSI.
If you need to store a dataset both at CSCS and at PSI please also reply to this e-mail explaining why.
Please remember to clean up your user folder at CSCS regularly; a usage overview can be found at [1].
Thanks, Daniel
[1] http://ganglia.lcg.cscs.ch/ganglia/cms_sespace.txt
Renew myproxy certificate for PhEDEx transfers (once every 11 months)
Nagios checks daily the lifetime of the VOMS proxy used by PhEDEx. This proxy is Fabio's CMS proxy, and because of that all the PhEDEx files uploaded to /pnfs/psi.ch/cms/trivcat/store/data belong to his account. If you change the proxy, you also have to change the ownership of the related files/directories in /pnfs/psi.ch/cms/trivcat/store/data, or you will get a lot of 'permission denied' errors.
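A minimal sketch of the ownership fix, assuming the /pnfs namespace is mounted writable on the node where you run it, and that the new proxy maps to the hypothetical local account newuser in group cms (both names are assumptions; adapt them to your mapping):
# WARNING: a recursive chown over the whole data tree can take a long time
chown -R newuser:cms /pnfs/psi.ch/cms/trivcat/store/data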
The following shows how to store a long-term proxy on myproxy.cern.ch:
[martinelli_f@t3ui06 nodes]$ myproxy-init -t 168 -R 't3cmsvobox.psi.ch' -l psi_phedex_fabio -x -k renewable -s myproxy.cern.ch -c 7000
Your identity: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=users/C=CH/O=Paul-Scherrer-Institut (PSI)/CN=Fabio Martinelli
Enter GRID pass phrase for this identity:
Creating proxy .............................................................................................................................. Done
Proxy Verify OK
Warning: your certificate and proxy will expire Wed Dec 10 01:00:00 2014
which is within the requested lifetime of the proxy
A proxy valid for 7000 hours (291.7 days) for user psi_phedex_fabio now exists on myproxy.cern.ch.
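To verify that the long-term proxy was actually stored, you can query the server; this mirrors the test command shown further below:
myproxy-info -s myproxy.cern.ch -l psi_phedex_fabio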
The present myproxy servers have problems with PSI host certificates from SWITCH, because these contain the substring "(PSI)", and the parentheses are not correctly escaped in the regexp matching of the myproxy code. Therefore, the renewer DN (the -R argument to myproxy-init below) and the allowed-renewers policy on the myproxy server need to be defined with wildcards for the matching to succeed.
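This is not the actual myproxy code, just a quick shell illustration of the same regex semantics: with unescaped parentheses, "(PSI)" is treated as a group matching the bare string PSI, so the pattern no longer matches the literal DN:
# No output: the ERE '(PSI)' matches 'PSI' without parentheses,
# so the pattern does not match the literal '(PSI)' in the DN
echo 'O=Paul-Scherrer-Institut (PSI)' | grep -E 'O=Paul-Scherrer-Institut (PSI)'
# Escaping the parentheses restores a literal match
echo 'O=Paul-Scherrer-Institut (PSI)' | grep -E 'O=Paul-Scherrer-Institut \(PSI\)'
The full procedure then becomes: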
voms-proxy-init -voms cms
myproxyserver=myproxy.cern.ch
# The full host DN would be the following, but it fails to match because of
# the "(PSI)" parentheses (see above):
# servicecert="/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Aargau/L=Villigen/O=Paul-Scherrer-Institut (PSI)/OU=AIT/CN=t3cmsvobox.psi.ch"
# Use a wildcard DN instead:
servicecert='*/CN=t3cmsvobox.psi.ch'
myproxy-init -s $myproxyserver -l psi_phedex -x -R "$servicecert" -c 720
scp ~/.x509up_u$(id -u) phedex@t3ui01:gridcert/proxy.cert
# for testing, you can try
myproxy-info -s $myproxyserver -l psi_phedex
As the phedex user, run:
chmod 600 ~/gridcert/proxy.cert
You should test whether the renewal of the certificate works for the phedex user:
unset X509_USER_PROXY # make sure that the service credentials from ~/.globus are used!
voms-proxy-init # initializes the service proxy cert that is allowed to retrieve the user cert
myproxyserver=myproxy.cern.ch
myproxy-get-delegation -s $myproxyserver -v -l psi_phedex -a /home/phedex/gridcert/proxy.cert -o /tmp/gagatest
export X509_USER_PROXY=/tmp/gagatest
srm-get-metadata "srm://t3se01.psi.ch:8443/srm/managerv1?SFN=/pnfs/psi.ch/cms"
rm /tmp/gagatest
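If the legacy srm-get-metadata client is not available, the check in the step above can presumably be done with the gfal2 tools instead (an assumption: gfal2-util must be installed on the host, and the SE must expose an SRMv2 endpoint at the usual path):
gfal-ls "srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms"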
Storage Consistency Checks
From time to time the transfer team will ask for input for their storage consistency check (so far only for the T2); the last CSCS check was in Feb 2014. To perform a 'Storage Consistency Check' we need to complete the following steps:
- post-process the raw storage dump (outfile.xml) so that the physical /pnfs paths become /store LFNs and the local dCache location entries are stripped:
sed -e 's#/pnfs/lcg.cscs.ch/cms/trivcat/store/\(mc\|data\|generator\|results\|hidata\|himc\|lumi\|relval\)/#/store/\1/#' \
-e '/<entry name="\/pnfs\/lcg.cscs.ch\/cms\/.*<\/entry>/d' \
-e 's#<dCache:location>.*</dCache:location>##' \
outfile.xml | uniq > storagedump.xml
- compress the file, store it on AFS, and send the path to the transfer team
- take the file you get back from the transfer team with the LFNs to be deleted and remove them:
for LFN in $(cat SCC_Nov2012_CSCS_LFNsToBeRemoved.txt); do lcg-del -b -D srmv2 -l "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/$LFN"; done
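A sketch of a slightly more defensive variant of the same loop, which reads the file line by line and records the LFNs that failed to delete (the log file name is arbitrary):
while read -r LFN; do
  lcg-del -b -D srmv2 -l "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/$LFN" \
    || echo "$LFN" >> failed_deletions.txt
done < SCC_Nov2012_CSCS_LFNsToBeRemoved.txt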
Installation
Please look at the Puppet file t3vobox. Add the following package to run our custom accounting scripts:
yum install perl-XML-Twig
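To check that the module is then visible to Perl (and hence to the accounting scripts):
perl -MXML::Twig -e 'print "XML::Twig ", $XML::Twig::VERSION, "\n"'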
Services
Refer to the description on the Tier-2 VOBox. There is one important difference: while we use FTS channels for the transfers to the Tier-2, we use the SRM backend for transfers to the Tier-3, because we do not have an FTS channel for PSI. This issue is linked to registering PSI as a regular grid site, which until recently was not possible, since we only operate a Grid SE, but no CE. Consequently, there is no fts.map file in the configuration area of the PhEDEx services.
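For illustration only, the download agent stanza in the PhEDEx Config file then selects the SRM backend rather than FTS; this is a sketch, and the exact options depend on the PhEDEx release and the local setup:
### AGENT LABEL=download PROGRAM=Toolkit/Transfer/FileDownload
 -db        ${PHEDEX_DBPARAM}
 -nodes     ${PHEDEX_NODE}
 -backend   SRM
 -protocols srmv2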
Nagios checking each T3 pool
From this server we check each T3 pool; see
https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?servicegroup=SRM+T3+Tests&style=detail
The files involved in the tests are:
[martinelli_f@t3ui02 ~]$ find /pnfs/psi.ch/cms/t3-nagios/ | grep M | sort
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs02_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs03_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs07_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs08_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs09_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs10_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs11_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_9
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_9
The related dCache files must be stored on the right pool, otherwise the Nagios tests test the wrong thing! You can check on which pool each file is located with the following pipeline (output formatted as a Python list):
[martinelli_f@t3ui02 ~]$ find /pnfs/psi.ch/cms/t3-nagios/ | grep M | sort | dc_get_ID_from_pnfsnamelist.sh | dc_get_cacheinfo_from_IDlist.sh | xargs -iG echo [\'G\'],| sed s/' '/\',\'/| xargs -iG echo G \\
[00020000000000000330BF90,t3fs01_cms],
[00020000000000000330BF10,t3fs02_cms],
[000200000000000006391F88,t3fs03_cms],
[00020000000000000395B300,t3fs04_cms],
[0002000000000000063D8C68,t3fs04_cms_1],
[0000119DDCFD0C5F42B89769BC9C104A997F,t3fs07_cms],
[00003A989AB6D1074D738594B1D01E2D03DE,t3fs08_cms],
[0000479ED8FDDC374BC68827AEDF1C146686,t3fs09_cms],
[000005D1DD24CA14448694E5C46A8AA8E91F,t3fs10_cms],
[00009E4A9774085C4799B5C9C827DA03406F,t3fs11_cms],
[0002000000000000052EEFB8,t3fs13_cms],
[00005560491A76DE49DBA142D3BE3CFE38D5,t3fs13_cms_0],
[00000CB9E97140F940CD973C319045B43FDA,t3fs13_cms_1],
[0000073FF4F754BB4AB1B4599F412811BDA2,t3fs07_cms],
[00000DB49D5B69EB4C568834BD162C3DA8E7,t3fs13_cms_11],
[0002000000000000052EF018,t3fs13_cms_2],
[0002000000000000052EF048,t3fs13_cms_3],
[0002000000000000052EF078,t3fs13_cms_4],
[0002000000000000052EF0A8,t3fs13_cms_5],
[0002000000000000072F8630,t3fs13_cms_6],
[00003FC27733ACBA4A809677419256FE22F9,t3fs13_cms_7],
[0000132841305C27434891574015FD2CF923,t3fs13_cms_8],
[00004783F9158A5941B284342FF4A8EDE126,t3fs13_cms_9],
[0002000000000000052EF0D8,t3fs14_cms],
[0000A6470E0458354BD99D6C2DD27B196DCC,t3fs14_cms_0],
[0002000000000000052EF108,t3fs14_cms_1],
[0000B3D6A96EF961473AACB05F80CF9D6892,t3fs14_cms_10],
[0000683230E21D284E80BCBEC5C6C064A350,t3fs14_cms_11],
[0002000000000000052EF138,t3fs14_cms_2],
[0002000000000000052EF168,t3fs14_cms_3],
[0002000000000000052EF198,t3fs14_cms_4],
[000200000000000006300738,t3fs14_cms_5],
[0000E88C6CBB2D5A4365B11BE2EDD1554366,t3fs14_cms_6],
[000018AA61C1E30F43709F0D9FE3B9CD65D1,t3fs14_cms_7],
[00001A2FD52D31DB4CCAB99C8B8336522339,t3fs14_cms_8],
[00005E3B967B94714FB6BDEA1BB47381DCC3,t3fs14_cms_9]
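Since each file name encodes the pool it is supposed to live on, a mismatch check can be sketched as follows; it assumes the two helper scripts used above print one "<ID> <pool>" pair per line, which is an assumption about their output format:
# Hypothetical sketch: compare the pool encoded in each file name with the
# pool reported by the dCache helper scripts (output format assumed)
find /pnfs/psi.ch/cms/t3-nagios/ -name '1MB-test-file_pool_*' | sort | while read -r f; do
  expected=${f##*1MB-test-file_pool_}
  actual=$(echo "$f" | dc_get_ID_from_pnfsnamelist.sh | dc_get_cacheinfo_from_IDlist.sh | awk '{print $2}')
  [ "$expected" = "$actual" ] || echo "MISMATCH: $f is on '$actual', expected '$expected'"
done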
Backups
OS snapshots are taken nightly by the PSI VMware team (e.g. Peter Huesser); in addition, we have LinuxBackupsByLegato to recover single files.