<!-- keep this as a security measure:
#uncomment if the subject should only be modifiable by the listed groups
   * Set ALLOWTOPICCHANGE = Main.TWikiAdminGroup,Main.CMSAdminGroup
   * Set ALLOWTOPICRENAME = Main.TWikiAdminGroup,Main.CMSAdminGroup
#uncomment this if you want the page only be viewable by the listed groups
# ######* Set ALLOWTOPICVIEW = Main.TWikiAdminGroup,Main.CMSAdminGroup
-->

---+!! Node Type: %CALC{"$SUBSTITUTE(%TOPIC%,NodeType,)"}%

---++!! Firewall requirements

| *local port* | *open to* | *reason* |
<!-- Example line
#| 22/tcp | * | Example entry for ssh |
-->

---

%TOC{title="Table of contents"}%

---+ Installation

The installation is described by the Puppet files =tier3-baseclasses.pp= and =SL6_vobox.pp=, both kept in the directory reached by the =puppetdirnodes= alias defined in the following list:

<pre>alias kscustom57='cd /afs/psi.ch/software/linux/dist/scientific/57/custom'
alias kscustom64='cd /afs/psi.ch/software/linux/dist/scientific/64/custom'
alias ksdir='cd /afs/psi.ch/software/linux/kickstart/configs'
alias puppetdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias puppetdirnodes='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes'
alias puppetdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias puppetdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
alias yumdir5='cd /afs/psi.ch/software/linux/dist/scientific/57/scripts'
alias yumdir6='cd /afs/psi.ch/software/linux/dist/scientific/6/scripts'
</pre>

---+ Regular Maintenance work

*Note that there are many checks performed by [[https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?navbarsearch=1&host=t3cmsvobox][t3nagios]]*

---++ How to connect to the PhEDEx DBs

If you want to inspect the PhEDEx DBs directly you can use =sqlplus=; in a second shell, watch your =sqlplus= connections with =netstat -tp | grep sqlplus= and kill them with =killall sqlplus= should =sqlplus= hang:

<pre>[root@t3cmsvobox01 phedex]# su - phedex
-bash-4.1$ source /home/phedex/PHEDEX/etc/profile.d/env.sh
-bash-4.1$ which sqlplus
~/sw/slc6_amd64_gcc461/external/oracle/11.2.0.3.0__10.2.0.4.0/bin/sqlplus
-bash-4.1$ sqlplus $(/home/phedex/PHEDEX/Utilities/OracleConnectId -db /home/phedex/config/DBParam.PSI:%BLUE%Prod%ENDCOLOR%/PSI)

SQL*Plus: Release 11.2.0.3.0 Production on Wed May 27 14:16:11 2015

Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:%BLUE%
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options%ENDCOLOR%

SQL> select id,name from t_adm_node where name like '%CSCS%' or name like '%PSI%' ;

        ID NAME
---------- --------------------
        27 T2_CH_CSCS
       %ORANGE%821 T3_CH_PSI%ENDCOLOR%

SQL> select distinct r.id, r.created_by, r.time_create, r.comments reqcomid,
            rds.dataset_id, rds.name, rd.decided_by, rd.time_decided, rd.comments accomid
       from t_req_request r
       join t_req_type rt on rt.id = r.type
       join t_req_node rn on rn.request = r.id
       left join t_req_decision rd on rd.request = r.id and rd.node = rn.node
       join t_req_dataset rds on rds.request = r.id
      where rn.node = %ORANGE%821%ENDCOLOR%
        and rt.name = 'xfer'
        and rd.decision = 'y'
        and dataset_id in (select distinct b.dataset
                             from t_dps_block b
                             join t_dps_block_replica br on b.id = br.block
                             join t_dps_dataset d on d.id = b.dataset
                            where node = %ORANGE%821%ENDCOLOR%)
      order by r.time_create desc ;

        ID CREATED_BY TIME_CREATE   REQCOMID DATASET_ID NAME                                                                                          DECIDED_BY TIME_DECIDED    ACCOMID
---------- ---------- ----------- ---------- ---------- --------------------------------------------------------------------------------------------- ---------- ------------ ----------
    441651     786542  1429196738     303750     674696 /RSGravToWW_kMpl01_M-800_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM       786664   1429287626     303779
    441651     786542  1429196738     303750     674704 /RSGravToWW_kMpl01_M-1800_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM      786664   1429287626     303779
    441651     786542  1429196738     303750     674709 /RSGravToWW_kMpl01_M-2500_TuneCUETP8M1_13TeV-pythia8/RunIIWinter15GS-MCRUN2_71_V1-v1/GEN-SIM
...
</pre>
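For quick one-off checks you do not need an interactive session. The following is a minimal, non-interactive sketch (not taken from this page), assuming the same =Prod/PSI= =DBParam= section and the T3_CH_PSI node id =821= found above; the =node_bytes= column of =t_dps_block_replica= is assumed from the standard PhEDEx schema.

<verbatim>
# minimal sketch: total block replicas and TB resident at T3_CH_PSI (node id 821),
# run as the phedex user; node_bytes is an assumption based on the standard PhEDEx schema
source /home/phedex/PHEDEX/etc/profile.d/env.sh
sqlplus -S $(/home/phedex/PHEDEX/Utilities/OracleConnectId -db /home/phedex/config/DBParam.PSI:Prod/PSI) <<'EOF'
select count(*) as block_replicas,
       round(sum(node_bytes)/power(1024,4), 1) as tb_resident
  from t_dps_block_replica
 where node = 821;
EOF
</verbatim>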
---++ Dataset cleaning

This task must be done regularly (e.g. once every 2 months), both for CSCS and for PSI.

*Getting the datasets list*

Connect to t3cmsvobox as root and:
<verbatim>
su - phedex
cd svn-sandbox/phedex/DB-query-tools/
source /home/phedex/PHEDEX/etc/profile.d/env.sh
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep "eleted"
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%CSCS%" | grep -vE "Dutta|Fanfani|Kress|Magini|Wuerthwein|Belforte|Spinoso|Ajit|DataOps|eleted|StoreResults|Argiro|Klute|vocms237|IntelROCCS"
./ListSiteDataInfo.pl -w -t --db ~/config/DBParam.PSI:Prod/PSI -s "%PSI%"
</verbatim>

The *first* Perl command creates a list of datasets that can safely be deleted from CSCS, since they were only requested in support of transfers to PSI (check that the transfer to PSI completed successfully).<br />
The *second* command creates a list that excludes central requests as well as the datasets already covered by the first list.<br />
The *third* command produces the list for PSI.

The datasets proposed for deletion are all those whose *retention time has expired*.

*Publishing the list and notifying users*

The due date for feedback is usually one week later. The lists must be published in DataSetCleaningQuery (previous lists must be deleted there first). To get the total size proposed for deletion, paste the list from the twiki into a temporary text file and run:
<verbatim>
cat tmp.list | awk 'BEGIN{sum=0}{sum+=$4}END{print sum/1024.}'
</verbatim>
This gives the total size in TB.

An email like the following must be sent to the =cms-tier3-users@lists.psi.ch= mailing list:
<verbatim>
Subject: Dataset deletion proposal and request for User Data cleaning - Due date: 28 Oct 2011, 9:00

Dear all,

a new cleaning campaign is needed, both at CSCS and PSI.
You can find the list and the instructions on how to request to keep the data here:

https://twiki.cscs.ch/twiki/bin/view/CmsTier3/DataSetCleaningQuery

The data contained in the lists amount to 47TB / 44TB for CSCS / PSI.
If you need to store a dataset both at CSCS and at PSI please also reply to this email explaining why.

Please remember to clean up your user folder at CSCS regularly; a usage overview can be found at [1] and [2].

Thanks,
Daniel

[1] http://ganglia.lcg.cscs.ch/ganglia/cms_sespace.txt
[2] http://ganglia.lcg.cscs.ch/ganglia/files_cms.html
</verbatim>

---++ Renew the myproxy certificate for !PhEDEx transfers (once every 11 months)

Nagios checks the [[https://t3nagios.psi.ch/nagios/cgi-bin/extinfo.cgi?type=2&host=t3cmsvobox&service=CMS+VOMS+proxy+age][voms proxy lifetime]] used by !PhEDEx daily. This proxy is Fabio's CMS proxy, so all the !PhEDEx files uploaded into =/pnfs/psi.ch/cms/= belong to his account. If you replace that proxy you also have to change the ownership of the related files/dirs in =/pnfs/psi.ch/cms=; specifically you will want to change the owner of =/pnfs/psi.ch/cms/trivcat/store/data=, otherwise you will get a lot of =permission denied= errors.
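There is no ready-made command on this page for that ownership change; the following is only a minimal sketch, assuming the dCache namespace is mounted read-write on the node where you run it and that root is allowed to =chown= there. The account =new_phedex_user= and the group =cms= are hypothetical placeholders for whatever the new proxy maps to.

<verbatim>
# minimal sketch (assumptions: /pnfs is mounted read-write here and root may chown it;
# new_phedex_user / cms are hypothetical placeholders for the new mapping)
chown -R new_phedex_user:cms /pnfs/psi.ch/cms/trivcat/store/data
</verbatim>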
The following shows how to upload a long-lived proxy to =myproxy.cern.ch=:

<pre>%BLUE%$%ENDCOLOR% myproxy-init -t 168 -R 't3cmsvobox.psi.ch' -l %GREEN%psi_phedex_fabio%ENDCOLOR% -x -k renewable -s myproxy.cern.ch -c %RED%8700%ENDCOLOR%
Your identity: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=users/C=CH/O=Paul-Scherrer-Institut (PSI)/CN=Fabio Martinelli
Enter GRID pass phrase for this identity:
Creating proxy .......................................................................................................................................... Done
Proxy Verify OK
Warning: your certificate and proxy will expire Thu Dec 10 01:00:00 2015
which is within the requested lifetime of the proxy
A proxy valid for %RED%8700%ENDCOLOR% hours (%RED%362.5 days%ENDCOLOR%) for user %GREEN%psi_phedex_fabio%ENDCOLOR% now exists on myproxy.cern.ch.    # That %RED%362.5 days%ENDCOLOR% is wrong !

%BLUE%$%ENDCOLOR% myproxy-info -s myproxy.cern.ch -l %GREEN%psi_phedex_fabio%ENDCOLOR%
username: %GREEN%psi_phedex_fabio%ENDCOLOR%
owner: /DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=users/C=CH/O=Paul-Scherrer-Institut (PSI)/CN=Fabio Martinelli
  name: renewable
  renewal policy: */CN=t3cmsvobox.psi.ch
  timeleft: 6249:20:19  (%RED%260.4 days%ENDCOLOR%)
</pre>

The present myproxy servers have problems with the PSI host certificates issued by SWITCH, because these contain a "(PSI)" substring and the parentheses are not correctly escaped in the regexp matching of the myproxy code. Therefore the renewer DN (the =-R= argument to =myproxy-init= below) and the _allowed renewers policy on the myproxy server_ need to be defined with wildcards for the matching to succeed.

<pre>
voms-proxy-init -voms cms

myproxyserver=myproxy.cern.ch
<span style="text-decoration: line-through;">servicecert="/DC=com/DC=quovadisglobal/DC=grid/DC=switch/DC=hosts/C=CH/ST=Aargau/L=Villigen/O=Paul-Scherrer-Institut (PSI)/OU=AIT/CN=t3cmsvobox.psi.ch"</span>
servicecert='*/CN=t3cmsvobox.psi.ch'

myproxy-init -s $myproxyserver -l psi_phedex -x -R "$servicecert" -c 720

scp ~/.x509up_u$(id -u) phedex@t3ui01:gridcert/proxy.cert

# for testing, you can try
myproxy-info -s $myproxyserver -l psi_phedex
</pre>

As the phedex user do
<pre>chmod 600 ~/gridcert/proxy.cert
</pre>

You should test whether the renewal of the certificate works for the phedex user:
<pre>
unset X509_USER_PROXY    # make sure that the service credentials from ~/.globus are used!
voms-proxy-init          # initializes the service proxy cert that is allowed to retrieve the user cert

myproxyserver=myproxy.cern.ch
myproxy-get-delegation -s $myproxyserver -v -l psi_phedex -a /home/phedex/gridcert/proxy.cert -o /tmp/gagatest

export X509_USER_PROXY=/tmp/gagatest
srm-get-metadata srm://t3se01.psi.ch:8443/srm/managerv1?SFN=/pnfs/psi.ch/cms

rm /tmp/gagatest
</pre>

---++ Storage Consistency Checks

From time to time the transfer team will ask for input for their [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompOpsTransferTeamConsistencyChecks][storage consistency check]] (so far only for the T2); the last CSCS check was in [[https://ggus.eu/index.php?mode=ticket_info&ticket_id=101366][Feb 2014]]. To perform a 'Storage Consistency Check' we need to complete the following steps:

   * make sure !PhEDEx is updated to the latest version and its config is committed in [[https://twiki.cern.ch/twiki/bin/view/CMSPublic/CompOpsSiteconfMigrationToGit][GIT]]
   * ask the CSCS admins for a storage dump: <pre>python chimera-dump.py -s /pnfs/lcg.cscs.ch/cms -c fulldump -g -o /tmp/outfile</pre>
   * convert the file using: <pre>sed -e 's#/pnfs/lcg.cscs.ch/cms/trivcat/store/\(mc\|data\|generator\|results\|hidata\|himc\|lumi\|relval\)/#/store/\1/#' \
    -e '/<entry name="\/pnfs\/lcg.cscs.ch\/cms\/.*<\/entry>/d' \
    -e 's#<dCache:location>.*</dCache:location>##' \
    outfile.xml | uniq > storagedump.xml
</pre>
   * compress the dump, store it on AFS, and send the path to the transfer team (see the sketch after this list)
   * take the file you get back from the transfer team with the LFNs to be deleted and remove them: <pre>for LFN in $(cat SCC_Nov2012_CSCS_LFNsToBeRemoved.txt); do lcg-del -b -D srmv2 -l srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/$LFN; done
</pre>
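The compress-and-publish step in the list above has no ready-made commands on this page; a minimal sketch follows, in which the AFS destination directory is a hypothetical placeholder:

<verbatim>
# minimal sketch: compress the dump and put it somewhere readable by the transfer team
# (the AFS directory below is a hypothetical placeholder, not taken from this page)
gzip -9 storagedump.xml
cp storagedump.xml.gz /afs/psi.ch/project/some_public_dir/storagedump_CSCS_$(date +%Y%m%d).xml.gz
# then send the /afs/... path to the transfer team, e.g. in the GGUS ticket
</verbatim>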
---+ Emergency Measures

<!--
#List any measures that must be taken in case of some major incident, e.g. whether a mailing
#list must be contacted or whether other services need to be shut down, etc.
-->

---+ Installation

Look at the Puppet file =/afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes/SL6_vobox.pp=
<!--
#Comment here on any peculiarities of the installation, e.g. on special packages needed, special setup
#procedures which are not obvious
-->

Add the following package to run our custom "accounting" scripts:
<pre>yum install perl-XML-Twig
</pre>

---+ Services
<!--
#List all the important services, their installation, configuration and how to start and stop them
-->

---++ PhEDEx

Refer to the description on the [[LCGTier2/CmsVObox][Tier-2 VOBox]]. There is one important difference: *while we use FTS channels for the transfers to the Tier-2, we use the SRM backend for transfers to the Tier-3*, because we do not have an FTS channel for PSI. This issue is linked to registering PSI as a regular grid site, which until recently was not possible, since we only provide a Grid SE but not a CE. Consequently there is no =fts.map= file in the configuration area for the !PhEDEx services.

---++ Nagios checking each T3 pool by both the t3se01 SRM and the t3dcachedb Xrootd protocols

From =t3cmsvobox= we check each T3 pool twice, once via t3se01 SRM and once via t3dcachedb Xrootd; see https://t3nagios.psi.ch/nagios/cgi-bin/status.cgi?navbarsearch=1&host=t3cmsvobox (a sketch of both access paths is given after the file list below). In both cases the files retrieved by the tests are:

<pre>[martinelli_f@t3ui02 ~]$ find /pnfs/psi.ch/cms/t3-nagios/ | grep M | sort
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs02_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs03_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs04_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs07_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs08_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs09_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs10_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs11_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs13_cms_9
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_0
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_1
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_10
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_11
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_2
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_3
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_4
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_5
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_6
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_7
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_8
/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs14_cms_9
</pre>
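To cross-check manually what the Nagios probes do, one of the test files above can be read back through both doors. This is only a minimal sketch, not the actual Nagios plugin; it needs a valid CMS VOMS proxy, the SRM endpoint is the one used elsewhere on this page, and the Xrootd door host/port (=t3dcachedb.psi.ch:1094=) is an assumption:

<verbatim>
# minimal sketch, not the Nagios plugin itself; requires a valid CMS VOMS proxy
FILE=/pnfs/psi.ch/cms/t3-nagios/1MB-test-file_pool_t3fs01_cms

# read via the t3se01 SRM door (endpoint as used elsewhere on this page)
lcg-cp -b -D srmv2 "srm://t3se01.psi.ch:8443/srm/managerv2?SFN=$FILE" file:///tmp/nagios-srm-test

# read via the dCache Xrootd door (host and port are assumptions)
xrdcp -f root://t3dcachedb.psi.ch:1094/$FILE /tmp/nagios-xrootd-test
</verbatim>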
The related dCache files have to be placed in the right pools *or the Nagios tests will obviously be wrong*! You can easily check which pool each dCache file is stored on with:

<pre>$ %BLUE%find /pnfs/psi.ch/cms/t3-nagios/ | grep M | sort | dc_get_ID_from_pnfsnamelist.sh | dc_get_cacheinfo_from_IDlist.sh | xargs -iG echo [\'G\'],| sed s/' '/\',\'/| xargs -iG echo G \\%ENDCOLOR%
[00020000000000000330BF90,%BLUE%t3fs01_cms%ENDCOLOR%],
[00020000000000000330BF10,%BLUE%t3fs02_cms%ENDCOLOR%],
[000200000000000006391F88,%BLUE%t3fs03_cms%ENDCOLOR%],
[00020000000000000395B300,%BLUE%t3fs04_cms%ENDCOLOR%],
[0002000000000000063D8C68,%BLUE%t3fs04_cms_1%ENDCOLOR%],
[0000119DDCFD0C5F42B89769BC9C104A997F,%BLUE%t3fs07_cms%ENDCOLOR%],
[00003A989AB6D1074D738594B1D01E2D03DE,%BLUE%t3fs08_cms%ENDCOLOR%],
[0000479ED8FDDC374BC68827AEDF1C146686,%BLUE%t3fs09_cms%ENDCOLOR%],
[000005D1DD24CA14448694E5C46A8AA8E91F,%BLUE%t3fs10_cms%ENDCOLOR%],
[00009E4A9774085C4799B5C9C827DA03406F,%BLUE%t3fs11_cms%ENDCOLOR%],
[0002000000000000052EEFB8,%BLUE%t3fs13_cms%ENDCOLOR%],
[00005560491A76DE49DBA142D3BE3CFE38D5,%BLUE%t3fs13_cms_0%ENDCOLOR%],
[00000CB9E97140F940CD973C319045B43FDA,%BLUE%t3fs13_cms_1%ENDCOLOR%],
[0000073FF4F754BB4AB1B4599F412811BDA2,%BLUE%t3fs13_cms_10%ENDCOLOR%],
[00000DB49D5B69EB4C568834BD162C3DA8E7,%BLUE%t3fs13_cms_11%ENDCOLOR%],
[0002000000000000052EF018,%BLUE%t3fs13_cms_2%ENDCOLOR%],
[0002000000000000052EF048,%BLUE%t3fs13_cms_3%ENDCOLOR%],
[0002000000000000052EF078,%BLUE%t3fs13_cms_4%ENDCOLOR%],
[0002000000000000052EF0A8,%BLUE%t3fs13_cms_5%ENDCOLOR%],
[0002000000000000072F8630,%BLUE%t3fs13_cms_6%ENDCOLOR%],
[00003FC27733ACBA4A809677419256FE22F9,%BLUE%t3fs13_cms_7%ENDCOLOR%],
[0000132841305C27434891574015FD2CF923,%BLUE%t3fs13_cms_8%ENDCOLOR%],
[00004783F9158A5941B284342FF4A8EDE126,%BLUE%t3fs13_cms_9%ENDCOLOR%],
[0002000000000000052EF0D8,%BLUE%t3fs14_cms%ENDCOLOR%],
[0000A6470E0458354BD99D6C2DD27B196DCC,%BLUE%t3fs14_cms_0%ENDCOLOR%],
[0002000000000000052EF108,%BLUE%t3fs14_cms_1%ENDCOLOR%],
[0000B3D6A96EF961473AACB05F80CF9D6892,%BLUE%t3fs14_cms_10%ENDCOLOR%],
[00003616229002194F439925DA3C7F1CFA02,%BLUE%t3fs14_cms_11%ENDCOLOR%],
[0002000000000000052EF138,%BLUE%t3fs14_cms_2%ENDCOLOR%],
[0002000000000000052EF168,%BLUE%t3fs14_cms_3%ENDCOLOR%],
[0002000000000000052EF198,%BLUE%t3fs14_cms_4%ENDCOLOR%],
[000200000000000006300738,%BLUE%t3fs14_cms_5%ENDCOLOR%],
[0000E88C6CBB2D5A4365B11BE2EDD1554366,%BLUE%t3fs14_cms_6%ENDCOLOR%],
[000018AA61C1E30F43709F0D9FE3B9CD65D1,%BLUE%t3fs14_cms_7%ENDCOLOR%],
[00001A2FD52D31DB4CCAB99C8B8336522339,%BLUE%t3fs14_cms_8%ENDCOLOR%],
[0000B58A7FA17778439F8F6F47C5CBBED5E7,%BLUE%t3fs14_cms_9%ENDCOLOR%],
</pre>

---+ Backups

OS snapshots are taken nightly by the PSI VMWare team (contact Peter Huesser); in addition, LinuxBackupsByLegato can be used to recover single files.
---+ !NodeTypeForm

| *Hostnames* | t3cmsvobox ( t3cmsvobox01 ) |
| *Services* | PhEDEx 4.1.3 |
| *Hardware* | PSI DMZ VMWare cluster |
| *Install Profile* | vobox |
| *Guarantee/maintenance until* | VMWare PSI Cluster |