Migration from dCache 1.9.12 to 2.2.10 (STEP BY STEP)
Create testbed
- Copy the following files and md5sum them:
- Preproduction:
cd /home/miguelgi/dcache-22/pp
lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp1 file:/home/miguelgi/dcache-22/pp/randompp1
lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp2 file:/home/miguelgi/dcache-22/pp/randompp2
lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp3 file:/home/miguelgi/dcache-22/pp/randompp3
md5sum * > md5.txt
- Production:
cd /home/miguelgi/dcache-22/prod
lcg-cp -v -n 8 srm://storage01.lcg.cscs.ch:8443/pnfs/lcg.cscs.ch/atlas/pi/pidec_0111 file:/home/miguelgi/dcache-22/prod/pidec_0111
lcg-cp -v -n 8 srm://storage01.lcg.cscs.ch:8443/pnfs/lcg.cscs.ch/atlas/pi/pidec_1202 file:/home/miguelgi/dcache-22/prod/pidec_1202
lcg-cp -v -n 8 srm://storage01.lcg.cscs.ch:8443/pnfs/lcg.cscs.ch/atlas/pi/pidec_1111 file:/home/miguelgi/dcache-22/prod/pidec_1111
lcg-cp -v -n 2 lfn:/grid/dteam/cscs/testbed-miguelgi-randompp1 file:$PWD/randompp1
lcg-cp -v -n 2 lfn:/grid/dteam/cscs/testbed-miguelgi-randompp2 file:$PWD/randompp2
lcg-cp -v -n 2 lfn:/grid/dteam/cscs/testbed-miguelgi-randompp3 file:$PWD/randompp3
md5sum * > md5.txt
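One caveat with `md5sum * > md5.txt`: if it is re-run in a directory that already holds `md5.txt`, the manifest ends up listing itself, and a later `md5sum -c` reports that entry as failed. A minimal sketch of a safer variant follows. The function names `make_manifest` and `verify_manifest` are hypothetical helpers, not part of the original procedure.

```shell
#!/bin/bash
# Hypothetical helper: checksum every regular file in a directory except
# the manifest itself, so the command can be re-run safely.
make_manifest() {
    local dir="$1" manifest="${2:-md5.txt}" f
    (
        cd "$dir" || exit 1
        for f in *; do
            [ -f "$f" ] || continue
            # Skip the manifest and its temporary copy.
            case "$f" in "$manifest"|"$manifest.tmp") continue ;; esac
            md5sum -- "$f" || exit 1
        done > "$manifest.tmp" && mv -- "$manifest.tmp" "$manifest"
    )
}

# Hypothetical helper: verify a directory against its manifest.
# Returns non-zero if any file is missing or differs.
verify_manifest() {
    local dir="$1" manifest="${2:-md5.txt}"
    ( cd "$dir" && md5sum --quiet -c "$manifest" )
}
```

For example, `make_manifest /home/miguelgi/dcache-22/pp` before the migration and `verify_manifest /home/miguelgi/dcache-22/pp` after re-fetching the files.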
Head-nodes
Items to backup
Make sure that you keep a copy of these items:
- PostgreSQL databases in both head nodes.
- dCache Billing in both head nodes (/opt/d-cache/billing/)
- dCache keys and certificates (/opt/d-cache/etc/{host_key,host_key.pub,server_key,server_key.pub}). These should be in CFengine.
- PoolManager.conf saved by the system:
May 06 13:34 [root@storage02:~]# ls /root/PoolManager.conf.saved -lh
-rw-r--r-- 1 root root 16K May 6 13:33 /root/PoolManager.conf.saved
Shutdown dCache 1.9.12
- Shutdown dCache 1.9.12 on all pools.
pdsh -w ppse0[1-3] "service dcache stop" |dshbak -c
- Shutdown dCache 1.9.12 on the head nodes.
pdsh -w ppstorage0[1-2] "service dcache stop" |dshbak -c
- Make a backup of both PostgreSQL databases.
[root @ ppstorage01]-[11:54:11]-[~]:-(# mount puppet:/cm /media
[root @ ppstorage01]-[11:55:27]-[~]:-)# /usr/bin/pg_dumpall -U postgres -c > /media/dcache-22/pp/`hostname -s`.dumpall-c_complete.sql
[root @ ppstorage02]-[11:56:41]-[~]:-(# mount puppet:/cm /media
[root @ ppstorage02]-[11:56:44]-[~]:-)# /usr/bin/pg_dumpall -U postgres -c > /media/dcache-22/pp/`hostname -s`.dumpall-c_complete.sql
- Make a backup of the BILLING.
[root @ ppstorage01]-[12:02:41]-[~]:-)# tar czf /media/dcache-22/pp/`hostname -s`.billing.tar.gz /opt/d-cache/billing/
[root @ ppstorage02]-[12:02:38]-[~]:-)# tar czf /media/dcache-22/pp/`hostname -s`.billing.tar.gz /opt/d-cache/billing/
- Shutdown the head nodes.
pdsh -w ppstorage0[1-2] "shutdown -h now; exit" |dshbak -c
- Boot with the BRE.3 image and dump /dev/sda to a different system (mounted by NFS). KvmVirtualization has more information on how to mount a dd volume with LVM inside.
[root @ ppnfs]-[09:19:41]-[/kvm_guests2]:-)# mount puppet:/cm /media
[root @ ppnfs]-[09:19:52]-[/kvm_guests2]:-)# cp ppstorage0*.root /media/dcache-22/pp/ -v
`ppstorage01.root' -> `/media/dcache-22/pp/ppstorage01.root'
`ppstorage02.root' -> `/media/dcache-22/pp/ppstorage02.root'
[root @ ppnfs]-[09:55:13]-[~]:-)# umount /media
- Once we have a proper backup of the complete system and the databases, we can install the new hardware.
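Before the old systems are wiped, it may be worth checking that the dumps and archives written above are actually usable. The sketch below is an illustration, not part of the original procedure. `check_backup` is a hypothetical helper, and the trailer text it looks for in the SQL dump is an assumption about what `pg_dumpall` writes at the end of a complete dump.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check backup artefacts passed as arguments.
check_backup() {
    local f rc=0
    for f in "$@"; do
        # Every artefact must exist and be non-empty.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
            continue
        fi
        case "$f" in
            *.tar.gz)
                # gzip -t reads the whole archive and checks its integrity.
                gzip -t -- "$f" 2>/dev/null \
                    || { echo "corrupt archive: $f" >&2; rc=1; }
                ;;
            *.sql)
                # Assumption: a complete pg_dumpall ends with this trailer.
                tail -n 5 -- "$f" | grep -q 'database cluster dump complete' \
                    || { echo "dump looks truncated: $f" >&2; rc=1; }
                ;;
        esac
    done
    return "$rc"
}
```

A possible invocation on the NFS mount would be `check_backup /media/dcache-22/pp/ppstorage0*.dumpall-c_complete.sql /media/dcache-22/pp/ppstorage0*.billing.tar.gz`.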
Installation of the new systems
- Install storage01 and storage02 with SL 6.4.
# do not forget to add them to the category in cfagent.conf
# SL64 = ( atlas01 PPPDCACHE3)
# UMD2 = ( ...BLABLABLA... PPDCACHE3 )
- Update all the packages in both systems and reboot.
pdsh -w ppstorage0[1-2] "yum update -y --disableexcludes=main" |dshbak -c
- Put in place iptables rules to block transfers to the system!
iptables -D INPUT -p tcp -m state --state NEW -m tcp --dport 8443:8446 -j ACCEPT
iptables -D INPUT -p tcp -m state --state NEW -m tcp --dport 2288 -j ACCEPT
iptables -D INPUT -p tcp -m state --state NEW -m tcp --dport 2811 -j ACCEPT
iptables -D INPUT -p tcp -m state --state NEW -m tcp --dport 20000:25000 -j ACCEPT
iptables -D INPUT -p udp -m state --state NEW -m udp --dport 20000:25000 -j ACCEPT
iptables -D INPUT -p tcp -m state --state NEW -m tcp --dport 1094 -j ACCEPT
iptables -D INPUT -p tcp -m state --state NEW -m tcp --dport 2170 -j ACCEPT
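The rules above all share one shape, so they can be generated from a port list. The following is a sketch under that assumption. `dcache_fw_rules` is a hypothetical helper that only prints the commands, so they can be reviewed before being piped to a shell. The procedure itself reopens the ports later with `service iptables restart`; passing `-I` here is just an alternative.

```shell
#!/bin/bash
# Port lists copied from the iptables rules above.
TCP_PORTS="8443:8446 2288 2811 20000:25000 1094 2170"
UDP_PORTS="20000:25000"

# Hypothetical helper: print (do not run) one iptables command per port.
# $1 is the action: -D deletes the ACCEPT rule, -I inserts it again.
dcache_fw_rules() {
    local action="$1" proto ports port
    for proto in tcp udp; do
        case "$proto" in
            tcp) ports=$TCP_PORTS ;;
            udp) ports=$UDP_PORTS ;;
        esac
        for port in $ports; do
            echo "iptables $action INPUT -p $proto -m state --state NEW -m $proto --dport $port -j ACCEPT"
        done
    done
}
```

Running `dcache_fw_rules -D` prints the seven delete commands; `dcache_fw_rules -D | sh` as root would apply them.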
Reinstallation of dCache 1.9.12 on the new hardware
- Install dCache 1.9.12 on the new system following instructions. Do NOT start dCache on the pools.
- storage01 and storage02:
yum install postgresql-server ca-policy-egi-core
yum localinstall ./dcache-server-1.9.12-26.noarch.rpm --disableexcludes=main
service postgresql initdb
sed -i 's/ident/trust/' /var/lib/pgsql/data/pg_hba.conf
service postgresql start
- Install Java 7 if the following error appears:
Unsupported major.minor version 51.0
yum remove jdk
yum install java-1.7.0-openjdk
- Dump back the databases to the new systems
- storage01 and storage02:
mount puppet:/cm /media
su - postgres
psql -f /media/dcache-22/pp/`hostname -s`.dumpall-c_complete.sql
exit
- storage02 only:
echo "/ localhost(rw)
/pnfs *.lcg.cscs.ch(rw)" > /etc/exports
echo 'RPCBIND_ARGS="-i"' > /etc/sysconfig/rpcbind
service rpcbind restart
- storage01 only:
yum install emi-resource-information-service
ln -s /opt/d-cache/libexec/infoProvider/info-based-infoProvider.sh /var/lib/bdii/gip/provider/info-based-infoProvider.sh
- Dump back the Billing on storage01 and storage02:
cd /
tar xzf /media/dcache-22/pp/`hostname -s`.billing.tar.gz
- Copy vomsdir from one of the CREAM-CEs, or alternatively generate it with YAIM:
scp -r cream02:/etc/grid-security/vomsdir/ /etc/grid-security/
OR
yum install dpm-yaim && /opt/glite/yaim/bin/yaim -r -s /opt/cscs/siteinfo/site-info.def -n emi_dpm_disk -f config_vomsdir
- Reboot the systems.
reboot;exit
- On storage02, open traffic from the offices network:
iptables -A INPUT -s 148.187.133.200/255.255.255.0 -p tcp -m tcp --dport 22223 -m state --state NEW -j ACCEPT
- Start dCache on the head nodes. Check the logs for errors.
pdsh -w ppstorage0[1-2] "service dcache start" |dshbak -c
pdsh -w ppstorage01 "service bdii start" |dshbak -c
- Start dCache on the pools. Check the logs for errors.
pdsh -w ppse0[1-3] "service dcache start" |dshbak -c
Test the functionality of dCache on the new hardware.
- Test the BDII updating process:
ui64 $ ldapsearch -x -LLL -h ppstorage01.lcg.cscs.ch -p 2170 -b "o=grid" >/dev/null
ui64 $ ldapsearch -x -LLL -h ppstorage01.lcg.cscs.ch -p 2170 -b "o=grid" |wc -l
595
- Test the SE with dcache tools:
ui64 $ chk_SE-dcache -n ppstorage01.lcg.cscs.ch
- Test the SE with lcg tools:
ui64 $ chk_SE-lcgtools -d ppstorage01.lcg.cscs.ch -
ui64 $ export LCG_GFAL_INFOSYS=ppbdii01.lcg.cscs.ch:2170 # need to do this because bdii takes a while to be refreshed across all levels
ui64 $ lcg-ls -d -l srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp1
- Test whether our files are correctly stored:
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp1 file:/home/miguelgi/dcache-22/pp/randompp1
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp2 file:/home/miguelgi/dcache-22/pp/randompp2
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp3 file:/home/miguelgi/dcache-22/pp/randompp3
ui64 $ md5sum -c md5.txt
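These checks are repeated after each phase of the migration, so it can help to wrap them in one script that reports a pass or fail per check. The sketch below is one possible shape, not the procedure's own tooling. The function names, the `LOG` path and the `WORKDIR` default are assumptions, and `chk_SE-dcache` is taken to be the site-local tool used above.

```shell
#!/bin/bash
# Defaults mirror the commands above; override via the environment.
SE=${SE:-ppstorage01.lcg.cscs.ch}
WORKDIR=${WORKDIR:-/home/miguelgi/dcache-22/pp}
LOG=${LOG:-/tmp/dcache-checks.log}   # hypothetical log location
failed=0

# Run one check, append its output to $LOG, print OK/FAIL, count failures.
step() {
    local label="$1"; shift
    if "$@" >>"$LOG" 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        failed=$((failed + 1))
    fi
}

# Passes only if the BDII query returns at least one line.
bdii_has_entries() {
    [ "$(ldapsearch -x -LLL -h "$SE" -p 2170 -b "o=grid" | wc -l)" -gt 0 ]
}

# Re-fetch the test files and compare them with the saved checksums.
# Runs in a subshell so the caller's working directory is untouched.
files_intact() (
    export LCG_GFAL_INFOSYS=ppbdii01.lcg.cscs.ch:2170
    cd "$WORKDIR" || exit 1
    for f in randompp1 randompp2 randompp3; do
        lcg-cp -v -n 8 "srm://$SE/pnfs/lcg.cscs.ch/dteam/$f" "file:$WORKDIR/$f" || exit 1
    done
    md5sum -c md5.txt
)

# Run every check; succeeds only if none of them failed.
run_checks() {
    failed=0
    step "BDII publishes data" bdii_has_entries
    step "chk_SE-dcache" chk_SE-dcache -n "$SE"
    step "test files intact" files_intact
    [ "$failed" -eq 0 ]
}
```

Calling `run_checks` from ui64 prints one line per check; the details of any failure end up in `$LOG`.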
Pre-installation
- Login to the admin shell and execute:
(local) admin > cd PoolManager
(PoolManager) admin > save
Look at:
May 06 13:34 [root@storage02:~]# ls /root/PoolManager.conf.saved -lh
-rw-r--r-- 1 root root 16K May 6 13:33 /root/PoolManager.conf.saved
Install new RPM
- Stop dCache on the pools and the head nodes, then remove the old version on the head nodes.
pdsh -w ppse0[1-3] "service dcache stop" |dshbak -c
pdsh -w ppstorage0[1-2] "service dcache stop" |dshbak -c
pdsh -w ppstorage0[1-2] "yum remove -y dcache-server" |dshbak -c
- Move headnodes to new CFengine category:
# dCache 1.9.12
PPPOOLS3 = ( ppse01 ppse02 ppse03 ppse04 ppse05 )
PPDCACHE3 = ( ppstorage01 ppstorage02 PPPOOLS3 )
# dCache 2.2
# PPPOOLS22 = ( ppse01 ppse02 ppse03 ppse04 ppse05 )
PPDCACHE22 = ( ppstorage01 ppstorage02 PPPOOLS22 )
- Run CFengine on the head nodes
pdsh -w ppstorage0[1-2] "cfagent -q" |dshbak -c
- Make sure /etc/dcache is properly populated:
[root @ ppstorage02]-[~]:-)# ls /etc/dcache -l
total 48
drwxr-xr-x 3 root root 4096 May 3 17:02 admin
-rw-r--r-- 1 root root 1934 May 3 16:06 dcache.conf
-rw-r--r-- 1 root root 1258 May 3 16:04 dcache.kpwd
-r--r--r-- 1 root root 3951 May 3 16:04 dcachesrm-gplazma.policy
-rw-r--r-- 1 root root 183 May 3 16:23 gplazma.conf
drwxr-xr-x 2 root root 4096 May 3 16:04 layouts
-rw-r--r-- 1 root root 1542 May 3 16:04 LinkGroupAuthorization.conf
-rw-r--r-- 1 root root 0 May 3 16:19 lm.config
-rw-r--r-- 1 root root 7756 May 3 17:09 logback.xml
-rw-r--r-- 1 root root 10331 May 3 16:27 tc-config.xml
[root @ ppstorage01]-[~]:-)# ls -lh /etc/dcache
total 300K
drwxr-xr-x 2 root root 4.0K May 3 16:05 admin
-rw-r--r-- 1 root root 234K May 3 16:37 certificates.jks <--- this gets generated at a later step
-rw-r--r-- 1 root root 1.9K May 3 16:06 dcache.conf
-rw-r--r-- 1 root root 1.3K May 3 16:05 dcache.kpwd
-r--r--r-- 1 root root 3.9K May 3 16:05 dcachesrm-gplazma.policy
-rw-r--r-- 1 root root 183 May 3 16:33 gplazma.conf
-r-------- 1 root root 2.9K May 3 16:33 hostcert.p12
-rw-r--r-- 1 root root 11K May 3 16:05 info-provider.xml
drwxr-xr-x 2 root root 4.0K May 3 16:05 layouts
-rw-r--r-- 1 root root 1.6K May 3 16:05 LinkGroupAuthorization.conf
-rw-r--r-- 1 root root 7.6K May 3 17:12 logback.xml
-rw-r--r-- 1 root root 11K May 3 16:27 tc-config.xml
- Install new RPM
pdsh -w ppstorage0[1-2] "wget http://www.dcache.org/downloads/1.9/repo/2.2/dcache-2.2.11-1.noarch.rpm -O dcache-2.2.11-1.noarch.rpm" |dshbak -c
pdsh -w ppstorage0[1-2] "yum localinstall ./dcache-2.2.11-1.noarch.rpm -y" |dshbak -c
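The /etc/dcache check earlier in this list can also be scripted rather than eyeballed. This is a minimal sketch: `check_etc_dcache` is a hypothetical helper, and the list of names is an assumption based on the entries common to both directory listings above.

```shell
#!/bin/bash
# Hypothetical helper: report entries missing from the dCache config dir.
# The names below appear in both listings above; adjust them to whatever
# CFengine is expected to deliver on the node.
check_etc_dcache() {
    local dir="${1:-/etc/dcache}" name missing=0
    for name in dcache.conf dcache.kpwd dcachesrm-gplazma.policy \
                gplazma.conf logback.xml LinkGroupAuthorization.conf \
                layouts; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

It returns non-zero when anything is missing, so it can gate the next step, for example `check_etc_dcache && dcache check-config`.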
Configure new dCache on the head nodes
- Remove old files on storage01 and storage02:
rm -rf /etc/init.d/dcache
mv /opt/d-cache /opt/d-cache.bck
mv /var/log/d-cache /var/log/old.d-cache
- Make sure new BDII links are created on storage01:
rm -f /var/lib/bdii/gip/provider/info-based-infoProvider.sh
ln -s /usr/sbin/dcache-info-provider /var/lib/bdii/gip/provider/info-based-infoProvider.sh
- Copy the old billing to the new location on storage01 and storage02 (/opt/d-cache was moved to /opt/d-cache.bck above):
cp -ar /opt/d-cache.bck/billing/* /var/lib/dcache/billing
- Update the Chimera stored procedures:
psql -U postgres -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql chimera
Start dCache
- On the head nodes:
dcache services
dcache status
dcache check-config
dcache start
- On storage01, generate the Java keystore:
dcache import cacerts --out=/etc/dcache/certificates.jks
- On the pool nodes, disable any kind of door by modifying the layout file and start dCache 1.9.12
service dcache start
Test the functionality of dCache on the new hardware.
- Test the BDII updating process:
ui64 $ ldapsearch -x -LLL -h ppstorage01.lcg.cscs.ch -p 2170 -b "o=grid" >/dev/null
ui64 $ ldapsearch -x -LLL -h ppstorage01.lcg.cscs.ch -p 2170 -b "o=grid" |wc -l
595
- Test the SE with dcache tools:
ui64 $ chk_SE-dcache -n ppstorage01.lcg.cscs.ch
- Test the SE with lcg tools:
ui64 $ chk_SE-lcgtools -d ppstorage01.lcg.cscs.ch -
ui64 $ export LCG_GFAL_INFOSYS=ppbdii01.lcg.cscs.ch:2170 # need to do this because bdii takes a while to be refreshed across all levels
ui64 $ lcg-ls -d -l srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp1
- Test whether our files are correctly stored:
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp1 file:/home/miguelgi/dcache-22/pp/randompp1
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp2 file:/home/miguelgi/dcache-22/pp/randompp2
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp3 file:/home/miguelgi/dcache-22/pp/randompp3
ui64 $ md5sum -c md5.txt
Pools migration
In this case, we don't need to migrate to new hardware or SL version, so this is just a dCache upgrade.
- Shutdown dCache on the pools:
pdsh -w ppse0[1-3] "service dcache stop" |dshbak -c
- Move pools to the correct CFengine category
# dCache 1.9.12
# PPPOOLS3 = ( ppse01 ppse02 ppse03 ppse04 ppse05 )
# PPDCACHE3 = ( PPPOOLS3 )
# dCache 2.2
PPPOOLS22 = ( ppse01 ppse02 ppse03 ppse04 ppse05 )
PPDCACHE22 = ( ppstorage01 ppstorage02 PPPOOLS22 )
- Run CFengine
pdsh -w ppse0[1-3] "cfagent -q" |dshbak -c
- Move the contents of /opt/d-cache to a backup:
pdsh -w ppse0[1-3] "mv /opt/d-cache /opt/d-cache.bck" |dshbak -c
- Install Java 7 if the following error appears:
Unsupported major.minor version 51.0
pdsh -w ppse0[1-3] "yum remove -y jdk" |dshbak -c
pdsh -w ppse0[1-3] "yum install -y java-1.7.0-openjdk" |dshbak -c
- Uncomment the part of the layout file relative to the doors configuration.
- Remove the old dCache RPM and install the new one:
pdsh -w ppse0[1-3] "yum remove -y dcache-server" |dshbak -c
pdsh -w ppse0[1-3] "wget http://www.dcache.org/downloads/1.9/repo/2.2/dcache-2.2.11-1.noarch.rpm -O dcache-2.2.11-1.noarch.rpm" |dshbak -c
pdsh -w ppse0[1-3] "yum localinstall ./dcache-2.2.11-1.noarch.rpm -y" |dshbak -c
- Make sure /etc/dcache is properly populated and that check-config runs cleanly:
pdsh -w ppse0[1-3] "dcache check-config" |dshbak -c
- Make sure /etc/init.d/dcache is deleted:
pdsh -w ppse0[1-3] "rm -fv /etc/init.d/dcache" |dshbak -c
- Start dCache 2.2
pdsh -w ppse0[1-3] "dcache start" |dshbak -c
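The pool steps above are all `pdsh ... | dshbak -c` pipelines, and by default such a pipeline does not report remote failures through its exit status. A possible wrapper is sketched below. It relies on pdsh's `-S` option (return the largest remote exit code) and on bash's `pipefail`. The names `on_pools` and `run_steps`, and the `PDSH`/`DSHBAK` override variables, are hypothetical.

```shell
#!/bin/bash
# Make a pipeline fail if any command in it fails, not only the last one.
set -o pipefail

POOLS=${POOLS:-'ppse0[1-3]'}
PDSH=${PDSH:-pdsh}        # overridable, e.g. for a dry run
DSHBAK=${DSHBAK:-dshbak}

# Run one command on all pools; -S makes pdsh return the largest
# remote exit code instead of always succeeding.
on_pools() {
    echo ">>> $1"
    "$PDSH" -S -w "$POOLS" "$1" | "$DSHBAK" -c
}

# Run the given commands in order and stop at the first failure.
run_steps() {
    local cmd
    for cmd in "$@"; do
        if ! on_pools "$cmd"; then
            echo "aborting after failed step: $cmd" >&2
            return 1
        fi
    done
}
```

For instance, `run_steps "service dcache stop" "cfagent -q" "mv /opt/d-cache /opt/d-cache.bck"` would stop before the `mv` if CFengine failed on any pool. The manual layout edit still has to happen between two such calls.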
Test the functionality of dCache on the new hardware.
- Test the BDII updating process:
ui64 $ ldapsearch -x -LLL -h ppstorage01.lcg.cscs.ch -p 2170 -b "o=grid" >/dev/null
ui64 $ ldapsearch -x -LLL -h ppstorage01.lcg.cscs.ch -p 2170 -b "o=grid" |wc -l
595
- Test the SE with dcache tools:
ui64 $ chk_SE-dcache -n ppstorage01.lcg.cscs.ch
- Test the SE with lcg tools:
ui64 $ chk_SE-lcgtools -d ppstorage01.lcg.cscs.ch -
ui64 $ export LCG_GFAL_INFOSYS=ppbdii01.lcg.cscs.ch:2170 # need to do this because bdii takes a while to be refreshed across all levels
ui64 $ lcg-ls -d -l srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp1
- Test whether our files are correctly stored:
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp1 file:/home/miguelgi/dcache-22/pp/randompp1
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp2 file:/home/miguelgi/dcache-22/pp/randompp2
ui64 $ lcg-cp -v -n 8 srm://ppstorage01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/randompp3 file:/home/miguelgi/dcache-22/pp/randompp3
ui64 $ md5sum -c md5.txt
Final steps
- Restart the iptables service to reopen the ports to the outside:
service iptables restart
Other notes
Install new RPM
Install the new version. Note that the 2.2.x RPM checks /opt/d-cache for old files, so move the directory aside to keep a backup; we'll also need to move some files from it later.
mv /opt/d-cache /opt/d-cache.bkup
rpm -ivh dcache-2.2.10-1.noarch.rpm
Config files
Full details on the move of files from /opt to /etc can be found at the link below. We'll need to move the billing information manually:
cp -r /opt/d-cache.bkup/billing/* /var/lib/dcache/billing
I would recommend reading up on what has moved so you have a good idea of where the files are before proceeding:
http://trac.dcache.org/wiki/optToUsr
Note: this step is a workaround. dCache 2.2 requires the vomsdir to be populated. We need to update the YAIM config for dCache, as there is a "config_vomsdir" function we should be able to use.
scp -r cream01:/etc/grid-security/vomsdir /etc/grid-security/
Ensure that fetch-crl is going to be run:
chkconfig --list | grep fetch-crl
/etc/init.d/fetch-crl-cron status
Starting dCache
Everything else should be in CFengine under /srv/cfengine/file/ppstorage0*, so let's make sure the new dCache has the correct services; if it doesn't, run cfagent.
dcache services
Check for any obvious errors (although there shouldn't be any)
dcache check-config
If all is well, start dCache:
dcache start
Provided there are no problems, run checks from ui64 to ensure all is working:
chk_SE-dcache -n ppstorage01.lcg.cscs.ch
Other notes
You may see errors like the one below:
"ERROR: function path2inodes(character varying, character varying) does not exist; Hint: No function matches the given name and argument types. You might need to add explicit type casts.; Position: 77]"
Import the missing procedure. Note that with dCache 2.2, defaults and example configs can be found under /usr/share/dcache.
psql -U chimera -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql chimera
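Whether the import is needed can be checked up front by asking PostgreSQL if the function exists. The sketch below assumes the `chimera` role and database used in the command above. `pg_function_exists` and the `PSQL` override variable are hypothetical names.

```shell
#!/bin/bash
PSQL=${PSQL:-psql}   # overridable for testing

# Hypothetical helper: succeeds if a function named $2 exists in database $1.
# Assumption: we connect as the chimera role, as in the import command above.
pg_function_exists() {
    local db="$1" fn="$2" n
    # -A (unaligned) and -t (tuples only) leave just the bare count.
    n=$("$PSQL" -U chimera -At \
        -c "SELECT count(*) FROM pg_proc WHERE proname = '$fn'" "$db") || return 2
    [ "$n" -gt 0 ]
}
```

A possible use is `pg_function_exists chimera path2inodes || psql -U chimera -f /usr/share/dcache/chimera/sql/pgsql-procedures.sql chimera`.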
Migrating to SL6
First we need to stop dcache and make a backup of the database
dcache stop
pg_dumpall -f /tmp/db-bkup.220413
If required, copy the billing information; note that this is under /var in dCache 2.2.
scp -r oldmachine:/var/lib/dcache/billing /var/lib/dcache
Copy the database to the new machine running SL6 and import it
su - postgres
psql -f /tmp/db-bkup.220413
Ensure that the settings within the pg_hba.conf are correct, in our case ensure authentication methods are set to trust.
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
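To confirm that no active rule still uses another method, the file can be filtered with awk. This is a small sketch; `hba_non_trust` is a hypothetical helper, and the default path is an assumption based on the `pg_hba.conf` location used with `sed` earlier on this page.

```shell
#!/bin/bash
# Hypothetical helper: print every active pg_hba.conf rule whose last
# field is not "trust". Comment lines and blank lines are ignored.
hba_non_trust() {
    awk '!/^[ \t]*(#|$)/ && $NF != "trust"' \
        "${1:-/var/lib/pgsql/data/pg_hba.conf}"
}
```

No output means every active rule uses trust; any printed line is a rule that still needs changing.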
Other notes
I experienced some authentication issues after this, so I had to manually tell PostgreSQL to reload the config (restarting the services or the server didn't seem to work for some reason):
su - postgres
pg_ctl reload
I had to install Java 7 as I received the following error; note that SL5 seemed fine with Java 6.
Unsupported major.minor version 51.0
When running the dCache checks from ui64, I noticed the following error when doing an SRM copy:
06 Mar 2013 17:36:46 (PinManager) [] Unexpected failure while expiring pins
java.lang.IllegalStateException: No JDO PersistenceManager bound to thread, and configuration does not allow creation of non-transactional one here
In order to resolve this I had to restart the utility domain; this appears to be a reported bug:
https://lists.desy.de/sympa/arc/user-forum/2013-03/msg00019.html
dcache restart utilityDomain
--
GeorgeBrown - 2013-04-19