CMS Tier-3 Upgrade Planning Page

dcache upgrade from 1.9.2-5 to 1.9.5-11 (failed), PSI power shutdown

Summary

Upgrade to the "golden" dcache version. Checks for improving authz configuration. The upgrade failed and had to be rolled back. On Jan 09 the computing center will be powered down. All our servers need to be off.

Details

Preparation:

  • download dcache software to t3se01:/root/download
  • Make a backup of the PostgreSQL database
  • Make copies of the /opt/d-cache directories on all nodes, so that they can be rolled back
  • Prepare the new config files by using the templates of the new distribution
    • /opt/d-cache/config/dCacheSetup (template is at /opt/d-cache/etc/dCacheSetup.template)
    • /opt/d-cache/etc/node_config
    • /opt/d-cache/etc/LinkGroupAuthorization.conf
    • /opt/d-cache/etc/dcachesrm-gplazma.policy
    • /etc/grid-security/grid-vorolemap
    • /etc/grid-security/storage-authzdb
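The directory copies mentioned above could be scripted roughly as follows. This is only a minimal sketch and not the procedure that was actually used: the helper name `backup_tree` and the backup location in the usage note are assumptions. It creates a dated tar archive of a directory tree so that the tree can be restored on rollback.

```shell
#!/bin/sh
# Hypothetical helper (not part of dCache): archive a directory tree
# into a dated tarball.
# Usage: backup_tree SOURCE_DIR BACKUP_DIR
backup_tree() {
    src=$1
    dest=$2
    stamp=$(date +%Y%m%d)
    archive="$dest/$(basename "$src")-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C makes the paths inside the archive relative to the parent directory
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Print the archive path so the caller can record it
    echo "$archive"
}
```

On each node this might be called as `backup_tree /opt/d-cache /root/backup`, where `/root/backup` is an assumed location. The database itself is better saved with a proper dump than with a file copy.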

Upgrade plan:

  1. stop phedex
  2. UI: Prevent user login and reboot to get rid of all logged in users
  3. We may want to kill all running jobs on the nodes (but we also can just let them run and fail)
  4. Stop dcache
  5. Make a backup of the current installation: one for t3se01 and one for a Thumper (t3fs05)
  6. Update t3se01 and t3dcachedb01 - DO NOT FORGET TO RUN install.sh!
  7. Sync distribution to the thumpers (we may also try to use the Solaris packages, instead)
    • check configuration files thoroughly
    • DO NOT FORGET TO RUN install.sh!
  8. Start dcache on t3se01 and t3dcachedb01
    • Check whether the cells come up correctly
  9. Start dcache on a single pool
    • Check services using our testing script
  10. Start remaining pools
  11. Investigate whether the Info system is still running ok
  12. Only now can we start to toy with authz-related configuration changes

t3se01

rpm -Uvh dcache-server-1.9.5-11.noarch.rpm
Preparing...                ########################################### [100%]
[main] ERROR startup.Catalina  - Catalina.stop:
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:525)
        at java.net.Socket.connect(Socket.java:475)
        at java.net.Socket.<init>(Socket.java:372)
        at java.net.Socket.<init>(Socket.java:186)
        at org.apache.catalina.startup.Catalina.stopServer(Catalina.java:394)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.catalina.startup.Bootstrap.stopServer(Bootstrap.java:343)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:434)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.srm-t3se01)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.gsidcap-t3se01)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.gridftp-t3se01)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.dcap-t3se01)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.gPlazma-t3se01)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.info)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.utility)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.httpd)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.adminDoor)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.dir)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.dCache)
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.lm)
   1:dcache-server          warning: /opt/d-cache/etc/srm_setup.env created as /opt/d-cache/etc/srm_setup.env.rpmnew
########################################### [100%]

Install the updated config files

  • /opt/d-cache/config/dCacheSetup
  • /opt/d-cache/etc/node_config

Run the installer script

[root@t3se01 download]# /opt/d-cache/install/install.sh
INFO:This node will need to mount the name server.
INFO:Skipping ssh key generation

 Checking MasterSetup  ./config/dCacheSetup O.k.

   Sanning dCache batch files

    Processing adminDoor
    Processing authdoor
    Processing chimera
    Processing dCache
    Processing dir
    Processing door
    Processing gPlazma
    Processing gridftpdoor
    Processing gsidcapdoor
    Processing httpd
    Processing httpdoor
    Processing info
    Processing infoProvider
    Processing lm
    Processing nfsv41
    Processing pnfs
    Processing pool
    Processing replica
    Processing srm
    Processing statistics
    Processing utility
    Processing xrootdDoor


 Checking Users database .... Ok
 Checking Security       .... Ok
 Checking JVM ........ Ok
 Checking Cells ...... Ok
 dCacheVersion ....... Version production-1.9.5-11

INFO:Checking if /pnfs/psi.ch mounted to the right export. ...
INFO:Will be mounted to t3dcachedb01.psi.ch:/pnfsdoors by dcache start-up script.
INFO:ftpBaseLinkedTo=/pnfs/psi.ch
INFO:pnfsMountPoint=/pnfs/psi.ch
INFO:Link /pnfs/ftpBase --> /pnfs/psi.ch already there.
WARNING:deleting previous version of Tomcat at /opt/d-cache/libexec/apache-tomcat-5.5.20
INFO:installing tomcat and axis ...
INFO:Done installing tomcat and axis
INFO:modifying java options in /opt/d-cache/libexec/apache-tomcat-5.5.20/bin/catalina.sh ...
INFO:modifying system CLASSPATH in /opt/d-cache/libexec/apache-tomcat-5.5.20/bin/setclasspath.sh ...
WARNING:Removing previous srm webapp directory
INFO:Creating srm webapp directory
INFO:Creating srm webapp deployment file
INFO:Done creating srm webapp deployment file
INFO:Starting up tomcat ...
Using CATALINA_BASE:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_HOME:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_TMPDIR: /opt/d-cache/libexec/apache-tomcat-5.5.20/temp
Using JRE_HOME:       /usr/java/jdk1.6.0_17
INFO:Done starting up tomcat
INFO:deploying srm v2 application using axis AdminClient ...
- Unable to find required classes (javax.activation.DataHandler and javax.mail.internet.MimeMultipart). Attachment support is disabled.
Processing file /opt/d-cache/etc/srmv1-deploy.wsdd
Done processing
- Unable to find required classes (javax.activation.DataHandler and javax.mail.internet.MimeMultipart). Attachment support is disabled.
Processing file /opt/d-cache/etc/srmv2.2-deploy.wsdd
Done processing
INFO:Done deploying srm v2 application using axis AdminClient
INFO:creating config files and adding configuration info into /opt/d-cache/srm-webapp/WEB-INF/web.xml ...
INFO:done creating config files and adding configuration info into /opt/d-cache/srm-webapp/WEB-INF/web.xml
INFO:enabling GSI HTTP in tomcat by modifying /opt/d-cache/libexec/apache-tomcat-5.5.20/conf/server.xml ...
INFO:commenting out HTTP Connector on port 8080 in /opt/d-cache/libexec/apache-tomcat-5.5.20/conf/server.xml ...
INFO:Done commenting out HTTP Connector on port 8080 in /opt/d-cache/libexec/apache-tomcat-5.5.20/conf/server.xml
INFO:commenting out AJP CoyoteConnector on port 8009...
INFO:Done commenting out AJP CoyoteConnector on port 8009
INFO:turning off sending of Multi Refs in /opt/d-cache/srm-webapp/WEB-INF/server-config.wsdd
INFO:Done turning off sending of Multi Refs in /opt/d-cache/srm-webapp/WEB-INF/server-config.wsdd
INFO:shutdown Tomcat
Using CATALINA_BASE:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_HOME:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_TMPDIR: /opt/d-cache/libexec/apache-tomcat-5.5.20/temp
Using JRE_HOME:       /usr/java/jdk1.6.0_17
INFO:installing config for startup/shutdown script
INFO:Installation complete
INFO:please use /opt/d-cache/bin/dcache start|stop|restart srm to startup, shutdown or restart srm server

t3dcachedb01

[root@t3dcachedb01 ~]# rpm -Uvh dcache-server-1.9.5-11.noarch.rpm
Preparing...                ########################################### [100%]
Cannot find appropriate pid file (/opt/d-cache/config/lastPid.pnfs)
   1:dcache-server          warning: /opt/d-cache/etc/srm_setup.env created as /opt/d-cache/etc/srm_setup.env.rpmnew
########################################### [100%]

Install the updated config files

  • /opt/d-cache/config/dCacheSetup
  • /opt/d-cache/etc/node_config

Start up postgres and pnfs

[root@t3dcachedb01 ~]# /etc/init.d/postgresql start
Starting postgresql service:                               [  OK  ]

[root@t3dcachedb01 ~]# /etc/init.d/pnfs start
Starting pnfs services (PostgreSQL version):
 Shmcom : Installed 8 Clients and 8 Servers
 Starting database server for admin (/opt/pnfsdb/pnfs/databases/admin) ... O.K.
 Starting database server for data1 (/opt/pnfsdb/pnfs/databases/data1) ... O.K.
 Starting database server for cms (/opt/pnfsdb/pnfs/databases/cms) ... O.K.
 Starting database server for ops (/opt/pnfsdb/pnfs/databases/ops) ... O.K.
 Starting database server for dteam (/opt/pnfsdb/pnfs/databases/dteam) ... O.K.
 Waiting for dbservers to register ... Ready
 Starting Mountd : pmountd
 Starting nfsd : pnfsd

Check that in the /opt/d-cache/etc/node_config the overwriting of PNFS is disabled!

PNFS_OVERWRITE=no
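This check can be automated before running the installer. The following is a minimal sketch; the function name `pnfs_overwrite_disabled` is hypothetical, and it assumes the setting appears as a plain, unquoted, lowercase `PNFS_OVERWRITE=no` assignment as shown above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only if the given node_config
# contains an uncommented PNFS_OVERWRITE=no line.
pnfs_overwrite_disabled() {
    # Anchoring at line start (after optional whitespace) rejects
    # commented-out lines such as "#PNFS_OVERWRITE=no".
    grep -Eq '^[[:space:]]*PNFS_OVERWRITE[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$1"
}
```

For example, `pnfs_overwrite_disabled /opt/d-cache/etc/node_config && /opt/d-cache/install/install.sh` would only start the installer when the check passes.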

Run the installer script

[root@t3dcachedb01 ~]# /opt/d-cache/install/install.sh
INFO:This node will need to mount the name server.
INFO:Not an admin door inteface node

 Checking MasterSetup  ./config/dCacheSetup O.k.

   Sanning dCache batch files

    Processing adminDoor
    Processing authdoor
    Processing chimera
    Processing dCache
    Processing dir
    Processing door
    Processing gPlazma
    Processing gridftpdoor
    Processing gsidcapdoor
    Processing httpd
    Processing httpdoor
    Processing info
    Processing infoProvider
    Processing lm
    Processing nfsv41
    Processing pnfs
    Processing pool
    Processing replica
    Processing srm
    Processing statistics
    Processing utility
    Processing xrootdDoor


 Checking Users database .... Ok
 Checking Security       ....  server_key and/or host_key are missing

 Use following commands to generate them:
  cd ../config
  ssh-keygen -t rsa1 -b  768 -f ./server_key -N ""
  ssh-keygen -t rsa1 -b 1024 -f ./host_key   -N ""

 Checking JVM ........ Ok
 Checking Cells ...... Ok
 dCacheVersion ....... Version production-1.9.5-11

INFO:Will be mounted to t3dcachedb01.psi.ch:/fs by dcache start-up script.
INFO:Link /pnfs/psi.ch --> /pnfs/fs/usr already there.
INFO:Link /pnfs/ftpBase --> /pnfs/fs already there.
INFO:Checking on a possibly existing dCache/PNFS configuration ...
INFO:Found an existing dCache/PNFS configuration!
INFO:Not allowed to overwrite existing PNFS configuration.
INFO:There already are pnfs exports '/pnfsdoors' in
INFO: /pnfs/fs/admin/etc/exports. The GridFTP doors need access to it.
INFO:You may restrict access to this export to the GridFTP doors which
INFO:are not on the admin node. See the documentation.

Test start of the core services

t3dcachedb01:

[root@t3dcachedb01 dcache]# /opt/d-cache/bin/dcache start
Starting pnfsDomain 6 Done (pid=29774)

t3se01:

[root@t3se01 dcache]# /opt/d-cache/bin/dcache start
Mounting /pnfs/psi.ch
Starting lmDomain 6 Done (pid=9822)
Starting dCacheDomain 6 Done (pid=9900)
Starting dirDomain 6 Done (pid=9995)
Starting adminDoorDomain 6 Done (pid=10086)
Starting httpdDomain 6 Done (pid=10182)
Starting utilityDomain 6 Done (pid=10287)
Starting gPlazma-t3se01Domain 6 Done (pid=10380)
Starting infoDomain 6 Done (pid=10473)
Starting dcap-t3se01Domain 6 Done (pid=10635)
Starting gridftp-t3se01Domain 6 Done (pid=10732)
Starting gsidcap-t3se01Domain 6 Done (pid=10843)
Using CATALINA_BASE:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_HOME:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_TMPDIR: /opt/d-cache/libexec/apache-tomcat-5.5.20/temp
Using JRE_HOME:       /usr/java/jdk1.6.0_17

Pinging srm server to wake it up, will take few seconds ...
VersionInfo : v2.2
backend_type:dCache

Done

Install Solaris packages on the file servers

I move the old installations to a backup location

[root@t3admin01 ~]# ssh t3fs01 "mv /opt/d-cache /opt/d-cache-1.9.5-11"

Install the package

root@t3fs01 # pkgadd -d dcache-server-1.9.5-11.pkg

The following packages are available:
  1  dCache     dCache Server
                (all) 1.9.5-11

Select package(s) you wish to process (or 'all' to process
all packages). (default: all) [?,??,q]:

Processing package instance  from 

dCache Server(all) 1.9.5-11
dCache.ORG""
Using  as the package base directory.
## Processing package information.
## Processing system information.
## Verifying disk space requirements.
## Checking for conflicts with packages already installed.

The following files are already installed on the system and are being
used by another package:
  /opt 

Do you want to install these conflicting files [y,n,?,q] y
## Checking for setuid/setgid programs.

Installing dCache Server as 

## Installing part 1 of 1.
/opt/d-cache/billing/README
/opt/d-cache/bin/dCacheConfigure.sh
/opt/d-cache/bin/dcache
/opt/d-cache/bin/dcache-srm
/opt/d-cache/bin/meta2yaml
...
...
/opt/d-cache/share/xml/xylophone/xsl/xylophone-path.xsl
/opt/d-cache/share/xml/xylophone/xsl/xylophone-predicate.xsl
/opt/d-cache/share/xml/xylophone/xsl/xylophone-publish.xsl
/opt/d-cache/share/xml/xylophone/xsl/xylophone-unique.xsl
/opt/d-cache/share/xml/xylophone/xsl/xylophone-user-elements.xsl
/opt/d-cache/share/xml/xylophone/xsl/xylophone.xsl
[ verifying class  ]

Installation of  was successful.

Install the updated config files

  • /opt/d-cache/config/dCacheSetup
  • /opt/d-cache/etc/node_config
  • /opt/d-cache/config/t3fs01.poollist (copy from previous installation. This contains the information on existing pools)

Run the dcache install script

root@t3fs01 # /opt/d-cache/install/install.sh 
....
ABORT:java variable in /opt/d-cache/config/dCacheSetup do not point to java version 1.6.x

Ouch... I found the following buried in an upgrade notes page:
As of version 1.9.4-1, dCache requires Java 6.

Java installation on t3fs01:

  • Download the appropriate JDK from the Sun download site (I got the self-extracting installer jdk-6u17-solaris-i586.sh)
  • Go to the /usr directory and run the installer from there. Java is then available under /usr/jdk1.6.0_17/bin/java

Change the java entry in the pool node's dCacheSetup file so that it points to the new binary. This is a bit annoying, since we usually try to keep these config files identical across all nodes. However, I do not want to touch the default Java installations that come with the standard Sun packages on these systems: there was no upgrade package for Java 1.6, and I do not want to endanger other services running on these machines.
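Before re-running the installer, it may be worth confirming that the binary referenced in dCacheSetup really reports a 1.6.x version. A minimal sketch follows; the function name `is_java6` is hypothetical, and it relies on `java -version` printing a line such as `java version "1.6.0_17"` to stderr.

```shell
#!/bin/sh
# Hypothetical check: succeeds if the given java binary reports version 1.6.x.
# Usage: is_java6 /path/to/java
is_java6() {
    # "java -version" writes to stderr, so merge it into stdout for grep
    "$1" -version 2>&1 | grep -q 'version "1\.6\.'
}
```

On the pool node this could be used as `is_java6 /usr/jdk1.6.0_17/bin/java || echo "wrong Java version"`.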

Run the dcache install script again

root@t3fs01 # /opt/d-cache/install/install.sh
INFO:Not an admin door inteface node

 Checking MasterSetup  ./config/dCacheSetup O.k.

   Sanning dCache batch files

    Processing adminDoor
    Processing authdoor
    Processing chimera
    Processing dCache
    Processing dir
    Processing door
    Processing gPlazma
    Processing gridftpdoor
    Processing gsidcapdoor
    Processing httpd
    Processing httpdoor
    Processing info
    Processing infoProvider
    Processing lm
    Processing nfsv41
    Processing pnfs
    Processing pool
    Processing replica
    Processing srm
    Processing statistics
    Processing utility
    Processing xrootdDoor


 Checking Users database .... Ok
 Checking Security       ....  server_key and/or host_key are missing

 Use following commands to generate them:
  cd ../config
  ssh-keygen -t rsa1 -b  768 -f ./server_key -N ""
  ssh-keygen -t rsa1 -b 1024 -f ./host_key   -N ""

 Checking JVM ........ Ok
 Checking Cells ...... Ok
 dCacheVersion ....... Version production-1.9.5-11

Try to list the existing pools:

root@t3fs01 # /opt/d-cache/bin/dcache pool ls
Pool         Domain       LFS          Size   Free Path
t3fs01_ops   t3fs01Domain precious      250   5463 /data1/t3fs01_ops
t3fs01_cms   t3fs01Domain precious     7250   5463 /data1/t3fs01_cms
t3fs01_cms_1 t3fs01Domain precious     7500   5463 /data1/t3fs01_cms_1

Do a full test with a single pool server

It helps to move the old log files in /var/log/dcache away, so that one can concentrate on the new entries.
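Moving the old logs aside can be done with a small helper like the one below. This is a sketch under assumptions: the function name `archive_logs` and the `old-<timestamp>` subdirectory naming are hypothetical. It should be run while dCache is stopped, since a running process keeps writing to a log file it already has open, even after the file is moved.

```shell
#!/bin/sh
# Hypothetical helper: move all regular files in a log directory into a
# timestamped subdirectory and print that subdirectory's path.
# Usage: archive_logs LOG_DIR
archive_logs() {
    logdir=$1
    dest="$logdir/old-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    for f in "$logdir"/*; do
        # Skip subdirectories (including earlier archives) and an unmatched glob
        [ -f "$f" ] || continue
        mv "$f" "$dest/" || return 1
    done
    echo "$dest"
}
```

For this page that would be `archive_logs /var/log/dcache`.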

  1. Start the core services
  2. Start the pool service
    root@t3fs01 # /opt/d-cache/bin/dcache start
    Starting t3fs01Domain Done (pid=12351)
    Starting dcap-t3fs01Domain Done (pid=12401)
    Starting gridftp-t3fs01Domain 6 Done (pid=12454)
    Starting gsidcap-t3fs01Domain Done (pid=12507)
    
  3. Watch through the admin shell whether all the cells come up. Pay special attention to the pool's cells
    [feichtinger@t3ui01 ~]$ dc_get_routes.sh  | grep t3fs01
    Connection to t3se01.psi.ch closed by remote host.
      DCap-t3fs01         *          *@dcap-t3fs01Domain   Wellknown
      t3fs01_cms          *            *@t3fs01Domain      Wellknown
      GFTP-t3fs01         *        *@gridftp-t3fs01Domain  Wellknown
    DCap-gsi-t3fs01       *        *@gsidcap-t3fs01Domain  Wellknown
     t3fs01_cms_1         *            *@t3fs01Domain      Wellknown
      t3fs01_ops          *            *@t3fs01Domain      Wellknown
           *        t3fs01Domain      l-101-Unknown-116    Domain
           *       gridftp-t3fs01Domain    l-101-Unknown-118    Domain
           *       dcap-t3fs01Domain    l-101-Unknown-117    Domain
           *       gsidcap-t3fs01Domain    l-101-Unknown-119    Domain
    
  4. run our dcache test suite test-dCacheProtocols

For gsiftp writes, I get the following error:

#############################################
TEST: GFTP-write

error: globus_ftp_client: the server responded with an error
451 Operation failed: FTP Door: got response from '[>PoolManager@dCacheDomain:*@dCacheDomain:SrmSpaceManager@srm-t3se01Domain:*@srm-t3se01Domain:*@dCacheDomain]' with error No write pools configured for <cms:cms@osm>

srmls works:

srmls srm://t3se01.psi.ch:8443/srm/managerv2?SFN=/pnfs/psi.ch/cms/trivcat/store/user

srmcp from the SE to a local file also works

I started all pools, but always got the same error when attempting to write a file.

There have been changes in some config files, but I checked them and our versions look ok for this release.

There were some error messages in the gPlazma cell about DB operations. gPlazma now persists data in the DB. I do not know whether this caused the write errors, but the specific error message does not point to it.

NOTE: I found no way of getting writing to work, so the changes had to be rolled back.

Notes after bringing the system up

  • t3wn08: The eth0 interface could not be brought up (yellow blinking LED). The interface may be broken. For the moment I had to configure eth1 instead
  • t3fs02: This node came up with a broken network configuration, probably mainly due to strange entries in /etc/hosts
  • A number of services did not start due to missing chkconfig configuration.

Debugging in a virtual environment - 2010-02-19

dcache support

This is tracked as an official support request: www.dcache.org #5403

Setup

  • The environment consists of a head node t3se02 with all the core services + pnfsd + postgres and a Solaris10 based pool server t3fs12.
  • First, I installed dcache 1.9.2-5 on both nodes and verified that I could write files
    ./test-dCacheProtocols.sh -n t3se02.psi.ch -p /pnfs/psi.ch/cms/derektest -i "GSIDCAP-write"
    Test directory: /tmp/dcachetest-20100219-1654-15186
    TEST: GSIDCAP-write ......  [IGNORE]
    TEST: SRMv1-adv-del ......  [SKIPPED] (dependencies did not run:  GSIDCAP-write)
    TEST: GFTP-write ......  [OK]
    TEST: GFTP-ls ......  [OK]
    TEST: GFTP-read ......  [OK]
    TEST: DCAP-read ......  [OK]
    TEST: SRMv1-adv-del1 ......  [OK]
    TEST: SRMv1-write ......  [OK]
    TEST: SRMv1-get-meta ......  [OK]
    TEST: SRMv1-read ......  [OK]
    TEST: SRMv1-adv-del2 ......  [OK]
    TEST: SRMv2-write ......  [OK]
    TEST: SRMv2-ls ......  [OK]
    TEST: SRMv2-read ......  [OK]
    TEST: SRMv2-rm ......  [OK]
    [feichtinger@t3ui01 dcache]$
    
  • I made a snapshot of the vmware machine t3se02 at this point, and I snapshotted the ZFS FS on t3fs12
  • Then I upgraded according to the procedure laid out on this page to 1.9.5-12. I got the same error on gridftp writes as above.

19 Feb 2010 15:13:37 (GFTP-t3se02-Unknown-104) [gridftp-t3se02Domain-1266588815422] FTP Door: Transfer error: 451 Operation failed: FTP Door: got response from '[>PoolManager@dCacheDomain:*@dCacheDomain:SrmSpaceManager@srm-t3se02Domain:*@srm-t3se02Domain:*@dCacheDomain]' with error No write pools configured for  for linkGroup: [none] (FTP Door: got response from '[>PoolManager@dCacheDomain:*@dCacheDomain:SrmSpaceManager@srm-t3se02Domain:*@srm-t3se02Domain:*@dCacheDomain]' with error No write pools configured for  for linkGroup: [none])

Hunting for solutions

Upgrading postgresql to 8.4 : bad idea... DB not compatible

First, need to dump the DB:

cd /var/lib/pgsql/backups/
pg_dumpall -U postgres > dcache-pgsql-20100219

mv /var/lib/pgsql /var/lib/pgsql-8.3.3.bup

Now, upgrade the RPMs:

Link: http://yum.pgsqlrpms.org/howtoyum.php

Put in /etc/yum.repos.d/pgdg-84-redhat.repo:

[pgdg84]
name=PostgreSQL 8.4 $releasever - $basearch
baseurl=http://yum.pgsqlrpms.org/8.4/redhat/rhel-4-$basearch
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG

[pgdg84-source]
name=PostgreSQL 8.4 $releasever - $basearch - Source
failovermethod=priority
baseurl=http://yum.pgsqlrpms.org/srpms/8.4/redhat/rhel-4-$basearch
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-PGDG

I started out with these versions

[root@t3se02 dcache]# rpm -qa| grep postgre
postgresql-libs-8.3.3-1PGDG.rhel4.x86_64
compat-postgresql-libs-4-1PGDG.rhel4.i686
pnfs-postgresql-3.1.10-7.i386
postgresql-8.3.3-1PGDG.rhel4.x86_64
postgresql-server-8.3.3-1PGDG.rhel4.x86_64

Upgrade:

yum install postgresql.x86_64 postgresql-server.x86_64 compat-postgresql-libs.x86_64 postgresql-libs.x86_64

Installing:
 compat-postgresql-libs  x86_64     4-1PGDG.rhel4    pgdg84             61 k
Updating:
 postgresql              x86_64     8.4.2-1PGDG.rhel4  pgdg84            1.3 M
 postgresql-libs         x86_64     8.4.2-1PGDG.rhel4  pgdg84            181 k
 postgresql-server       x86_64     8.4.2-1PGDG.rhel4  pgdg84            4.6 M

Initialize the DB

service postgresql initdb

Restore your previous pg_hba.conf and any postgresql.conf modifications

Start the DB

/etc/init.d/postgresql start

The database backup must be loaded as user postgres (at this stage, this is the only user with access):

su - postgres
-bash-3.00$ psql -d postgres -f /var/lib/pgsql-8.3.3.bup/backups/dcache-pgsql-20100219

PNFS could not be started. There seem to be some major problems with this DB version.

Trying to debug Space Manager and database statements 2010-03-03

[t3se02.psi.ch] (local) admin > cd SrmSpaceManager
[t3se02.psi.ch] (SrmSpaceManager) admin > info
space.Manager SrmSpaceManager
spaceManagerEnabled=true
JdbcUrl=jdbc:postgresql://t3se02.psi.ch/dcache
jdbcClass=org.postgresql.Driver
databse user=srmdcache
reservation space cleanup period in secs : 3600
updateLinkGroupsPeriod=180000
expireSpaceReservationsPeriod=180000
deleteStoredFileRecord=false
pnfsManager=PnfsManager
poolManager=PoolManager
defaultLatencyForSpaceReservations=ONLINE
reserveSpaceForNonSRMTransfers=false 
returnFlushedSpaceToReservation=true
returnRemovedSpaceToReservation=true
linkGroupAuthorizationFileName=null

Two of these lines, reserveSpaceForNonSRMTransfers=false and linkGroupAuthorizationFileName=null, differ from what our running 1.9.2-5 and the new dcache-server-1.9.5-13 at CSCS report.

It turns out that the following directive must be given explicitly! The commented-out default is not used, and nothing in the logs reports that this setting is missing.

SpaceManagerLinkGroupAuthorizationFileName=/opt/d-cache/etc/LinkGroupAuthorization.conf

reserveSpaceForNonSRMTransfers must also be set to allow non-SRM transfers.
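Since the missing directive produces no log message, a quick check of dCacheSetup for an active (uncommented) assignment can catch the problem before starting dCache. The following is a minimal sketch: the helper name `setup_value` is hypothetical, and it assumes plain `KEY=value` lines where the last uncommented assignment is the one that takes effect.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last uncommented KEY=value
# assignment in a config file; prints nothing if the key is only
# present in commented-out form.
# Usage: setup_value KEY FILE
setup_value() {
    key=$1
    file=$2
    # Lines starting with "#" do not match because the key must come first
    # (after optional whitespace). Any quotes around the value are kept as-is.
    sed -n "s/^[[:space:]]*$key=//p" "$file" | tail -n 1
}
```

An empty result from `setup_value SpaceManagerLinkGroupAuthorizationFileName /opt/d-cache/config/dCacheSetup` would indicate that the directive is still commented out or absent.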

Target Date 08. 01. 2010
Topic revision: r11 - 2010-03-03 - DerekFeichtinger
 