
dCache Full Restart procedure

Sometimes dCache has to be restarted from zero, most likely because it is not clear how else to bring it back up. This document shows how to bring the service back after a full reboot.

The central piece is the LM (Location Manager) domain, which starts on storage02 when you run service dcache start. Every other domain, on any other host, then connects to the LM, and that is what makes it reachable inside dCache. If you restart the LM, all the other domains try to re-contact it after a few seconds and keep trying for a few minutes, but do not expect them to retry for long.
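Because every other domain registers with the LM, start-up order matters: storage02 first, then everything else. The sequence on this page can be sketched as a script run from an admin host such as xen12. This is a minimal sketch, not a tested procedure: it assumes passwordless root ssh to storage02 and storage01, reuses the dsh groups SE and SE2 named in Step 3, and the function name restart_order, the variable LM_WAIT and its 60-second default are hypothetical. It only performs the final start command on each host; the manual checks in each step still apply.

```shell
#!/bin/bash
# Hypothetical restart-order sketch, run from an admin host such as xen12.
# Assumes passwordless root ssh to storage02/storage01 and the dsh groups SE and SE2.
LM_WAIT=${LM_WAIT:-60}   # seconds to let the LM settle; an assumed value, not a measured one

restart_order() {
    # 1. The LM domain lives on storage02, so it must be started first
    ssh root@storage02 'service dcache start' || return 1
    sleep "$LM_WAIT"
    # 2. The remaining hosts connect to the LM; their relative order is not critical
    ssh root@storage01 'service dcache start' || return 1
    # Solaris pools (thumpers)
    dsh -g SE '/opt/d-cache/bin/dcache start'
    # Linux pools (thors): the log directory has to exist before dCache starts
    dsh -g SE2 'mkdir -p /var/log/d-cache && service dcache start'
}

# Usage: restart_order
```

Waiting a fixed time is the simplest choice; since the other domains keep retrying for a few minutes, a modest pause is usually enough.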

Step 1. Reboot all machines

The dCache service does not come up automatically, so a good starting point is to have all machines freshly rebooted.

Step 2. Start dCache on storage02

You can probably go straight to the last step, but if you are not an expert you will probably want to go through all the items here:
  • Postgres should start automatically at boot. Make sure /var/lib/pgsql is mounted and postgresql is running.
  • The chimera-nfs service also starts automatically at boot. To make sure, run ps -axf | grep "chimera/" and check that you get a java process with a long list of includes.
  • If you have upgraded the dCache RPM, or even just the Java RPM, run /opt/d-cache/install/install.sh before starting the dCache service. Running it when nothing was upgraded does no harm.
  • Finally, run service dcache start.
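The checks above can be scripted so they are not skipped in a hurry. A minimal sketch, assuming bash, the SysV service command, mountpoint from util-linux and pgrep from procps; the function name preflight_storage02 is hypothetical. It uses pgrep -f instead of ps | grep so that the grep process itself can never produce a false match.

```shell
#!/bin/bash
# Hypothetical pre-flight helper for storage02 (sketch, not an official dCache tool).
# Paths are the ones listed on this page.

preflight_storage02() {
    # The Postgres data directory is a separate mount point
    if ! mountpoint -q /var/lib/pgsql; then
        echo "ERROR: /var/lib/pgsql is not mounted" >&2
        return 1
    fi
    # 'service postgresql status' returns non-zero when the daemon is down
    if ! service postgresql status >/dev/null 2>&1; then
        echo "ERROR: postgresql is not running" >&2
        return 1
    fi
    # chimera-nfs runs as a java process whose command line contains "chimera/"
    if ! pgrep -f 'chimera/' >/dev/null; then
        echo "ERROR: no chimera-nfs java process found" >&2
        return 1
    fi
    echo "OK: storage02 pre-flight checks passed"
}

# Usage (as root on storage02):
#   preflight_storage02 && /opt/d-cache/install/install.sh && service dcache start
```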

Step 3. Start dCache on the rest of the systems

This step depends on the kind of machine you want to join to dCache. Storage01, the Linux pools and the Solaris pools each enable dCache differently (in the end, though, it is the same thing: starting the dcache init script). Let's go type by type.

Storage01

You can probably go straight to the last step, but if you are not an expert you will probably want to go through all the items here:
  • Postgres should start automatically at boot. Make sure /var/lib/pgsql is mounted and postgresql is running.
  • The billing database files are also on a separate mount point. Make sure /opt/d-cache/billing is mounted.
  • If you have upgraded the dCache RPM, or even just the Java RPM, run /opt/d-cache/install/install.sh before starting the dCache service. Running it when nothing was upgraded does no harm.
  • Finally, run service dcache start.
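On storage01 the same idea applies, with one extra mount to verify. A minimal sketch, assuming bash, the SysV service command and mountpoint from util-linux; the function name preflight_storage01 is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight helper for storage01 (sketch, not an official dCache tool).

preflight_storage01() {
    local mp
    # Both the Postgres data and the billing database live on separate mount points
    for mp in /var/lib/pgsql /opt/d-cache/billing; do
        if ! mountpoint -q "$mp"; then
            echo "ERROR: $mp is not mounted" >&2
            return 1
        fi
    done
    # 'service postgresql status' returns non-zero when the daemon is down
    if ! service postgresql status >/dev/null 2>&1; then
        echo "ERROR: postgresql is not running" >&2
        return 1
    fi
    echo "OK: storage01 pre-flight checks passed"
}

# Usage (as root on storage01):
#   preflight_storage01 && /opt/d-cache/install/install.sh && service dcache start
```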

Thumpers (pools with Solaris)

You can probably go straight to the last step, but if you are not an expert you will probably want to go through all the items here:
  • Check that the data mountpoints are available: zfs list
  • Make sure gmond is running; it often does not come up on its own. Check with ps -e | grep gmond and, if it is not there, run svcadm clear gmond and check again.
  • If you have upgraded the dCache packages, or even just the Java packages, run /opt/d-cache/install/install.sh before starting the dCache service. Running it when nothing was upgraded does no harm.
  • Finally, run /opt/d-cache/bin/dcache start. To start all thumpers at once, run dsh -g SE '/opt/d-cache/bin/dcache start' from xen12 or any other machine with the dsh groups configured.
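The thumper checks can be sketched the same way. This assumes bash is installed on the Solaris hosts, that pgrep -x and svcadm behave as on Solaris 10, and that the SMF service is addressable as gmond, as written on this page; the function name preflight_thumper and the 5-second wait are hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight helper for a Solaris thumper pool (sketch only).

preflight_thumper() {
    # List ZFS datasets and their mountpoints so the data pools can be eyeballed
    if ! zfs list; then
        echo "ERROR: zfs list failed" >&2
        return 1
    fi
    # gmond often does not come up on its own; clear its SMF state if it is absent
    if ! pgrep -x gmond >/dev/null; then
        echo "gmond not running, trying: svcadm clear gmond" >&2
        svcadm clear gmond
        sleep 5   # assumed grace period for SMF to restart it
        if ! pgrep -x gmond >/dev/null; then
            echo "ERROR: gmond is still not running" >&2
            return 1
        fi
    fi
    echo "OK: thumper pre-flight checks passed"
}

# Usage on one thumper:
#   preflight_thumper && /opt/d-cache/install/install.sh && /opt/d-cache/bin/dcache start
```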

Thors (pools with Linux)

You can probably go straight to the last step, but if you are not an expert you will probably want to go through all the items here:
  • Check that the data mountpoints are available: df -h | grep data1
  • Create the log directory: mkdir /var/log/d-cache
  • If you have upgraded the dCache RPM, or even just the Java RPM, run /opt/d-cache/install/install.sh before starting the dCache service. Running it when nothing was upgraded does no harm.
  • Finally, run service dcache start. To start all thors at once, run dsh -g SE2 'service dcache start' from xen12 or any other machine with the dsh groups configured.
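For the thors, a minimal sketch of the same checklist, assuming bash; the function name preflight_thor and the overridable LOG_DIR variable are hypothetical. Using mkdir -p makes the log-directory step safe to repeat on hosts where it already exists.

```shell
#!/bin/bash
# Hypothetical pre-flight helper for a Linux thor pool (sketch only).
LOG_DIR=${LOG_DIR:-/var/log/d-cache}   # hypothetical override, defaults to the path on this page

preflight_thor() {
    # The data partition (named data1 on this page) must be mounted
    if ! df -h | grep -q data1; then
        echo "ERROR: no data1 filesystem mounted" >&2
        return 1
    fi
    # -p: no error if the directory already exists
    mkdir -p "$LOG_DIR" || return 1
    echo "OK: thor pre-flight checks passed"
}

# Usage (as root on a thor):
#   preflight_thor && service dcache start
```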

Step 4. Test the system

From a user interface, create your VOMS proxy and then run the following checks:
  • chk_SE-dcache: checks the basic commands you can run directly against dCache. If you get "no space available", you probably need to wait a bit until the pools start up and publish their free space to the dCache core.
  • chk_SE-lcgtools: performs a copy through the complete information system (this is the kind of tool the jobs actually use). It has many options, but it should run fine without parameters. If it does not, first check whether the information system has published everything upwards. To speed things up you can use a different BDII, for example with the parameter "-b bdii.lcg.cscs.ch".
  • Check for offline pools: http://storage01.lcg.cscs.ch:2288/usageInfo should list all pools (check that they are all there); if the word OFFLINE does not appear anywhere in big red letters, you are probably fine.
If all of this works, you are done. Congratulations!
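The offline-pool check can be partly automated. A minimal sketch, assuming curl is installed and that the usageInfo page marks offline pools with the literal word OFFLINE and uses it nowhere else; the function names count_offline and check_pools are hypothetical. It does not replace looking at the page to confirm every pool is listed.

```shell
#!/bin/bash
# Hypothetical usageInfo check (sketch only).
USAGE_URL="http://storage01.lcg.cscs.ch:2288/usageInfo"

count_offline() {
    # grep -o prints each match on its own line, so wc -l counts matches, not lines
    grep -o 'OFFLINE' | wc -l | tr -d ' '
}

check_pools() {
    local page n
    # -s: silent, -f: fail on HTTP errors instead of returning the error page
    page=$(curl -sf "$USAGE_URL") || { echo "ERROR: cannot fetch $USAGE_URL" >&2; return 1; }
    n=$(printf '%s\n' "$page" | count_offline)
    if [ "$n" -gt 0 ]; then
        echo "WARNING: $n OFFLINE marker(s) on $USAGE_URL" >&2
        return 1
    fi
    echo "OK: no OFFLINE pools reported"
}

# Usage: check_pools
```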

-- PabloFernandez - 2010-11-08

Topic revision: r4 - 2011-05-23 - PabloFernandez