Node Type: dCacheSolaris
Firewall requirements
| local port | open to | reason |
| 2811/tcp | * | gridftp control connection |
| 22125/tcp | 192.33.123.0/24 | unauthenticated dcap (read only) |
| 22128/tcp | 192.33.123.0/24 | gsidcap (GSI authenticated dcap) |
| 20000-25000/tcp | * | Globus port range for gridftp/xrootd data streams |
Regular Maintenance work
Emergency Measures
Broken 16GB Compact Flash Card
- Fabio left an already installed and fully tested
t3fs10
16GB Compact Flash Card inside the X4540
installed above t3fs11
; simply take it and use it to recover the failed X4540.
- You'll have to delete the related Puppet keys from
psi-puppet3.psi.ch
and run Puppet on the restored X4540
to get the correct X509 cert and key installed, or simply copy them from t3admin01:/root/clusteradmin/etc/hostkeys/switch-QuoVadis
- dCache won't start automatically; you have to start it with
dcache start
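The steps above can be sketched as a short runbook; the puppetca and puppetd invocations follow the commands shown elsewhere on this page, and t3fs10 is used as the example host:

```shell
# 1. On the Puppet master, remove the stale keys of the server being
#    recovered (t3fs10 is the example host here).
ssh martinelli_f@psi-puppet3.psi.ch \
    "sudo /usr/sbin/puppetca --clean t3fs10.psi.ch"

# 2. On the recovered X4540, run Puppet to get the X509 cert and key
#    (or copy them from t3admin01:/root/clusteradmin/etc/hostkeys/switch-QuoVadis).
puppetd -t -v

# 3. dCache does not start automatically after recovery.
dcache start
```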
Broken 16GB Compact Flash Card - We have to reinstall Solaris 10
(image: example of a 16GB Compact Flash Card)
- If you're in an emergency, first stop Nagios to avoid getting too many false-alarm e-mails:
ssh root@t3nagios /etc/init.d/nagios stop
- If the
16GB Compact Flash Card
is broken: in the 3 spare X4540 servers mounted in our last rack there are 3
16GB Compact Flash Cards
; in the AIT warehouse close to Derek's office there are 2 more 16GB Compact Flash Cards, placed inside the spare X4540s stored there.
- Once you have inserted the new
16GB Compact Flash Card
you have to use the Solaris VM t3jumpstart
to reinstall Solaris 10 1/13
from scratch and configure it automatically as described in the following chapters. Remember that you can use the VM t3fs15
to quickly test the Solaris 10 1/13
installation executed by t3jumpstart
; by doing that you both validate the installation procedure and avoid writing too much onto the target 16GB Compact Flash Card
, which is actually supposed to be seldom written!! Furthermore, inserting a new 16GB Compact Flash Card
into the X4540
may reshuffle the boot disks in the BIOS! That will prevent the Solaris 10 1/13
installation from completing; Fabio ran into this case. If that happens, reboot, enter the server BIOS and reorder the boot disks, with the Compact Flash Card placed as the 1st boot device.
- If doable, run
zpool export data1
on the failed Solaris installation before running zpool import data1
on the new Solaris installation; in any case you can always force the zpool import
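A minimal sketch of the pool hand-over between the two installations (standard zpool commands; -f is only needed when the pool was not cleanly exported):

```shell
# On the failing Solaris installation, if it is still reachable:
zpool export data1

# On the freshly installed Solaris:
zpool import data1

# If the old system died before exporting, force the import:
zpool import -f data1

# Verify the pool came back healthy:
zpool status data1
```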
- Always inform the users by sending an e-mail to
cms-tier3-users@lists.psi.ch
; if you want to produce the list of files affected you can use the v_pnfs views made by Fabio.
- If you accidentally altered/erased a Solaris file located in
/
then you may be able to recover it either by a puppetd -t -v run or by searching for it among the zfs list
snapshots.
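Recovering a file from a snapshot can look like this; the snapshot name and the recovered path are hypothetical examples:

```shell
# List the available snapshots of the root pool:
zfs list -t snapshot -r rpool

# Snapshots are browsable read-only under the .zfs/snapshot directory
# of the filesystem; copy the old version of the file back in place
# ('2015-06-01' is an invented snapshot name):
cp /.zfs/snapshot/2015-06-01/etc/inet/ntp.conf /etc/inet/ntp.conf
```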
Broken HW ( e.g. a 1TB disk )
[root@t3fs08 ~]# fmadm faulty
will tell you which components have faulted; Nagios has a check based on that.
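A broken data disk would then be handled roughly like this; the device names are hypothetical, and the hot spares configured in the zpool setup on this page may also kick in automatically:

```shell
# Show faulted components as seen by the Solaris fault manager:
fmadm faulty

# Cross-check which vdev ZFS considers failed:
zpool status -x data1

# Replace the broken disk with one of the hot spares
# (c1t0d0 = failed disk, c6t7d0 = spare; invented names):
zpool replace data1 c1t0d0 c6t7d0
```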
Installation - Example For t3fs08
Solaris 10 1/13 installation
Installation is described in Puppet by
tier3-baseclasses.pp
plus
Sol10_fs26.pp
; Fabio uses the aliases below, the Puppet node recipes are in
puppetdirnodes
and the Solaris files are in
puppetdirsolaris
alias dcache='ssh -2 -l admin -p 22224 t3dcachedb.psi.ch'
alias kscustom57='cd /afs/psi.ch/software/linux/dist/scientific/57/custom'
alias kscustom64='cd /afs/psi.ch/software/linux/dist/scientific/64/custom'
alias ksdir='cd /afs/psi.ch/software/linux/kickstart/configs'
alias puppetdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias puppetdirnodes='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes'
alias puppetdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias puppetdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
alias yumdir5='cd /afs/psi.ch/software/linux/dist/scientific/57/scripts'
alias yumdir6='cd /afs/psi.ch/software/linux/dist/scientific/6/scripts'
Remember to erase the existing Puppet keys associated with the X4540 that you're reinstalling from scratch! e.g.:
$ ssh -XY martinelli_f@psi-puppet3.psi.ch
[martinelli_f@psi-puppet3 ~]$ sudo /usr/sbin/puppetca --clean t3fs08.psi.ch
t3fs08.psi.ch
notice: Removing file Puppet::SSL::Certificate t3fs08.psi.ch at '/var/puppet/ssl/ca/signed/t3fs08.psi.ch.pem'
If everything works as designed you just have to follow the instructions on
NodeTypeJumpStart to reinstall an X4540.
Once
Solaris 10 1/13
is installed onto the new
16GB Compact Flash Card
, tune ZFS with:
zfs set atime=off rpool
zfs set sync=always rpool
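You can verify that the tuning took effect with a read-only zfs get, safe to run at any time:

```shell
# Show the two properties tuned above on rpool:
zfs get atime,sync rpool
```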
SSHd keys
Puppet will upload the server's previous SSH keys; this avoids complaints from SSH clients.
We prevent SSH logins from unauthorized hosts:
cat /etc/hosts.allow
# Puppet Managed File
sshd: t3admin01.psi.ch fabiom-mac.psi.ch wmgt01.psi.ch wmgt02.psi.ch dflt1w.psi.ch localhost t3ossec.psi.ch t3nagios.psi.ch t3fs01.psi.ch t3fs02.psi.ch t3fs03.psi.ch t3fs04.psi.ch t3fs07.psi.ch t3fs08.psi.ch t3fs09.psi.ch t3fs10.psi.ch t3fs11.psi.ch
ZFS setup for data1 partition
Warning ! Warning !
CREATE data1 ONLY IF data1 DOESN'T EXIST !
IN REAL LIFE data1 WILL ALREADY EXIST SO RUN zpool import data1 INSTEAD AND NEITHER CREATE data1 NOR ALTER ITS ZFS PROPERTIES !!
zpool create -f data1 raidz2 c1t0d0 c1t5d0 c2t2d0 c2t7d0 c3t4d0 c4t1d0 c4t6d0 c5t3d0 c6t0d0
zpool add -f data1 raidz2 c1t1d0 c1t6d0 c2t3d0 c3t0d0 c3t5d0 c4t2d0 c4t7d0 c5t4d0 c6t1d0
zpool add -f data1 raidz2 c1t2d0 c1t7d0 c2t4d0 c3t1d0 c3t6d0 c4t3d0 c5t0d0 c5t5d0 c6t2d0
zpool add -f data1 raidz2 c1t3d0 c2t0d0 c2t5d0 c3t2d0 c3t7d0 c4t4d0 c5t1d0 c5t6d0 c6t3d0
zpool add -f data1 raidz2 c1t4d0 c2t1d0 c2t6d0 c3t3d0 c4t0d0 c4t5d0 c5t2d0 c5t7d0 c6t4d0
zpool add -f data1 spare c6t7d0 c6t6d0 c6t5d0
# ZFS tuning
zfs create data1/t3fs08_cms
zfs create data1/t3fs08_ops
zfs set quota=30TB data1/t3fs08_cms
zfs set quota=1GB data1/t3fs08_ops
zfs set recordsize=1024K data1
zfs set devices=off data1
zfs set atime=off data1
zfs set exec=on data1 # to avoid additional stress on the weak Compact Flash Cards you'll want to relocate /opt/csw into /data1, which requires exec=on; otherwise exec=off is safer, because /data1 shouldn't contain executables
GRID PKI infrastructure
A Puppet run will upload:
-
/etc/grid-security/hostcert.pem
-
/etc/grid-security/hostkey.pem
- and the tool
/opt/fetch-crl/fetch-crl
needed to update the lcg-CA CRLs daily.
Just the first time, to upload the
lcg-CA
files into
/etc/grid-security/certificates
please connect to
t3admin01
and run
/root/clusteradmin/sync_cacerts_tofs.singlet3fs.sh t3fs08
The CA CRL files transferred from
t3admin01
will stay up to date because on
t3admin01
there is a cron job that regularly refreshes them and a Nagios check that verifies this 'freshness'. After this first manual upload, a
root
crontab created by Puppet will invoke
/opt/fetch-crl/fetch-crl
daily on
t3fs08
; it is shown below on this page.
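To refresh the CRLs by hand, or to check that the daily job works, you can invoke the same command the Puppet crontab uses:

```shell
# Run fetch-crl manually with the same config as the cron job:
/opt/fetch-crl/fetch-crl -c /opt/fetch-crl/fetch-crl.cnf -v

# Sanity check: the .r0 CRL files should have recent timestamps.
ls -lt /etc/grid-security/certificates/*.r0 | head
```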
NTP time server
Make sure that the time service is running correctly;
t3nagios
will constantly check that; the automatic
Solaris 10 1/13
installation made by
t3jumpstart
will take care of
ntp
:
-bash-3.2# svcs ntp
STATE STIME FMRI
online Oct_02 svc:/network/ntp:default
The configuration file for
xntpd
is found at
/etc/inet/ntp.conf
:
-bash-3.2# cat /etc/inet/ntp.conf
# NOTE: This file is managed through puppet
# If you edit this file locally, it will be replaced in
# the next puppet run
#
# File is located at
# $Id: NodeTypeFileServer.txt,v 1.49 2015/06/01 09:56:11 fabiom Exp $
# $URL: svn+ssh://savannah01.psi.ch/repos/tier3/tier3/puppet/TRUNK/modules/Tier3/files/Solaris/5.10/etc/inet/ntp.conf $
# as produced by a fresh Solaris 10 jumpstart install
#server 192.33.126.10 prefer
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
server dmztime1.psi.ch
restrict dmztime1.psi.ch noquery nomodify
server dmztime2.psi.ch
restrict dmztime2.psi.ch noquery nomodify
Java JDK
JDK7
is a requirement for
dCache 2.6
; the automatic
Solaris 10 1/13
installation performed by
t3jumpstart
will take care of
JDK7
:
-bash-3.2# which java
/usr/bin/java
-bash-3.2# ls -l /usr/bin/java
lrwxrwxrwx 1 root other 16 Oct 2 15:57 /usr/bin/java -> ../java/bin/java
-bash-3.2# ls -l /usr/java
lrwxrwxrwx 1 root root 10 Oct 2 16:37 /usr/java -> jdk/latest
-bash-3.2# ls -l /usr/java/jdk
/usr/java/jdk: No such file or directory
-bash-3.2# ls -l /usr/jdk/
total 11
drwxr-xr-x 5 root bin 5 Oct 2 16:33 instances
lrwxrwxrwx 1 root other 7 Oct 2 15:58 j2sdk1.4.2_34 -> ../j2se
lrwxrwxrwx 1 root other 18 Oct 2 15:58 jdk1.5.0_32 -> instances/jdk1.5.0
lrwxrwxrwx 1 root other 18 Oct 2 15:57 jdk1.6.0_37 -> instances/jdk1.6.0
lrwxrwxrwx 1 root other 18 Oct 2 16:35 jdk1.7.0_40 -> instances/jdk1.7.0
lrwxrwxrwx 1 root other 11 Oct 2 16:35 latest -> jdk1.7.0_40
drwxr-xr-x 4 root bin 4 Oct 2 15:58 packages
Configure sensible system limits ??
Not sure whether this is still needed nowadays: it is set on t3fs13,14.
On default Solaris 10 installations the soft limit on the number of file descriptors a process may open is only 256! This must be raised for dCache. One convenient way to do it is to put the line below into
/opt/d-cache/jobs/dcache.local.sh
, a file which is sourced during the dcache start process. This is set by Puppet for our servers.
ulimit -Sn 32768
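As a quick sanity check, this POSIX-shell snippet (a sketch, not part of the official setup) compares the current soft limit against the value dcache.local.sh is supposed to set:

```shell
# Compare the soft limit for open file descriptors against the
# 32768 that dcache.local.sh sets; hosts still at the Solaris
# default of 256 are flagged.
required=32768
current=$(ulimit -Sn)
if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
  echo "OK: soft nofile limit is $current"
else
  echo "WARN: soft nofile limit is $current, dCache wants >= $required"
fi
```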
pkgutil
The Solaris 10 1/13
installation made by t3jumpstart
will automatically take care of pkgutil plus the useful packages listed below.
http://www.opencsw.org/package/pkgutil/ is a must on Solaris if you want your daily Linux tools there as well.
If you need to install the packages by hand, this is the list:
/opt/csw/bin/pkgutil -i -y CSWnagiosp
/opt/csw/bin/pkgutil -i -y CSWnrpe
/opt/csw/bin/pkgutil -i -y CSWruby
/opt/csw/bin/pkgutil -i -y CSWsmartmontools
/opt/csw/bin/pkgutil -i -y CSWwatch
/opt/csw/bin/pkgutil -i -y CSWpstree
/opt/csw/bin/pkgutil -i -y CSWtop
/opt/csw/bin/pkgutil -i -y CSWiftop
/opt/csw/bin/pkgutil -i -y CSWnfswatch
/opt/csw/bin/pkgutil -i -y CSWnano
/opt/csw/bin/pkgutil -i -y CSWalternatives
/opt/csw/bin/pkgutil -i -y CSWaudiofile
/opt/csw/bin/pkgutil -i -y CSWaugeas
/opt/csw/bin/pkgutil -i -y CSWbash
/opt/csw/bin/pkgutil -i -y CSWbdb47
/opt/csw/bin/pkgutil -i -y CSWbdb48
/opt/csw/bin/pkgutil -i -y CSWbonobo2
/opt/csw/bin/pkgutil -i -y CSWbzip2
/opt/csw/bin/pkgutil -i -y CSWcacertificates
/opt/csw/bin/pkgutil -i -y CSWcas-cpsampleconf
/opt/csw/bin/pkgutil -i -y CSWcas-cptemplates
/opt/csw/bin/pkgutil -i -y CSWcas-crontab
/opt/csw/bin/pkgutil -i -y CSWcas-etcservices
/opt/csw/bin/pkgutil -i -y CSWcas-etcshells
/opt/csw/bin/pkgutil -i -y CSWcas-inetd
/opt/csw/bin/pkgutil -i -y CSWcas-initsmf
/opt/csw/bin/pkgutil -i -y CSWcas-migrateconf
/opt/csw/bin/pkgutil -i -y CSWcas-postmsg
/opt/csw/bin/pkgutil -i -y CSWcas-preserveconf
/opt/csw/bin/pkgutil -i -y CSWcas-pycompile
/opt/csw/bin/pkgutil -i -y CSWcas-texinfo
/opt/csw/bin/pkgutil -i -y CSWcas-usergroup
/opt/csw/bin/pkgutil -i -y CSWcommon
/opt/csw/bin/pkgutil -i -y CSWcoreutils
/opt/csw/bin/pkgutil -i -y CSWcswclassutils
/opt/csw/bin/pkgutil -i -y CSWdbusglib
/opt/csw/bin/pkgutil -i -y CSWelinks
/opt/csw/bin/pkgutil -i -y CSWemacs
/opt/csw/bin/pkgutil -i -y CSWemacsbincommon
/opt/csw/bin/pkgutil -i -y CSWemacschooser
/opt/csw/bin/pkgutil -i -y CSWemacscommon
/opt/csw/bin/pkgutil -i -y CSWesound
/opt/csw/bin/pkgutil -i -y CSWexpat
/opt/csw/bin/pkgutil -i -y CSWfconfig
/opt/csw/bin/pkgutil -i -y CSWfindutils
/opt/csw/bin/pkgutil -i -y CSWfontconfig
/opt/csw/bin/pkgutil -i -y CSWfreeglut
/opt/csw/bin/pkgutil -i -y CSWftype2
/opt/csw/bin/pkgutil -i -y CSWgawk
/opt/csw/bin/pkgutil -i -y CSWgcc3corert
/opt/csw/bin/pkgutil -i -y CSWgconf2
/opt/csw/bin/pkgutil -i -y CSWgcpio
/opt/csw/bin/pkgutil -i -y CSWgcrypt
/opt/csw/bin/pkgutil -i -y CSWgdbm
/opt/csw/bin/pkgutil -i -y CSWgdkpixbuf
/opt/csw/bin/pkgutil -i -y CSWggettext
/opt/csw/bin/pkgutil -i -y CSWggettext-data
/opt/csw/bin/pkgutil -i -y CSWggettextrt
/opt/csw/bin/pkgutil -i -y CSWggrep
/opt/csw/bin/pkgutil -i -y CSWgio-fam-backend
/opt/csw/bin/pkgutil -i -y CSWgit
/opt/csw/bin/pkgutil -i -y CSWgit-emacs
/opt/csw/bin/pkgutil -i -y CSWgit-gui
/opt/csw/bin/pkgutil -i -y CSWglib2
/opt/csw/bin/pkgutil -i -y CSWgnomekeyring
/opt/csw/bin/pkgutil -i -y CSWgnomevfs2
/opt/csw/bin/pkgutil -i -y CSWgnupg
/opt/csw/bin/pkgutil -i -y CSWgpg-error
/opt/csw/bin/pkgutil -i -y CSWgpgerr
/opt/csw/bin/pkgutil -i -y CSWgsed
/opt/csw/bin/pkgutil -i -y CSWgtar
/opt/csw/bin/pkgutil -i -y CSWgtk2
/opt/csw/bin/pkgutil -i -y CSWgtk2-printbackends-file
/opt/csw/bin/pkgutil -i -y CSWgtk2-printbackends-papi
/opt/csw/bin/pkgutil -i -y CSWgvim
/opt/csw/bin/pkgutil -i -y vim
/opt/csw/bin/pkgutil -i -y CSWgzip
/opt/csw/bin/pkgutil -i -y CSWhicoloricontheme
/opt/csw/bin/pkgutil -i -y CSWiconv
/opt/csw/bin/pkgutil -i -y CSWiftop
/opt/csw/bin/pkgutil -i -y CSWiozone
/opt/csw/bin/pkgutil -i -y CSWipython
/opt/csw/bin/pkgutil -i -y CSWisaexec
/opt/csw/bin/pkgutil -i -y CSWjbigkit
/opt/csw/bin/pkgutil -i -y CSWjpeg
/opt/csw/bin/pkgutil -i -y CSWkrb5lib
/opt/csw/bin/pkgutil -i -y CSWlsof
# Perl
/opt/csw/bin/pkgutil -i -y CSWpm-compress-raw-bzip2
/opt/csw/bin/pkgutil -i -y CSWpm-compress-raw-zlib
/opt/csw/bin/pkgutil -i -y CSWpm-html-parser
/opt/csw/bin/pkgutil -i -y CSWpm-html-tagset
/opt/csw/bin/pkgutil -i -y CSWpm-io-compress
/opt/csw/bin/pkgutil -i -y CSWpm-libwww-perl
/opt/csw/bin/pkgutil -i -y CSWpm-mime-base64
/opt/csw/bin/pkgutil -i -y CSWpm-uri
/opt/csw/bin/pkgutil -i -y CSWpmbutils
/opt/csw/bin/pkgutil -i -y CSWpmdatemanip
/opt/csw/bin/pkgutil -i -y CSWpmfontafm
/opt/csw/bin/pkgutil -i -y CSWpmhtmlfmt
/opt/csw/bin/pkgutil -i -y CSWpmhtmlformat
/opt/csw/bin/pkgutil -i -y CSWpmhtmlparser
/opt/csw/bin/pkgutil -i -y CSWpmhtmltagset
/opt/csw/bin/pkgutil -i -y CSWpmhtmltree
/opt/csw/bin/pkgutil -i -y CSWpmiocompress
/opt/csw/bin/pkgutil -i -y CSWpmmimebase64
/opt/csw/bin/pkgutil -i -y CSWpmuri
Cron Jobs
We regularly update the
/etc/grid-security/certificates
folder by using the tool
/opt/fetch-crl/fetch-crl
:
crontab -l
# HEADER: This file was autogenerated at Wed Oct 02 16:38:05 +0200 2013 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
#ident "@(#)root 1.21 04/03/23 SMI"
#
# The root crontab should be used to perform accounting data collection.
#
#
10 3 * * * /usr/sbin/logadm
15 3 * * 0 /usr/lib/fs/nfs/nfsfind
30 3 * * * [ -x /usr/lib/gss/gsscred_clean ] && /usr/lib/gss/gsscred_clean
#
# The rtc command is run to adjust the real time clock if and when
# daylight savings time changes.
#
1 2 * * * [ -x /usr/sbin/rtc ] && /usr/sbin/rtc -c > /dev/null 2>&1
43 3 * * * [ -x /opt/csw/bin/gupdatedb ] && /opt/csw/bin/gupdatedb --prunepaths="/dev /devices /proc /tmp /var/tmp" 1>/dev/null 2>&1 # Added by CSWfindutils
# Puppet Name: fetch-crl
10 22 * * * /opt/fetch-crl/fetch-crl -c /opt/fetch-crl/fetch-crl.cnf -v 2>&1 | /usr/bin/tee /var/cron/fetch-crl.log 2>&1
dCache 2.6 - CSCS page
LCGTier2/ServiceDcache
dCache 2.6
A MUST NOTE: dCache now runs as the user dcache
, no longer as the user root
, so you might be hit by a 'permission denied'.
dCache package:
-bash-3.2# pkginfo -l dCache
PKGINST: dCache
NAME: dCache Server
CATEGORY: application
ARCH: all
VERSION: 2.6.16-1
BASEDIR: /
VENDOR: ${vendor}
PSTAMP: ${vendor}
INSTDATE: Nov 21 2013 10:08
EMAIL: support@dcache.org
STATUS: completely installed
FILES: 560 installed pathnames
10 shared pathnames
72 directories
5 executables
170729 blocks used (approx)
Pools
A Puppet run will upload the setup files, create the dirs and groups, and assign the right dir modes.
It's easy to make a pool; you just have to:
- define the pool in
/etc/dcache/layouts/t3fs08.conf
- assign each dir/file to the
dcache
user
- install a
setup
file like the ones shown below
- run
/usr/bin/dcache start
and all the needed dirs and files will be created inside that pool dir
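By way of illustration, adding a hypothetical extra pool t3fs08_foo by hand would look roughly like this; the dataset, quota and chown steps mirror the real t3fs08 examples on this page, and t3fs08_foo is an invented name:

```shell
# 1. Create and size the dataset for the new pool (invented name):
zfs create data1/t3fs08_foo
zfs set quota=10TB data1/t3fs08_foo

# 2. Declare it in /etc/dcache/layouts/t3fs08.conf:
#    [${host.name}-Domain-pool/pool]
#    name=t3fs08_foo
#    path=/data1/t3fs08_foo/pool
#    waitForFiles=${path}/data

# 3. dCache runs as the dcache user, so hand the dir over to it:
chown -R dcache /data1/t3fs08_foo

# 4. Start dCache; it creates the missing pool dirs and files itself:
/usr/bin/dcache start
```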
Important files in a nutshell
find /etc/dcache/
/etc/dcache/dcache.conf <-- main dCache conf; it should be the same on each node, and if it differs it is probably because that node needs more RAM for dCache.
/etc/dcache/logback.xml <-- tunes the logging verbosity; on the SUN X4540 it should always be set to the level 'error' to avoid writing too much to the Unigen 16GB Flash Card
/etc/dcache/layouts
/etc/dcache/layouts/t3fs08.conf <-- specific node conf.
# dCache optional plugins
/usr/local/share/dcache/plugins
# dCache Logs
/var/log/dcache/
/var/log/dcache/t3fs08-Domain-gsidcap.log
/var/log/dcache/t3fs08-Domain-pool.log
/var/log/dcache/t3fs08-Domain-dcap.log
/var/log/dcache/t3fs08-Domain-gridftp.log
# dCache GSI layer
root@t3fs01 $ ls -l /etc/grid-security/
total 267
drwxr-xr-x 2 root root 1202 Nov 26 13:58 certificates
-rw-r--r-- 1 dcache root 1880 Apr 5 2013 hostcert.pem
-rw-r--r-- 1 root root 1872 May 26 2011 hostcert.pem-20110526-1358
-rw-r--r-- 1 root root 1896 May 26 2011 hostcert.pem-20120504-1040
-rw-r--r-- 1 dcache root 1880 May 3 2012 hostcert.pem-20130405
-r-------- 1 dcache root 1679 Jul 21 2009 hostkey.pem
drwxr-x--- 2 root nagios 3 Nov 20 13:53 nagios <----- hostcert.pem -> /etc/grid-security/hostcert.pem
# Nagios checks
root@t3fs01 $ find /opt/csw/etc/nrpe.cfg.d/
/opt/csw/etc/nrpe.cfg.d/
/opt/csw/etc/nrpe.cfg.d/check_file_age_cern_crl.cfg
/opt/csw/etc/nrpe.cfg.d/check_X509.cfg
/etc/dcache/dcache.conf
The same as
NodeTypeStorageElement#etc_dcache_dcache_conf
/etc/dcache/layouts/t3fs08.conf
-bash-3.2# cat /etc/dcache/layouts/t3fs08.conf
# Puppet Managed File
[${host.name}-Domain-pool]
[${host.name}-Domain-pool/pool]
name=t3fs08_cms
path=/data1/t3fs08_cms/pool
waitForFiles=${path}/data
#[${host.name}-Domain-pool/pool]
#name=t3fs08_cms_1
#path=/data1/t3fs08_cms_1/pool
[${host.name}-Domain-pool/pool]
name=t3fs08_ops
path=/data1/t3fs08_ops/pool
waitForFiles=${path}/data
[${host.name}-Domain-dcap]
[${host.name}-Domain-dcap/dcap]
[${host.name}-Domain-gridftp]
[${host.name}-Domain-gridftp/gridftp]
[${host.name}-Domain-gsidcap]
[${host.name}-Domain-gsidcap/gsidcap]
/data1/t3fs08_cms/pool/setup
-bash-3.2# ls -l /data1/t3fs08_cms/pool/setup
-rw-r----- 1 dcache cms 1201 Oct 8 15:00 /data1/t3fs08_cms/pool/setup
-bash-3.2# cat /data1/t3fs08_cms/pool/setup
#
# Created by t3fs13_cms_2(Pool) at Wed Mar 28 14:28:45 CEST 2012
#
csm set checksumtype ADLER32
csm set policy -frequently=off
csm set policy -onread=off -onwrite=on -onrestore=off -ontransfer=off -enforcecrc=on -getcrcfromhsm=off
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
#
# HsmStorageHandler2(org.dcache.pool.classic.HsmStorageHandler2)
#
rh set max active 0
st set max active 0
rm set max active 1
rh set timeout 14400
st set timeout 14400
rm set timeout 14400
jtm set timeout -queue=p2p -lastAccess=0 -total=0
jtm set timeout -queue=default -lastAccess=0 -total=0
jtm set timeout -queue=wan -lastAccess=0 -total=0
jtm set timeout -queue=io -lastAccess=0 -total=0
set heartbeat 30
set report remove on
set breakeven 0.7
set gap 4g
set duplicate request none
set p2p separated
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
mover set max active -queue=default 100
mover set max active -queue=wan 2
p2p set max active 10
#
# MigrationModule
#
#
# Pool to Pool (P2P) [$Revision: 1.49 $]
#
pp set port 0
pp set max active 10
pp set pnfs timeout 300
set max diskspace 32212254635000
/data1/t3fs08_ops/pool/setup
#
# Created by t3fs13_cms_2(Pool) at Wed Mar 28 14:28:45 CEST 2012
#
csm set checksumtype ADLER32
csm set policy -frequently=off
csm set policy -onread=off -onwrite=on -onrestore=off -ontransfer=off -enforcecrc=on -getcrcfromhsm=off
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
#
# HsmStorageHandler2(org.dcache.pool.classic.HsmStorageHandler2)
#
rh set max active 0
st set max active 0
rm set max active 1
rh set timeout 14400
st set timeout 14400
rm set timeout 14400
jtm set timeout -queue=p2p -lastAccess=0 -total=0
jtm set timeout -queue=default -lastAccess=0 -total=0
jtm set timeout -queue=wan -lastAccess=0 -total=0
jtm set timeout -queue=io -lastAccess=0 -total=0
set heartbeat 30
set report remove on
set breakeven 0.7
set gap 4g
set duplicate request none
set p2p separated
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
mover set max active -queue=default 100
mover set max active -queue=wan 2
p2p set max active 10
#
# MigrationModule
#
#
# Pool to Pool (P2P) [$Revision: 1.49 $]
#
pp set port 0
pp set max active 10
pp set pnfs timeout 300
set max diskspace 900000000
xrootd files opened monitoring
For each file opened by xrootd dCache sends a message to
http://xrootd.t2.ucsd.edu
[root@t3fs08 ~]# find /usr/local/share/dcache/plugins
/usr/local/share/dcache/plugins
/usr/local/share/dcache/plugins/monitor-5.0.0
/usr/local/share/dcache/plugins/monitor-5.0.0/logback-core-1.0.9.jar
/usr/local/share/dcache/plugins/monitor-5.0.0/logback-classic-1.0.9.jar
/usr/local/share/dcache/plugins/monitor-5.0.0/myplugin.properties
/usr/local/share/dcache/plugins/monitor-5.0.0/monitor-5.0.0.jar
/usr/local/share/dcache/plugins/monitor-5.0.0/README.md
/usr/local/share/dcache/plugins/monitor-5.0.0/LICENSE.txt
$ cat /usr/local/share/dcache/plugins/monitor-5.0.0/myplugin.properties
pool/xrootdPlugins=edu.uchicago.monitor
detailed=xrootd.t2.ucsd.edu:9930:60
summary=xrootd.t2.ucsd.edu:9931:60
Services
Listening
[root@t3fs08 ~]# lsof -Pnl +M -i4
lsof: WARNING: vxfsu_get_ioffsets() returned an error.
lsof: WARNING: Thus, no vx_inode information is available
lsof: WARNING: for display or selection of VxFS files.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rpcbind 433 1 3u IPv4 0xffffffffd4e4cc00 0t0 UDP *:111[rpcbind]
rpcbind 433 1 4u IPv4 0xffffffffd4e4ca00 0t0 UDP *:*
rpcbind 433 1 5u IPv4 0xffffffffe2012e00 0t0 UDP *:32775
rpcbind 433 1 6u IPv4 0xffffffffc19f4740 0t0 TCP *:111[rpcbind] (LISTEN)
rpcbind 433 1 7u IPv4 0xffffffffc19f4040 0t0 TCP *:* (IDLE)
inetd 447 0 17u IPv4 0xffffffffc19f6a40 0t0 TCP *:6481 (LISTEN)
syslogd 460 0 5u IPv4 0xffffffffe150e600 0t0 UDP *:32776
syslogd 460 0 6u IPv4 0xffffffffe2012c00 0t0 UDP *:32777
syslogd 460 0 7u IPv4 0xffffffffe2012a00 0t0 UDP *:32778
snmpd 489 0 15u IPv4 0xffffffffd4e4ce00 0t0 UDP *:161
snmpd 489 0 16u IPv4 0xffffffffe150ec00 0t0 UDP *:32779
snmpd 489 0 17u IPv4 0xffffffffe2012600 0t0 UDP *:*
xntpd 527 0 19u IPv4 0xffffffffe2012800 0t0 UDP *:123
xntpd 527 0 20u IPv4 0xffffffffe2012000 0t0 UDP 127.0.0.1:123
xntpd 527 0 21u IPv4 0xffffffffe2ae4e00 0t0 UDP 192.33.123.48:123
nrpe_1k 8866 101 5u IPv4 0xffffffffc19f4e40 0t0 TCP *:5666 (LISTEN)
sshd 19962 0 9u IPv4 0xffffffffe58c21c0 0t0 TCP 127.0.0.1:6010 (LISTEN)
gmond 21320 60001 4u IPv4 0xfffffe98de85b000 0t5633496 UDP 192.33.123.48:53838
java 21784 513 5u IPv4 0xfffffe98de3c8a80 0t0 TCP 192.33.123.48:52542->192.33.123.26:9867 (ESTABLISHED)
java 21784 513 6u IPv4 0xffffffffc1d1fc00 0t0 UDP *:53845
java 21784 513 8u IPv4 0xffffffffe58c3080 0xa23d111d TCP 192.33.123.48:52566->192.33.123.24:11111 (ESTABLISHED)
java 21784 513 9u IPv4 0xffffffffe58c4580 0t0 TCP *:33118 (LISTEN)
java 21784 513 119u IPv4 0xffffffffe2ae4000 0t0 UDP *:53850
java 21784 513 192u IPv4 0xffffffffc1d1f200 0t0 UDP *:53851
java 21784 513 193u IPv4 0xfffffe98fc2b2100 0t0 TCP *:33130 (LISTEN)
java 21784 513 194u IPv4 0xffffffffc4051ac0 0t0 TCP *:33120 (LISTEN)
java 21784 513 559u IPv4 0xfffffe98eeaa2200 0t0 UDP *:53852
java 21784 513 632u IPv4 0xfffffe98c4ed6e00 0t0 UDP *:53853
java 21830 513 5u IPv4 0xffffffffe58b1380 0t0 TCP 192.33.123.48:52543->192.33.123.26:9867 (ESTABLISHED)
java 21830 513 6u IPv4 0xffffffffe150ee00 0t0 UDP *:53846
java 21830 513 8u IPv4 0xfffffe99138f8540 0t0 TCP *:22125 (LISTEN)
java 21830 513 10u IPv4 0xfffffe98fc2b5900 0xe71da7e TCP 192.33.123.48:52568->192.33.123.24:11111 (ESTABLISHED)
java 21876 513 5u IPv4 0xfffffe98e3ad73c0 0t0 TCP 192.33.123.48:52544->192.33.123.26:9867 (ESTABLISHED)
java 21876 513 6u IPv4 0xfffffe98eeaa2600 0t0 UDP *:53847
java 21876 513 8u IPv4 0xffffffffe5913800 0x3c6a491a TCP 192.33.123.48:52565->192.33.123.24:11111 (ESTABLISHED)
java 21876 513 9u IPv4 0xfffffe990d337000 0t0 TCP *:2811 (LISTEN)
java 21876 513 11u IPv4 0xfffffe98fc2b75c0 0t13901 TCP 192.33.123.48:2811->192.33.123.112:47490 (ESTABLISHED)
java 21876 513 14u IPv4 0xfffffe98e676ac00 0t0 TCP 192.33.123.48:24997 (LISTEN)
java 21924 513 5u IPv4 0xfffffe98d7566080 0t0 TCP 192.33.123.48:52545->192.33.123.26:9867 (ESTABLISHED)
java 21924 513 6u IPv4 0xfffffe98de85bc00 0t0 UDP *:53848
java 21924 513 8u IPv4 0xfffffe98ce96c740 0xe8dba4a TCP 192.33.123.48:52567->192.33.123.24:11111 (ESTABLISHED)
java 21924 513 9u IPv4 0xffffffffea0f77c0 0t0 TCP *:22128 (LISTEN)
dCache services
[root@t3fs08 ~]# dcache services
DOMAIN SERVICE CELL LOG
t3fs08-Domain-pool pool t3fs08_cms /var/log/dcache/t3fs08-Domain-pool.log
t3fs08-Domain-pool pool t3fs08_ops /var/log/dcache/t3fs08-Domain-pool.log
t3fs08-Domain-dcap dcap DCap-t3fs08 /var/log/dcache/t3fs08-Domain-dcap.log
t3fs08-Domain-gridftp gridftp GFTP-t3fs08 /var/log/dcache/t3fs08-Domain-gridftp.log
t3fs08-Domain-gsidcap gsidcap DCap-gsi-t3fs08 /var/log/dcache/t3fs08-Domain-gsidcap.log
Nagios
To restart the local NRPE daemon after a configuration change, apply the following
kill -9
:
root@t3fs02 $ ps -ef | grep nrpe
nagios 6793 1 0 Nov 20 ? 0:14 /opt/csw/bin/nrpe -c /opt/csw/etc/nrpe.cfg -d
root@t3fs02 $ kill -9 6793
root@t3fs02 $ ps -ef | grep nrpe
nagios 15477 1 0 10:41:35 ? 0:00 /opt/csw/bin/nrpe -c /opt/csw/etc/nrpe.cfg -d
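The same restart can be scripted without copying the PID by hand; this is a sketch using the pgrep/pkill tools that Solaris ships natively, not the documented procedure:

```shell
# Kill the running NRPE daemon by matching its full command line:
pkill -9 -f '/opt/csw/bin/nrpe -c /opt/csw/etc/nrpe.cfg'

# Give it a moment, then confirm a fresh instance is running:
sleep 2
pgrep -fl nrpe
```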
Backups
Just ZFS snapshots and only for the OS.