Node Type: dCacheSolaris

Firewall requirements

Local port        Open to            Reason
2811/tcp          *                  gridftp control connection
22125/tcp         192.33.123.0/24    unauthenticated dcap (read only)
22128/tcp         192.33.123.0/24    gsidcap (GSI authenticated dcap)
20000-25000/tcp   *                  Globus port range for gridftp/xrootd data streams


Regular Maintenance work

Emergency Measures

Broken 16GB Compact Flash Card

  • Fabio left an already installed and fully tested t3fs10 Compact Flash Card inside the X4540 mounted above t3fs11; simply take it and use it to recover the failed X4540;
  • you'll have to delete the related Puppet keys from psi-puppet3.psi.ch and run Puppet on the restored X4540 to get the correct X509 cert and key installed, or simply copy them from t3admin01:/root/clusteradmin/etc/hostkeys/switch-QuoVadis
  • dCache won't start automatically; start it with dcache start (see the sketch below)
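
A minimal sketch of the recovery steps (hostnames are examples; the puppetca command is shown in full further below):

# on psi-puppet3.psi.ch: remove the old Puppet certificate of the failed server
sudo /usr/sbin/puppetca --clean t3fs08.psi.ch

# on the recovered X4540: run Puppet to get the X509 host cert/key, then start dCache
puppetd -t -v
dcache start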

Broken 16GB Compact Flash Card - We have to reinstall Solaris 10

(Image: example of a 16GB Compact Flash Card)
  • If you're in an emergency, first stop Nagios to avoid a flood of spurious e-mails: ssh root@t3nagios /etc/init.d/nagios stop.
  • If the 16GB Compact Flash Card is broken:
    • the 3 spare X4540 servers mounted in our last rack each hold a 16GB Compact Flash Card; two more 16GB Compact Flash Cards sit inside the spare X4540s stored in the AIT warehouse close to Derek's office.
    • once you have inserted the new 16GB Compact Flash Card, use the Solaris VM t3jumpstart to reinstall Solaris 10 1/13 from scratch and configure it automatically as described in the following chapters. Remember that you can use the VM t3fs15 to quickly test the Solaris 10 1/13 installation performed by t3jumpstart; that way you both validate the installation procedure and avoid writing too much onto the target 16GB Compact Flash Card, which is supposed to be written seldom. Furthermore, inserting a new 16GB Compact Flash Card into the X4540 may reshuffle the boot disks in the BIOS, which will prevent the Solaris 10 1/13 installation from completing (Fabio hit this case). If that happens, reboot, enter the server BIOS and reorder the boot disks, with the Compact Flash Card as the 1st boot device.
  • If doable, zpool export data1 from the failed Solaris installation before you zpool import data1 into the new Solaris installation; in any case you can always force the import (a command sketch follows this list).
  • Always inform the users by sending an e-mail to cms-tier3-users@lists.psi.ch; to produce the list of affected files you can use the v_pnfs views made by Fabio.
  • If you accidentally altered/erased a Solaris file located in /, you may be able to recover it either by a puppetd -t -v run or by searching for it among the ZFS snapshots (zfs list -t snapshot).
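
A minimal sketch of moving the data1 pool to the new installation (use -f only if the pool could not be cleanly exported):

# on the failed installation, if it still boots
zpool export data1

# on the freshly installed Solaris
zpool import data1          # add -f to force the import if the export was not possible
zpool status data1          # verify the pool health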

Broken HW ( e.g. a 1TB disk )

Running fmadm faulty (as root, e.g. on t3fs08) will tell you which component failed; Nagios has a check based on it.
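
A sketch of the typical diagnosis and replacement flow, assuming the failed disk is c1t0d0 (device names are examples; the data1 pool carries hot spares, see the ZFS setup below):

[root@t3fs08 ~]# fmadm faulty                       # list the faulted FMA resources
[root@t3fs08 ~]# zpool status data1                 # identify the degraded vdev / failed disk
[root@t3fs08 ~]# zpool replace data1 c1t0d0         # after the broken 1TB disk has been physically swapped
# or explicitly pull in one of the configured hot spares:
[root@t3fs08 ~]# zpool replace data1 c1t0d0 c6t7d0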

Installation - Example For t3fs08

Solaris 10 1/13 installation

Installation is described in Puppet by tier3-baseclasses.pp plus Sol10_fs26.pp; Fabio uses the aliases below; the Puppet recipes are in puppetdirnodes and the Solaris files in puppetdirsolaris:
alias dcache='ssh -2 -l admin -p 22224 t3dcachedb.psi.ch'
alias kscustom57='cd /afs/psi.ch/software/linux/dist/scientific/57/custom'
alias kscustom64='cd /afs/psi.ch/software/linux/dist/scientific/64/custom'
alias ksdir='cd /afs/psi.ch/software/linux/kickstart/configs'
alias puppetdir='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/'
alias puppetdirnodes='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/manifests/nodes'
alias puppetdirredhat='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/RedHat'
alias puppetdirsolaris='cd /afs/psi.ch/service/linux/puppet/var/puppet/environments/DerekDevelopment/modules/Tier3/files/Solaris/5.10'
alias yumdir5='cd /afs/psi.ch/software/linux/dist/scientific/57/scripts'
alias yumdir6='cd /afs/psi.ch/software/linux/dist/scientific/6/scripts'

Remember to erase the existing Puppet keys associated with the X4540 that you're reinstalling from scratch! e.g.:

$ ssh  -XY martinelli_f@psi-puppet3.psi.ch
[martinelli_f@psi-puppet3 ~]$ sudo /usr/sbin/puppetca --clean  t3fs08.psi.ch
t3fs08.psi.ch
notice: Removing file Puppet::SSL::Certificate t3fs08.psi.ch at '/var/puppet/ssl/ca/signed/t3fs08.psi.ch.pem'

If everything works as designed, you just have to follow the instructions on NodeTypeJumpStart to reinstall an X4540.

Once Solaris 10 1/13 is installed onto the new 16GB Compact Flash Card, tune ZFS with:


zfs set atime=off rpool           
zfs set sync=always rpool
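
A quick check that the settings took effect (verification only, not part of the procedure):

zfs get atime,sync rpool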

SSHd keys

Puppet will upload the server's previous SSH host keys; this avoids complaints from the SSH clients.

SSHd and TcpWrapper

We prevent SSH logins from unauthorized hosts.
cat /etc/hosts.allow
# Puppet Managed File
sshd: t3admin01.psi.ch fabiom-mac.psi.ch wmgt01.psi.ch wmgt02.psi.ch dflt1w.psi.ch localhost t3ossec.psi.ch t3nagios.psi.ch t3fs01.psi.ch t3fs02.psi.ch t3fs03.psi.ch t3fs04.psi.ch t3fs07.psi.ch t3fs08.psi.ch t3fs09.psi.ch t3fs10.psi.ch t3fs11.psi.ch 

ZFS setup for data1 partition

Warning ! Warning !
CREATE data1 ONLY IF data1 DOESN'T EXIST !
IN REAL LIFE data1 WILL ALREADY EXIST SO RUN zpool import data1 INSTEAD AND NEITHER CREATE data1 NOR ALTER ITS ZFS PROPERTIES !!

zpool create -f data1 raidz2  c1t0d0 c1t5d0 c2t2d0 c2t7d0 c3t4d0 c4t1d0 c4t6d0 c5t3d0 c6t0d0
zpool add -f data1 raidz2 c1t1d0 c1t6d0 c2t3d0 c3t0d0 c3t5d0 c4t2d0 c4t7d0 c5t4d0 c6t1d0
zpool add -f data1 raidz2 c1t2d0 c1t7d0 c2t4d0 c3t1d0 c3t6d0 c4t3d0 c5t0d0 c5t5d0 c6t2d0
zpool add -f data1 raidz2 c1t3d0 c2t0d0 c2t5d0 c3t2d0 c3t7d0 c4t4d0 c5t1d0 c5t6d0 c6t3d0
zpool add -f data1 raidz2 c1t4d0 c2t1d0 c2t6d0 c3t3d0 c4t0d0 c4t5d0 c5t2d0 c5t7d0 c6t4d0
zpool add -f data1 spare c6t7d0 c6t6d0 c6t5d0
# ZFS tuning 
zfs create data1/t3fs08_cms
zfs create data1/t3fs08_ops
zfs set quota=30TB data1/t3fs08_cms
zfs set quota=1GB data1/t3fs08_ops
zfs set recordsize=1024K data1
zfs set devices=off data1
zfs set atime=off data1
zfs set exec=on data1 # to avoid additional stress on the weak Compact Flash Card, relocate /opt/csw into /data1 and keep exec=on; otherwise exec=off is safer, because /data1 shouldn't contain executables
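
A quick sanity check of the resulting layout (verification only; the values should match what the commands above configure):

zpool status data1                        # 5 x raidz2 vdevs + 3 hot spares
zfs list -r data1                         # data1, data1/t3fs08_cms, data1/t3fs08_ops
zfs get -r quota,recordsize,atime data1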

GRID PKI infrastructure

A Puppet run will upload:
  • /etc/grid-security/hostcert.pem
  • /etc/grid-security/hostkey.pem
  • and the tool /opt/fetch-crl/fetch-crl needed to update the lcg-CA CRLs daily.

Just the first time, to upload the lcg-CA files into /etc/grid-security/certificates, connect to t3admin01 and run /root/clusteradmin/sync_cacerts_tofs.singlet3fs.sh t3fs08

The CA CRL files transferred from t3admin01 will be fresh because on t3admin01 a cron job regularly refreshes them and a Nagios check monitors this 'freshness'; after this first manual upload, a root crontab created by Puppet will invoke /opt/fetch-crl/fetch-crl daily on t3fs08; that crontab is shown below on this page.
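
To refresh the CRLs by hand, you can run the same command the Puppet crontab uses (see the Cron Jobs section below):

/opt/fetch-crl/fetch-crl -c /opt/fetch-crl/fetch-crl.cnf -v
ls -lt /etc/grid-security/certificates | head     # the *.r0 CRL files should carry recent timestamps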

NTP time server

Make sure that the time service is running correctly; t3nagios constantly checks that; the automatic Solaris 10 1/13 installation made by t3jumpstart takes care of ntp:
-bash-3.2# svcs ntp
STATE          STIME    FMRI
online         Oct_02   svc:/network/ntp:default

The configuration file for xntpd is found at /etc/inet/ntp.conf:

-bash-3.2# cat /etc/inet/ntp.conf
# NOTE: This file is managed through puppet
# If you edit this file locally, it will be replaced in
# the next puppet run
#
# File is located at
# $Id: NodeTypeFileServer.txt,v 1.49 2015/06/01 09:56:11 fabiom Exp $
# $URL: svn+ssh://savannah01.psi.ch/repos/tier3/tier3/puppet/TRUNK/modules/Tier3/files/Solaris/5.10/etc/inet/ntp.conf $
# as produced by a fresh Solaris 10 jumpstart install
#server 192.33.126.10 prefer
driftfile /var/ntp/ntp.drift
statsdir /var/ntp/ntpstats
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
server   dmztime1.psi.ch
restrict dmztime1.psi.ch noquery nomodify
server   dmztime2.psi.ch
restrict dmztime2.psi.ch noquery nomodify
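
A quick way to verify that the daemon is actually synchronised (assuming ntpq is available, as on the standard Solaris 10 NTP packages):

-bash-3.2# svcs -l ntp           # state should be online
-bash-3.2# /usr/sbin/ntpq -p     # one of dmztime1/dmztime2 should be marked with '*' as the selected peer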

Java JDK

JDK7 is a requirement for dCache 2.6; the automatic Solaris 10 1/13 installation performed by t3jumpstart will take care of JDK7:
-bash-3.2# which java
/usr/bin/java
-bash-3.2# ls -l /usr/bin/java
lrwxrwxrwx   1 root     other         16 Oct  2 15:57 /usr/bin/java -> ../java/bin/java
-bash-3.2# ls -l /usr/java    
lrwxrwxrwx   1 root     root          10 Oct  2 16:37 /usr/java -> jdk/latest
-bash-3.2# ls -l /usr/java/jdk
/usr/java/jdk: No such file or directory
-bash-3.2# ls -l /usr/jdk/    
total 11
drwxr-xr-x   5 root     bin            5 Oct  2 16:33 instances
lrwxrwxrwx   1 root     other          7 Oct  2 15:58 j2sdk1.4.2_34 -> ../j2se
lrwxrwxrwx   1 root     other         18 Oct  2 15:58 jdk1.5.0_32 -> instances/jdk1.5.0
lrwxrwxrwx   1 root     other         18 Oct  2 15:57 jdk1.6.0_37 -> instances/jdk1.6.0
lrwxrwxrwx   1 root     other         18 Oct  2 16:35 jdk1.7.0_40 -> instances/jdk1.7.0
lrwxrwxrwx   1 root     other         11 Oct  2 16:35 latest -> jdk1.7.0_40
drwxr-xr-x   4 root     bin            4 Oct  2 15:58 packages

Configure sensible system limits ??

Not sure whether this is still needed nowadays; it is set on t3fs13 and t3fs14.

On default Solaris 10 installations the soft limit for the maximum number of open file descriptors per process is only 256! This must be raised for dCache. One convenient way to do it is to put the following into /opt/d-cache/jobs/dcache.local.sh, a file which is sourced in the dcache start process. This is set by Puppet for our servers.

ulimit -Sn 32768
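
To verify the limit that a running dCache JVM actually got (a sketch; assumes the standard Solaris pgrep/plimit tools and that dCache runs as the dcache user):

plimit $(pgrep -u dcache java) | grep nofiles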

pkgutil

The Solaris 10 1/13 installation made by t3jumpstart automatically takes care of pkgutil plus a set of useful packages. pkgutil (http://www.opencsw.org/package/pkgutil/) is a must on Solaris to have your daily Linux tools available there too.

If you need to install the pkgs by hand then this is the list:

/opt/csw/bin/pkgutil -i -y CSWnagiosp 
/opt/csw/bin/pkgutil -i -y CSWnrpe
/opt/csw/bin/pkgutil -i -y CSWruby
/opt/csw/bin/pkgutil -i -y CSWsmartmontools
/opt/csw/bin/pkgutil -i -y CSWwatch
/opt/csw/bin/pkgutil -i -y CSWpstree
/opt/csw/bin/pkgutil -i -y CSWtop 
/opt/csw/bin/pkgutil -i -y CSWiftop
/opt/csw/bin/pkgutil -i -y CSWnfswatch
/opt/csw/bin/pkgutil -i -y CSWnano
/opt/csw/bin/pkgutil -i -y CSWalternatives
/opt/csw/bin/pkgutil -i -y CSWaudiofile
/opt/csw/bin/pkgutil -i -y CSWaugeas
/opt/csw/bin/pkgutil -i -y CSWbash
/opt/csw/bin/pkgutil -i -y CSWbdb47
/opt/csw/bin/pkgutil -i -y CSWbdb48
/opt/csw/bin/pkgutil -i -y CSWbonobo2
/opt/csw/bin/pkgutil -i -y CSWbzip2
/opt/csw/bin/pkgutil -i -y CSWcacertificates
/opt/csw/bin/pkgutil -i -y CSWcas-cpsampleconf
/opt/csw/bin/pkgutil -i -y CSWcas-cptemplates
/opt/csw/bin/pkgutil -i -y CSWcas-crontab
/opt/csw/bin/pkgutil -i -y CSWcas-etcservices
/opt/csw/bin/pkgutil -i -y CSWcas-etcshells
/opt/csw/bin/pkgutil -i -y CSWcas-inetd
/opt/csw/bin/pkgutil -i -y CSWcas-initsmf
/opt/csw/bin/pkgutil -i -y CSWcas-migrateconf
/opt/csw/bin/pkgutil -i -y CSWcas-postmsg
/opt/csw/bin/pkgutil -i -y CSWcas-preserveconf
/opt/csw/bin/pkgutil -i -y CSWcas-pycompile
/opt/csw/bin/pkgutil -i -y CSWcas-texinfo
/opt/csw/bin/pkgutil -i -y CSWcas-usergroup
/opt/csw/bin/pkgutil -i -y CSWcommon
/opt/csw/bin/pkgutil -i -y CSWcoreutils
/opt/csw/bin/pkgutil -i -y CSWcswclassutils
/opt/csw/bin/pkgutil -i -y CSWdbusglib
/opt/csw/bin/pkgutil -i -y CSWelinks
/opt/csw/bin/pkgutil -i -y CSWemacs
/opt/csw/bin/pkgutil -i -y CSWemacsbincommon
/opt/csw/bin/pkgutil -i -y CSWemacschooser
/opt/csw/bin/pkgutil -i -y CSWemacscommon
/opt/csw/bin/pkgutil -i -y CSWesound
/opt/csw/bin/pkgutil -i -y CSWexpat
/opt/csw/bin/pkgutil -i -y CSWfconfig
/opt/csw/bin/pkgutil -i -y CSWfindutils
/opt/csw/bin/pkgutil -i -y CSWfontconfig
/opt/csw/bin/pkgutil -i -y CSWfreeglut
/opt/csw/bin/pkgutil -i -y CSWftype2
/opt/csw/bin/pkgutil -i -y CSWgawk
/opt/csw/bin/pkgutil -i -y CSWgcc3corert
/opt/csw/bin/pkgutil -i -y CSWgconf2
/opt/csw/bin/pkgutil -i -y CSWgcpio
/opt/csw/bin/pkgutil -i -y CSWgcrypt
/opt/csw/bin/pkgutil -i -y CSWgdbm
/opt/csw/bin/pkgutil -i -y CSWgdkpixbuf
/opt/csw/bin/pkgutil -i -y CSWggettext
/opt/csw/bin/pkgutil -i -y CSWggettext-data
/opt/csw/bin/pkgutil -i -y CSWggettextrt
/opt/csw/bin/pkgutil -i -y CSWggrep
/opt/csw/bin/pkgutil -i -y CSWgio-fam-backend
/opt/csw/bin/pkgutil -i -y CSWgit
/opt/csw/bin/pkgutil -i -y CSWgit-emacs
/opt/csw/bin/pkgutil -i -y CSWgit-gui
/opt/csw/bin/pkgutil -i -y CSWglib2
/opt/csw/bin/pkgutil -i -y CSWgnomekeyring
/opt/csw/bin/pkgutil -i -y CSWgnomevfs2
/opt/csw/bin/pkgutil -i -y CSWgnupg
/opt/csw/bin/pkgutil -i -y CSWgpg-error
/opt/csw/bin/pkgutil -i -y CSWgpgerr
/opt/csw/bin/pkgutil -i -y CSWgsed
/opt/csw/bin/pkgutil -i -y CSWgtar
/opt/csw/bin/pkgutil -i -y CSWgtk2
/opt/csw/bin/pkgutil -i -y CSWgtk2-printbackends-file
/opt/csw/bin/pkgutil -i -y CSWgtk2-printbackends-papi
/opt/csw/bin/pkgutil -i -y CSWgvim
/opt/csw/bin/pkgutil -i -y CSWvim
/opt/csw/bin/pkgutil -i -y CSWgzip
/opt/csw/bin/pkgutil -i -y CSWhicoloricontheme
/opt/csw/bin/pkgutil -i -y CSWiconv
/opt/csw/bin/pkgutil -i -y CSWiftop
/opt/csw/bin/pkgutil -i -y CSWiozone
/opt/csw/bin/pkgutil -i -y CSWipython
/opt/csw/bin/pkgutil -i -y CSWisaexec
/opt/csw/bin/pkgutil -i -y CSWjbigkit
/opt/csw/bin/pkgutil -i -y CSWjpeg
/opt/csw/bin/pkgutil -i -y CSWkrb5lib
/opt/csw/bin/pkgutil -i -y CSWlsof
# Perl
/opt/csw/bin/pkgutil -i -y CSWpm-compress-raw-bzip2
/opt/csw/bin/pkgutil -i -y CSWpm-compress-raw-zlib
/opt/csw/bin/pkgutil -i -y CSWpm-html-parser
/opt/csw/bin/pkgutil -i -y CSWpm-html-tagset
/opt/csw/bin/pkgutil -i -y CSWpm-io-compress
/opt/csw/bin/pkgutil -i -y CSWpm-libwww-perl
/opt/csw/bin/pkgutil -i -y CSWpm-mime-base64
/opt/csw/bin/pkgutil -i -y CSWpm-uri
/opt/csw/bin/pkgutil -i -y CSWpmbutils
/opt/csw/bin/pkgutil -i -y CSWpmdatemanip
/opt/csw/bin/pkgutil -i -y CSWpmfontafm
/opt/csw/bin/pkgutil -i -y CSWpmhtmlfmt
/opt/csw/bin/pkgutil -i -y CSWpmhtmlformat
/opt/csw/bin/pkgutil -i -y CSWpmhtmlparser
/opt/csw/bin/pkgutil -i -y CSWpmhtmltagset
/opt/csw/bin/pkgutil -i -y CSWpmhtmltree
/opt/csw/bin/pkgutil -i -y CSWpmiocompress
/opt/csw/bin/pkgutil -i -y CSWpmmimebase64
/opt/csw/bin/pkgutil -i -y CSWpmuri
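
pkgutil accepts several package names in a single invocation, so the list above can also be installed in a few larger calls, e.g.:

/opt/csw/bin/pkgutil -i -y CSWnagiosp CSWnrpe CSWruby CSWsmartmontools CSWwatch CSWpstree CSWtop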

Cron Jobs

We regularly update the /etc/grid-security/certificates folder by using the tool /opt/fetch-crl/fetch-crl:
crontab -l 
# HEADER: This file was autogenerated at Wed Oct 02 16:38:05 +0200 2013 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
#ident  "@(#)root       1.21    04/03/23 SMI"
#
# The root crontab should be used to perform accounting data collection.
#
#
10 3 * * * /usr/sbin/logadm
15 3 * * 0 /usr/lib/fs/nfs/nfsfind
30 3 * * * [ -x /usr/lib/gss/gsscred_clean ] && /usr/lib/gss/gsscred_clean
#
# The rtc command is run to adjust the real time clock if and when 
# daylight savings time changes.
#
1 2 * * * [ -x /usr/sbin/rtc ] && /usr/sbin/rtc -c > /dev/null 2>&1
43 3 * * * [ -x /opt/csw/bin/gupdatedb ] && /opt/csw/bin/gupdatedb --prunepaths="/dev /devices /proc /tmp /var/tmp" 1>/dev/null 2>&1 # Added by CSWfindutils
# Puppet Name: fetch-crl
10 22 * * * /opt/fetch-crl/fetch-crl -c /opt/fetch-crl/fetch-crl.cnf -v  2>&1 | /usr/bin/tee /var/cron/fetch-crl.log 2>&1

dCache 2.6 - CSCS page

LCGTier2/ServiceDcache

dCache 2.6

A MUST NOTE: dCache runs as the user dcache, no longer as the user root, so you might be hit by a 'permission denied'.
dCache package:
-bash-3.2# pkginfo -l dCache
   PKGINST:  dCache
      NAME:  dCache Server
  CATEGORY:  application
      ARCH:  all
   VERSION:  2.6.16-1
   BASEDIR:  /
    VENDOR:  ${vendor}
    PSTAMP:  ${vendor}
  INSTDATE:  Nov 21 2013 10:08
     EMAIL:  support@dcache.org
    STATUS:  completely installed
     FILES:      560 installed pathnames
                  10 shared pathnames
                  72 directories
                   5 executables
              170729 blocks used (approx)

Pools

A Puppet run will upload the setup files, create the dirs and groups, and assign the right dir modes.

It's easy to make a pool (a command sketch follows this list); you just have to:

  • define the pool in /etc/dcache/layouts/t3fs08.conf
  • assign each directory and file to the dcache user
  • install a setup file like the one shown below
  • run /usr/bin/dcache start and all the needed dirs and files will be created inside that pool dir
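
A minimal command sketch for a hypothetical extra pool t3fs08_cms_1 (names and paths are examples; the commented-out stanza in the layout file below illustrates the same idea):

zfs create data1/t3fs08_cms_1               # dataset for the new pool
chown -R dcache /data1/t3fs08_cms_1         # the pool area must belong to the dcache user
vi /etc/dcache/layouts/t3fs08.conf          # add a [${host.name}-Domain-pool/pool] stanza for it
/usr/bin/dcache start                       # dCache creates the needed dirs and files on first start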

Important files in a nutshell

find /etc/dcache/

/etc/dcache/dcache.conf  <-- main dCache conf; it should be the same on each node; if not, it is likely because that node needs more RAM for dCache.

/etc/dcache/logback.xml   <-- tunes the logging verbosity; on the SUN X4540 it should always be set to level 'error' to avoid writing too much to the Unigen 16GB Flash Card

/etc/dcache/layouts
/etc/dcache/layouts/t3fs08.conf   <-- specific node conf.

# dCache optional plugins
 /usr/local/share/dcache/plugins

# dCache Logs
/var/log/dcache/
/var/log/dcache/t3fs08-Domain-gsidcap.log
/var/log/dcache/t3fs08-Domain-pool.log
/var/log/dcache/t3fs08-Domain-dcap.log
/var/log/dcache/t3fs08-Domain-gridftp.log

# dCache GSI layer
root@t3fs01 $ ls -l /etc/grid-security/
total 267
drwxr-xr-x   2 root     root     1202 Nov 26 13:58 certificates
-rw-r--r--   1 dcache   root     1880 Apr  5  2013 hostcert.pem
-rw-r--r--   1 root     root     1872 May 26  2011 hostcert.pem-20110526-1358
-rw-r--r--   1 root     root     1896 May 26  2011 hostcert.pem-20120504-1040
-rw-r--r--   1 dcache   root     1880 May  3  2012 hostcert.pem-20130405
-r--------   1 dcache   root     1679 Jul 21  2009 hostkey.pem
drwxr-x---   2 root     nagios      3 Nov 20 13:53 nagios
<----- hostcert.pem -> /etc/grid-security/hostcert.pem

# Nagios checks
root@t3fs01 $ find /opt/csw/etc/nrpe.cfg.d/
/opt/csw/etc/nrpe.cfg.d/
/opt/csw/etc/nrpe.cfg.d/check_file_age_cern_crl.cfg
/opt/csw/etc/nrpe.cfg.d/check_X509.cfg

/etc/dcache/dcache.conf

The same as NodeTypeStorageElement#etc_dcache_dcache_conf

/etc/dcache/layouts/t3fs08.conf

-bash-3.2# cat /etc/dcache/layouts/t3fs08.conf
# Puppet Managed File 

[${host.name}-Domain-pool]

[${host.name}-Domain-pool/pool]
name=t3fs08_cms
path=/data1/t3fs08_cms/pool
waitForFiles=${path}/data

#[${host.name}-Domain-pool/pool]
#name=t3fs08_cms_1
#path=/data1/t3fs08_cms_1/pool

[${host.name}-Domain-pool/pool]
name=t3fs08_ops
path=/data1/t3fs08_ops/pool
waitForFiles=${path}/data

[${host.name}-Domain-dcap]
[${host.name}-Domain-dcap/dcap]

[${host.name}-Domain-gridftp]
[${host.name}-Domain-gridftp/gridftp]

[${host.name}-Domain-gsidcap]
[${host.name}-Domain-gsidcap/gsidcap]

/data1/t3fs08_cms/pool/setup

-bash-3.2# ls -l /data1/t3fs08_cms/pool/setup 
-rw-r-----   1 dcache   cms         1201 Oct  8 15:00 /data1/t3fs08_cms/pool/setup
-bash-3.2# cat /data1/t3fs08_cms/pool/setup
#
# Created by t3fs13_cms_2(Pool) at Wed Mar 28 14:28:45 CEST 2012
#
csm set checksumtype ADLER32
csm set policy -frequently=off
csm set policy -onread=off -onwrite=on -onrestore=off -ontransfer=off -enforcecrc=on -getcrcfromhsm=off
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
#
# HsmStorageHandler2(org.dcache.pool.classic.HsmStorageHandler2)
#
rh set max active 0
st set max active 0
rm set max active 1
rh set timeout 14400
st set timeout 14400
rm set timeout 14400
jtm set timeout -queue=p2p -lastAccess=0 -total=0
jtm set timeout -queue=default -lastAccess=0 -total=0
jtm set timeout -queue=wan -lastAccess=0 -total=0
jtm set timeout -queue=io -lastAccess=0 -total=0
set heartbeat 30
set report remove on
set breakeven 0.7
set gap 4g
set duplicate request none
set p2p separated
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
mover set max active -queue=default 100
mover set max active -queue=wan 2
p2p set max active 10
#
# MigrationModule
#
#
#  Pool to Pool (P2P) [$Revision: 1.49 $]
#
pp set port 0
pp set max active 10
pp set pnfs timeout 300
set max diskspace 32212254635000

/data1/t3fs08_ops/pool/setup

#
# Created by t3fs13_cms_2(Pool) at Wed Mar 28 14:28:45 CEST 2012
#
csm set checksumtype ADLER32
csm set policy -frequently=off
csm set policy -onread=off -onwrite=on -onrestore=off -ontransfer=off -enforcecrc=on -getcrcfromhsm=off
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
#
# HsmStorageHandler2(org.dcache.pool.classic.HsmStorageHandler2)
#
rh set max active 0
st set max active 0
rm set max active 1
rh set timeout 14400
st set timeout 14400
rm set timeout 14400
jtm set timeout -queue=p2p -lastAccess=0 -total=0
jtm set timeout -queue=default -lastAccess=0 -total=0
jtm set timeout -queue=wan -lastAccess=0 -total=0
jtm set timeout -queue=io -lastAccess=0 -total=0
set heartbeat 30
set report remove on
set breakeven 0.7
set gap 4g
set duplicate request none
set p2p separated
#
# Flushing Thread setup
#
flush set max active 1000
flush set interval 60
flush set retry delay 60
mover set max active -queue=default 100
mover set max active -queue=wan 2
p2p set max active 10
#
# MigrationModule
#
#
#  Pool to Pool (P2P) [$Revision: 1.49 $]
#
pp set port 0
pp set max active 10
pp set pnfs timeout 300
set max diskspace 900000000

xrootd files opened monitoring

For each file opened via xrootd, dCache sends a monitoring message to http://xrootd.t2.ucsd.edu:
[root@t3fs08 ~]# find /usr/local/share/dcache/plugins
/usr/local/share/dcache/plugins
/usr/local/share/dcache/plugins/monitor-5.0.0
/usr/local/share/dcache/plugins/monitor-5.0.0/logback-core-1.0.9.jar
/usr/local/share/dcache/plugins/monitor-5.0.0/logback-classic-1.0.9.jar
/usr/local/share/dcache/plugins/monitor-5.0.0/myplugin.properties
/usr/local/share/dcache/plugins/monitor-5.0.0/monitor-5.0.0.jar
/usr/local/share/dcache/plugins/monitor-5.0.0/README.md
/usr/local/share/dcache/plugins/monitor-5.0.0/LICENSE.txt
$ cat /usr/local/share/dcache/plugins/monitor-5.0.0/myplugin.properties
pool/xrootdPlugins=edu.uchicago.monitor
detailed=xrootd.t2.ucsd.edu:9930:60
summary=xrootd.t2.ucsd.edu:9931:60

Services

Listening

[root@t3fs08 ~]#  lsof -Pnl +M -i4
lsof: WARNING: vxfsu_get_ioffsets() returned an error.
lsof: WARNING: Thus, no vx_inode information is available
lsof: WARNING: for display or selection of VxFS files.
COMMAND   PID     USER   FD   TYPE             DEVICE   SIZE/OFF NODE NAME
rpcbind   433        1    3u  IPv4 0xffffffffd4e4cc00        0t0  UDP *:111[rpcbind]
rpcbind   433        1    4u  IPv4 0xffffffffd4e4ca00        0t0  UDP *:*
rpcbind   433        1    5u  IPv4 0xffffffffe2012e00        0t0  UDP *:32775
rpcbind   433        1    6u  IPv4 0xffffffffc19f4740        0t0  TCP *:111[rpcbind] (LISTEN)
rpcbind   433        1    7u  IPv4 0xffffffffc19f4040        0t0  TCP *:* (IDLE)
inetd     447        0   17u  IPv4 0xffffffffc19f6a40        0t0  TCP *:6481 (LISTEN)
syslogd   460        0    5u  IPv4 0xffffffffe150e600        0t0  UDP *:32776
syslogd   460        0    6u  IPv4 0xffffffffe2012c00        0t0  UDP *:32777
syslogd   460        0    7u  IPv4 0xffffffffe2012a00        0t0  UDP *:32778
snmpd     489        0   15u  IPv4 0xffffffffd4e4ce00        0t0  UDP *:161
snmpd     489        0   16u  IPv4 0xffffffffe150ec00        0t0  UDP *:32779
snmpd     489        0   17u  IPv4 0xffffffffe2012600        0t0  UDP *:*
xntpd     527        0   19u  IPv4 0xffffffffe2012800        0t0  UDP *:123
xntpd     527        0   20u  IPv4 0xffffffffe2012000        0t0  UDP 127.0.0.1:123
xntpd     527        0   21u  IPv4 0xffffffffe2ae4e00        0t0  UDP 192.33.123.48:123
nrpe_1k  8866      101    5u  IPv4 0xffffffffc19f4e40        0t0  TCP *:5666 (LISTEN)
sshd    19962        0    9u  IPv4 0xffffffffe58c21c0        0t0  TCP 127.0.0.1:6010 (LISTEN)
gmond   21320    60001    4u  IPv4 0xfffffe98de85b000  0t5633496  UDP 192.33.123.48:53838
java    21784      513    5u  IPv4 0xfffffe98de3c8a80        0t0  TCP 192.33.123.48:52542->192.33.123.26:9867 (ESTABLISHED)
java    21784      513    6u  IPv4 0xffffffffc1d1fc00        0t0  UDP *:53845
java    21784      513    8u  IPv4 0xffffffffe58c3080 0xa23d111d  TCP 192.33.123.48:52566->192.33.123.24:11111 (ESTABLISHED)
java    21784      513    9u  IPv4 0xffffffffe58c4580        0t0  TCP *:33118 (LISTEN)
java    21784      513  119u  IPv4 0xffffffffe2ae4000        0t0  UDP *:53850
java    21784      513  192u  IPv4 0xffffffffc1d1f200        0t0  UDP *:53851
java    21784      513  193u  IPv4 0xfffffe98fc2b2100        0t0  TCP *:33130 (LISTEN)
java    21784      513  194u  IPv4 0xffffffffc4051ac0        0t0  TCP *:33120 (LISTEN)
java    21784      513  559u  IPv4 0xfffffe98eeaa2200        0t0  UDP *:53852
java    21784      513  632u  IPv4 0xfffffe98c4ed6e00        0t0  UDP *:53853
java    21830      513    5u  IPv4 0xffffffffe58b1380        0t0  TCP 192.33.123.48:52543->192.33.123.26:9867 (ESTABLISHED)
java    21830      513    6u  IPv4 0xffffffffe150ee00        0t0  UDP *:53846
java    21830      513    8u  IPv4 0xfffffe99138f8540        0t0  TCP *:22125 (LISTEN)
java    21830      513   10u  IPv4 0xfffffe98fc2b5900  0xe71da7e  TCP 192.33.123.48:52568->192.33.123.24:11111 (ESTABLISHED)
java    21876      513    5u  IPv4 0xfffffe98e3ad73c0        0t0  TCP 192.33.123.48:52544->192.33.123.26:9867 (ESTABLISHED)
java    21876      513    6u  IPv4 0xfffffe98eeaa2600        0t0  UDP *:53847
java    21876      513    8u  IPv4 0xffffffffe5913800 0x3c6a491a  TCP 192.33.123.48:52565->192.33.123.24:11111 (ESTABLISHED)
java    21876      513    9u  IPv4 0xfffffe990d337000        0t0  TCP *:2811 (LISTEN)
java    21876      513   11u  IPv4 0xfffffe98fc2b75c0    0t13901  TCP 192.33.123.48:2811->192.33.123.112:47490 (ESTABLISHED)
java    21876      513   14u  IPv4 0xfffffe98e676ac00        0t0  TCP 192.33.123.48:24997 (LISTEN)
java    21924      513    5u  IPv4 0xfffffe98d7566080        0t0  TCP 192.33.123.48:52545->192.33.123.26:9867 (ESTABLISHED)
java    21924      513    6u  IPv4 0xfffffe98de85bc00        0t0  UDP *:53848
java    21924      513    8u  IPv4 0xfffffe98ce96c740  0xe8dba4a  TCP 192.33.123.48:52567->192.33.123.24:11111 (ESTABLISHED)
java    21924      513    9u  IPv4 0xffffffffea0f77c0        0t0  TCP *:22128 (LISTEN)

dCache services

[root@t3fs08 ~]# dcache services 
DOMAIN                SERVICE CELL            LOG                                       
t3fs08-Domain-pool    pool    t3fs08_cms      /var/log/dcache/t3fs08-Domain-pool.log    
t3fs08-Domain-pool    pool    t3fs08_ops      /var/log/dcache/t3fs08-Domain-pool.log    
t3fs08-Domain-dcap    dcap    DCap-t3fs08     /var/log/dcache/t3fs08-Domain-dcap.log    
t3fs08-Domain-gridftp gridftp GFTP-t3fs08     /var/log/dcache/t3fs08-Domain-gridftp.log 
t3fs08-Domain-gsidcap gsidcap DCap-gsi-t3fs08 /var/log/dcache/t3fs08-Domain-gsidcap.log 

Nagios

To restart the local NRPE daemon after a configuration change, apply the following kill -9:
root@t3fs02 $ ps -ef | grep nrpe
  nagios  6793     1   0   Nov 20 ?           0:14 /opt/csw/bin/nrpe -c /opt/csw/etc/nrpe.cfg -d

root@t3fs02 $ kill -9 6793

root@t3fs02 $ ps -ef | grep nrpe
  nagios  15477     1   0 10:41:35 ?           0:00 /opt/csw/bin/nrpe -c /opt/csw/etc/nrpe.cfg -d

Backups

Just ZFS snapshots and only for the OS.
NodeTypeForm
Hostnames: t3fs01 - t3fs04, t3fs07 - t3fs11 (READ-ONLY !!)
Services: dcache pool cells, gridftp, dcap, gsidcap
Hardware: SUN X4500 (2*Opt 290, 16GB RAM, 48*500GB SATA) / SUN X4540 (2*Opt 2435, 32GB RAM, 48*1TB SATA + 16GB Flash)
Install Profile: dcachefs
Guarantee/maintenance until: t3fs01-04: 2011-06-02, t3fs07-12: 2011-12-17 (X4540 only 2 years)