Solaris Installations

We use Solaris 10 as our default. We tested OpenSolaris installations in 2008 (OpenSolaris 2008.05), but decided to stay with the standard systems.

System installation

We use a JumpStart server to install a raw Solaris 10 from the network (look here for instructions).

Some basic OS configuration tasks, like the ZFS setup and the basic puppet configuration, need to be done manually as described in the next chapter. Most other configuration tasks are then handled by the puppet system.

Oracle SunOS 5.11 changes

Networking

dladm show-dev is now dladm show-link

Solaris 10

Look at the preinstalled system's version

cat /etc/release
                        Solaris 10 11/06 s10x_u3wos_10 X86
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                           Assembled 14 November 2006

use the showrev command

showrev

Hostname: t3fs04
Hostid: 6c2b90b
Release: 5.10
Kernel architecture: i86pc
Application architecture: i386
Hardware provider:
Domain:
Kernel version: SunOS 5.10 Generic_127112-10

Create a root home directory

Solaris10 regrettably uses "/" as root's homedir.

Edit the root entry in /etc/passwd

root:x:0:0:Super-User:/root:/sbin/sh

Create the directory and move some files to it

mkdir /root
chmod 700 /root
chown root:root /root 

cd /
mv .Xauthority .bash_history .ssh  /root/

getting ssh access as root and passwordless root access from the admin node

Default installations usually do not allow remote SSH access for root. You need to

  • edit /etc/ssh/sshd_config to allow root access (PermitRootLogin yes option)
  • restart the ssh service
    svcadm restart svc:/network/ssh:default
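
The sshd_config edit can also be scripted. The following is only a sketch: the function name permit_root_login is made up for this example, and it assumes a sed that understands POSIX character classes (for example GNU sed). It rewrites an existing PermitRootLogin line, whether commented out or not, and does not add one if the file has none.

```shell
# Hypothetical helper: read an sshd_config on stdin and write a copy to stdout
# in which any existing PermitRootLogin line is set to "yes".
permit_root_login() {
    sed 's/^#*[[:space:]]*PermitRootLogin[[:space:]].*$/PermitRootLogin yes/'
}
```

One would typically write the result to a temporary file, compare it with the original, and only then move it into place and restart the ssh service.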

From the admin node, copy the SSH key to the authorized keys

ssh-copy-id -i /root/.ssh/id_dsa root@NEWMACHINE

Log out and in again to the new server to have the correct environment for the next steps

getting remote access to the console via the ILOM

On some X4500 systems this worked out of the box; on others there were problems. Frequently, the console works only up to the point where the GRUB boot prompt appears. Remote access to the console through the ILOM (start /SP/console) is very desirable, because it shows important messages, such as those printed when patches are applied after a system reboot.

How to get the console to work

  • comment out the splashimage line from the /boot/grub/menu.lst file
  • Edit the /boot/solaris/bootenv.rc file. This file contains the boot environment settings. Some of the values also can be seen by using the eeprom command as root. On all Solaris machines where the redirection works, I can see in this file a setting
    setprop console 'ttya'
    A machine with failing redirection had this property set to 'text'.

Note: The issue is more complicated than this. Various hints can be found in an ELOM on X4150 discussion on this OpenSolaris weblink and in this nice wiki from uwaterloo.

Solaris 10 Services

Solaris 10 Services to disable

for n in \
    telnet \
    ftp \
    rlogin \
    svc:/network/shell:kshell \
    svc:/network/shell:default \
    finger \
    sendmail \
    smtp
do
    svcadm disable "$n"   # one service per iteration
done

Disable the font server through inetd:

/usr/sbin/inetadm -d svc:/application/x11/xfs:default

Check the open ports with netstat! There should be nothing left except for SSH:

netstat -a

TCP: IPv6
   Local Address                     Remote Address                 Swind Send-Q Rwind Recv-Q   State      If
--------------------------------- --------------------------------- ----- ------ ----- ------ ----------- -----
      *.*                               *.*                             0      0 49152      0 IDLE
      *.ssh                             *.*                             0      0 49152      0 LISTEN

SCTP:
        Local Address                   Remote Address          Swind  Send-Q Rwind  Recv-Q StrsI/O  State
------------------------------- ------------------------------- ------ ------ ------ ------ ------- -----------
0.0.0.0                         0.0.0.0                              0      0 102400      0  32/32  CLOSED

Active UNIX domain sockets
Address  Type          Vnode     Conn  Local Addr      Remote Addr
ffffffffb0ad23a8 stream-ord 00000000     00000000
ffffffffb0c76c88 stream-ord 00000000     00000000
ffffffffaf41bc80 stream-ord ffffffffae9a89c0 00000000    /var/run/.inetd.uds
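
The check for leftover listeners can be automated. This is a minimal sketch, assuming the netstat output format shown above (local address in the first column, the word LISTEN somewhere later on the line); the function name unexpected_listeners is hypothetical.

```shell
# Hypothetical helper: read "netstat -a" or "netstat -an" output on stdin and
# print the local address of every listening socket that is not SSH.
unexpected_listeners() {
    awk '
        {
            listening = 0
            for (i = 2; i <= NF; i++) if ($i == "LISTEN") listening = 1
            if (!listening) next
            # The port (or service name) is the part after the last dot.
            n = split($1, parts, ".")
            port = parts[n]
            if (port != "ssh" && port != "22") print $1
        }
    '
}
```

Used as netstat -a | unexpected_listeners, an empty result means that only SSH is listening.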

Mounting our minimal shared SW area (mounting a Linux NFS share on Solaris)

mkdir /mnt/master
mount -o vers=3 t3admin01:/master /mnt/master
Note: The -o vers=3 option is needed because the Solaris 10 client would otherwise try NFSv4 first, while our Linux server exports the share with the older NFSv3 protocol.

Getting Solaris packages

Basic Solaris Package management

This is a list of examples for the basic Solaris package management commands, i.e. using the native Solaris commands

pkgadd -d ./SUNWhd-1.01.pkg     # install a package given by a file

pkginfo    # lists all installed packages
pkginfo -l CSWpuppet  # lists detailed package information for this package

pkgchk -l -p /usr/lib/values-Xa.o   # list packages to which this local file belongs and check against package
pkgchk -l CSWpuppet | grep Pathname   # lists files belonging to the named package

pkgrm CSWpuppet    # remove a package. Dependencies will be checked

from OpenCSW

THIS SECTION NEEDS TO BE REVISED. But opencsw is still very much alive - Derek 2013-04-30

This follows instructions from http://www.opencsw.org/

  1. Get the pkg-get utility and transfer it to the new node
    wget http://www.opencsw.org/pkg_get.pkg
    wget ftp://ftp.ibiblio.org/pub/mirrors/opencsw/wget-i386.bin
    
  2. Install the package
    pkgadd -d pkg_get.pkg
    
  3. Edit the pkg-get configuration /opt/csw/etc/pkg-get.conf to select our nearby mirror
       url=http://mirror.switch.ch/ftp/mirror/opencsw/current
       
  4. If pkg-get is not able to download a wget itself, copy a static one from ftp://ftp.ibiblio.org/pub/mirrors/opencsw/wget-i386.bin to the node
    scp wget-i386.bin newnode:/opt/csw/bin/wget
    
  5. Install the gnupg suite. This will take some time
    /opt/csw/bin/pkg-get install gnupg textutils
    
  6. Import the opencsw key and set the trust on it
    /opt/csw/bin/gpg --keyserver pgp.mit.edu --recv-keys E12E9D2F
    /opt/csw/bin/gpg --edit-key E12E9D2F
      trust   # in the command line interface of gpg
                # choose "trust ultimately"
    

From here on you can go to the puppet installation.

from Blastwave

Blastwave is gone. As a result of Oracle's policies, Denis Clarke threw in the towel.

Note: Here I follow the instructions from the Blastwave main page:

pkgadd -d http://blastwave.network.com/csw/pkgutil_`/sbin/uname -p`.pkg 
/opt/csw/bin/pkgutil --catalog
/opt/csw/bin/pkgutil --install gnupg textutils

Import the blastwave gpg key to check the packages that you download!

/opt/csw/bin/gpg --keyserver pgp.mit.edu --recv-keys A1999E90
/opt/csw/bin/gpg --list-keys

    //.gnupg/pubring.gpg
    --------------------
    pub   1024D/A1999E90 2008-08-17 [expires: 2011-08-17]
    uid                  Blastwave Software (Blastwave.org Inc.) 
    sub   2048g/E4845389 2008-08-17 [expires: 2011-08-17]

/opt/csw/bin/gpg --edit-key A1999E90

...
Command> trust
...
Your decision? 5   (trust ultimately)
Do you really want to set this key to ultimate trust? (y/N) y
...

Edit the pkgutil configuration file /etc/opt/csw/pkgutil.conf. We are using the default mirror with the unstable release (access to newer packages).

...
# To enable use of gpg or md5, uncomment these
# NOTE: it doesn't make sense to use md5 but not gpg so your options should be:
#       1. both disabled, 2. gpg enabled, 3. both enabled
# Default: false, false
use_gpg=true
use_md5=true
...

Again fetch the catalog, and now it will be checked with gpg

/opt/csw/bin/pkgutil --catalog

Now, we need to enter the paths to these tools into the configuration

  1. add the path /opt/csw/bin in /etc/default/login
    PATH=/opt/csw/bin:/usr/bin:
    SUPATH=/sbin:/usr/sbin:/opt/csw/bin:/usr/bin
    
  2. update the MANPATH in /etc/profile by adding these lines
    MANPATH=/usr/share/man:/opt/csw/share/man
    export MANPATH
    

Log out and in again to test whether you now have the pkgutil utility in the path and can read its man page.

How to install packages with pkgutil

Searching for a package using a substring (the -a flag shows available packages)
pkgutil -a sed
   gsed                 CSWgsed              4.2.1,REV=2009.07.01       432.6 KB

As a minimum we need to fetch gsed and rsync

pkgutil -i gsed rsync

Also useful are

  • CSWshutils - Gnu coreutils. They will be installed with a "g" prefix to all commands
  • top

Install Puppet

Note: It is necessary to pick a Puppet client that works with our puppet server version. Changes are sometimes erratic even between minor versions and can break the current functionality and manifests. As of this writing I recommend retrieving puppet 0.24.8 from the /master/Solaris area of t3admin01.

If you want to install facter and puppet from our local area, you first need to use pkg-get to satisfy the dependencies:

/opt/csw/bin/pkg-get -i ruby

# now install our standard puppet version
pkgadd -d /mnt/master/Solaris/facter-1.5.7\,REV\=2009.11.16-SunOS5.9-all-CSW.pkg
pkgadd -d /mnt/master/Solaris/puppet-0.24.8,REV=2009.03.23-SunOS5.10-all-CSW.pkg

Immediately disable the puppet service that is started up automatically by the package

svcadm disable puppetd

Put this into the /etc/puppet/puppet.conf file

[main]
   environment = DerekDevelopment

[puppetd]
   factsync = true
   #pluginsync = true
   server = psi-puppet1.psi.ch
   #evaltrace = true

Put a correct config in a /etc/sysconfig/psi file

ZONE=Tier3
SET=Solaris
ROLE=dcachefs

Be sure that you have entered the new node into the main puppet node manifest for the Tier-3.

Run puppet (from version 0.25 puppetd resides in sbin)

/opt/csw/sbin/puppetd -v -t

Patching

  • For finding information on patches: http://sunsolve.sun.com/patchfinder/
  • do this only from the system console or you may not see important messages, especially if a system restart is necessary
  • some patches are dangerous (see below) and can render the system unbootable (seldom, but it happened!)

System Registration

Copy the registration template /usr/lib/breg/data/RegistrationProfile.properties and edit it (you need to get yourself a SUN account at http://sunsolve.sun.com/).

scp -p /root/clusteradmin/profiles/solarisfs/common/root/RegistrationProfile.properties NEWNODE:/root/

By convention, I install one in the /root/ folder, so you may use my existing one on the cluster. Register the machine:

cd /root
sconadm register -a -r ./RegistrationProfile.properties

sconadm is running
Authenticating user ...
finish registration!

This can fail with a long Java backtrace if there is a problem with the hostname or domain name. The Java framework somehow wants to connect back to a local socket, and this fails. More information may be gathered from a log file written at /tmp/basicreg${DATE}*.log

The patching policy can be seen and changed like this:

smpatch get
...
patchpro.install.types          -               rebootafter:reconfigafter:standard
...

smpatch set patchpro.install.types=rebootafter:reconfigafter:standard

patch information gathering

You can list the revision information of already applied patches using the showrev command:

showrev -p |head -3

Patch: 118344-14 Obsoletes: 122397-01 Requires:  Incompatibles:  Packages: SUNWcsu, SUNWcsl, SUNWckr, SUNWhea, SUNWarc, SUNWfmd
Patch: 118368-04 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu

This command analyzes which patches are available for this machine

smpatch analyze

...
138110-01 SunOS 5.10_x86: ata driver patch
...

One can also analyze which dependencies need to be installed for one or more specific patches. Multiple -i options are supported, or one can pass a file with a list of patch IDs.

smpatch analyze -i 138110-01

        You have new messages. To retrieve: smpatch messages [-a]

127128-11 SunOS 5.10_x86: kernel patch
138110-01 SunOS 5.10_x86: ata driver patch

# with a listfile
smpatch analyze -x idlist=patch-list-20081215.lst

125556-01 SunOS 5.10_x86: patch behavior patch
138270-02 SunOS 5.10_x86: devfs patch
127128-11 SunOS 5.10_x86: kernel patch
137140-06 SunOS 5.10_x86: aggr patch
119255-59 SunOS 5.10_x86: Install and Patch Utilities Patch
137122-03 SunOS 5.10_x86: e1000g driver patch
138110-01 SunOS 5.10_x86: ata driver patch

You can see the README of a specific patch by executing

smpatch download -t -i 139580-01

patch installation

  • patches are downloaded to /var/sadm/spool by default.
  • logs from install attempts and READMEs are available in /var/sadm/patch/PATCHNUMBER/ directories

smpatch update # without additional argument applies all available patches

smpatch update -x idlist=patch-list-20081215.lst # updates specified by list
smpatch update -i patch_id1   -i patch_id2   # updates by single patch IDs

The patches may require a system restart, or at least a switch to maintenance mode, to be installed. It is easiest to log in on the remote console and initiate an init 0 there, so one can still follow the progress, e.g.

init 0
root@t3fs03 # svc.startd: The system is coming down.  Please wait.
svc.startd: 94 system services are now being stopped.
Dec 15 11:35:06 t3fs03 syslogd: going down on signal 15
Dec 15 11:35:07 rpc.metad: Terminated
Installing updates
Installing update 138270-02 Succeeded
Installing update 127128-11 Succeeded
........

Note: Careful with kernel patches. The patch 137138-09 from SUN rendered two systems unbootable (q.v. this Issue report).

Solaris Filesystem configuration (UFS, ZFS), disk partitioning, RAID

The hd and hdadm tool

The hd tool is very useful to get an overview on the device mappings to physical disks: One can get it from the X4500 Solaris tools CD and install it by
pkgadd -d ./SUNWhd-1.01.pkg

hd usage:

hd   #  show mappings and ASCII-graphical slot display
hd -q  # sequence in drive slot order. 1 & 2 are bootable disks
hd -l # list in phys order (1 spc separated line). 1 & 2 are bootable disks
hd -j # shows controller device files (PCI) and mapping to c1, c2, c3 ....
hd -w /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@2,0  # maps pci dev path to disk
hd -r and hd -R: read out SMART information

In a very interesting blog entry, Ben Rockwood points to the possibility of reading out the SMART data of all disks with hd -r (detailed) or hd -R (terse). In his experience, the (1.) Raw read error rate and the (5.) Reallocated sector count are the best indicators for disks that need to be replaced. He also points to an interesting paper on disk failure statistics by Google.

The hdadm tool can be used to manage disks and bring them online/offline.

hdadm display # displays info on all drives
hdadm offline slot 11
hdadm offline disk c0t0
hdadm offline row3
hdadm offline col3
hdadm online all

Disk exchange

The cfgadm command can be used to manage the hardware configuration (e.g. the connected state of devices). This may be necessary when a disk is exchanged and ends up in a disconnected state. It may require something like this (from http://osdir.com/mlos.solaris.opensolaris.storage.general/2008-07/msg00122.html):
 # cfgadm -c configure sata1/3

A lot of examples can be found on our page of FileserverDiskProblems.

ZFS pool setup

Delete the original empty zpool: zpool destroy zpool1

Run our "setup_zpool.sh" script which intelligently identifies the boot disks and puts out the commands to create our required zpool structure. Example:

setup_zpool.sh

# system contains 48 disks
# boot disks: c5t0 c5t4
# spare disks: c0t3 c0t7
# num of pool disks:  44
zpool create -f data1 raidz1 c4t0d0 c4t4d0 c7t0d0 c7t4d0 c6t0d0 c6t4d0 c1t0d0 c1t4d0 c0t0d0
zpool add -f data1 raidz c0t4d0 c5t1d0 c5t5d0 c4t1d0 c4t5d0 c7t1d0 c7t5d0 c6t1d0 c6t5d0
zpool add -f data1 raidz c1t1d0 c1t5d0 c0t1d0 c0t5d0 c5t2d0 c5t6d0 c4t2d0 c4t6d0 c7t2d0
zpool add -f data1 raidz c7t6d0 c6t2d0 c6t6d0 c1t2d0 c1t6d0 c0t2d0 c0t6d0 c5t3d0 c5t7d0
zpool add -f data1 raidz c4t3d0 c4t7d0 c7t3d0 c7t7d0 c6t3d0 c6t7d0 c1t3d0 c1t7d0
zpool add -f data1 spare c0t3d0 c0t7d0
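
The grouping behind such a layout can be sketched as follows. This is not our setup_zpool.sh, only an illustration under the assumption that the pool disks are already known and passed in the desired order; the function name emit_zpool_cmds and its group-width argument are hypothetical. It only prints the commands and does not run them.

```shell
# Hypothetical helper: print zpool commands that put the given disks into
# raidz groups of at most WIDTH disks each.
# Usage: emit_zpool_cmds POOL WIDTH DISK...
emit_zpool_cmds() {
    pool=$1
    width=$2
    shift 2
    verb="create"      # the first group creates the pool, later ones are added
    group=""
    n=0
    for d in "$@"; do
        group="$group $d"
        n=$((n + 1))
        if [ "$n" -eq "$width" ]; then
            echo "zpool $verb -f $pool raidz$group"
            verb="add"
            group=""
            n=0
        fi
    done
    # Emit a last, shorter group if disks are left over.
    if [ "$n" -gt 0 ]; then
        echo "zpool $verb -f $pool raidz$group"
    fi
}
```

With a width of 9 and the 44 pool disks of the example above, this yields four groups of nine disks and one of eight, as in the listing. Spare disks would still have to be added separately.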

monitoring disk I/O

You can use the following commands to monitor I/O broken down to single disks

zpool iostat -v

iostat -nxz 5

soon obsolete: Custom UFS partitioning and RAID setup for X4500

How to partition a X4500 and set up RAID mirroring can be found on the InstallationSolarisPartitioning page.

soon obsolete: UFS Boot partition information of preinstalled Solaris10 X4500

The boot image in the preinstalled X4500 Solaris 10 installations of this version is not on a ZFS file system (and I actually think that this would not add much value). This is the list of mounted partitions as seen in /etc/vfstab (and /etc/mnttab), among them the root partition, which sits on a UFS file system:

Contents of /etc/vfstab on a SUN Solaris 10 preinstalled X4500:

#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass    at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/md/dsk/d10 /dev/md/rdsk/d10        /       ufs     1       no      -
/devices        -       /devices        devfs   -       no      -
ctfs    -       /system/contract        ctfs    -       no      -
objfs   -       /system/object  objfs   -       no      -
/dev/md/dsk/d20 -       -       swap    -       no      -
/dev/md/dsk/d30 /dev/md/rdsk/d30        /var    ufs     1       no      -

Here is a good document on the so-called soft-partitions (kind of logical volumes): http://sysunconfig.net/unixtips/soft-partitions.html

To find out more about these so-called meta devices, one can use the metastat command:

metastat

d10: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d12
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 22539195 blocks (10 GB)

d11: Submirror of d10
    State: Okay
    Size: 22539195 blocks (10 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c5t0d0s0          0     No            Okay   Yes


d12: Submirror of d10
    State: Okay
    Size: 22539195 blocks (10 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c5t4d0s0          0     No            Okay   Yes


Device Relocation Information:
Device   Reloc  Device ID
c5t0d0   Yes    id1,sd@SATA_____HITACHI_HUA7250S______GTF402P6G3NM6F
c5t4d0   Yes    id1,sd@SATA_____HITACHI_HUA7250S______GTF402P6G3KPYF

The information in the last few lines points to the physical disks on which the file systems reside. This can be correlated with the information given by the hd utility, and we can see that (naturally) these two devices are the boot disks (same is found for the d30 meta device, here).
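
For monitoring, the metastat output can be scanned for any state other than Okay. This is a minimal sketch that assumes the output layout shown above; the function name metastat_problems is hypothetical. A state line is attributed to the most recent "dNN:" header, so a failing submirror state listed inside a mirror section is reported under the mirror's name.

```shell
# Hypothetical helper: read metastat output on stdin, print "device: state"
# for every state that is not "Okay", and return non-zero if any was found.
metastat_problems() {
    awk '
        /^d[0-9]+:/ { dev = $1; sub(/:$/, "", dev) }
        $1 == "State:" && $2 != "Okay" {
            state = $0
            sub(/^[ \t]*State:[ \t]*/, "", state)
            print dev ": " state
            bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Called as metastat | metastat_problems, it prints nothing and returns 0 on a healthy system such as the one above.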

Changing the hostname

If the hostname has been configured wrongly, you may need to make changes to all of these files. This is regrettably much harder than in Linux.

  • /etc/hosts
  • /etc/nodename
  • /etc/hostname.*
  • /etc/inet/ipnodes
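
To see which of these files still contain the wrong name, a small helper like the following can be used. It is only a sketch; the function name files_mentioning_host and the optional root-directory argument (handy for trying it out on a copy) are assumptions made for this example.

```shell
# Hypothetical helper: print those hostname-related files that mention NAME.
# Usage: files_mentioning_host NAME [ROOTDIR]
files_mentioning_host() {
    name=$1
    root=${2:-}
    for f in "$root/etc/hosts" "$root/etc/nodename" \
             "$root"/etc/hostname.* "$root/etc/inet/ipnodes"; do
        # Skip files that do not exist (including an unmatched glob).
        if [ -f "$f" ] && grep "$name" "$f" >/dev/null 2>&1; then
            echo "$f"
        fi
    done
}
```

Running files_mentioning_host OLDNAME before and after the change shows whether any file was missed.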

Networking

Network trunking

Note: Trunking requires the switches to know about it; otherwise only outgoing transfers profit from it (because there the OS decides which interface to use). If the switch is set to static trunking, one can run into problems while the system is booting, because at that point it is not yet able to support the trunking. It can then happen that the system cannot receive answers from DHCP, because the switch sends them in on a different interface than the one the request went out on. This problem can be solved by using LACP (Link Aggregation Control Protocol), where the host tells the switch when to trunk the interfaces. The switch must be configured for this!

This service should be online: svc:/network/physical:default

detach and erase the configuration for the single interface

dladm show-dev
ifconfig nge0 unplumb

rm -f /etc/hostname.nge0
rm -f /etc/dhcp.nge0

Create the aggregated network interface. There is a bug in dladm on our nodes, which persists even after many system updates have been installed:

dladm create-aggr -P L2,L3,L4 --lacp-mode=active -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1
Segmentation Fault (core dumped)

The trunking works if the hash function policy only uses L3,L4 or L2,L4. The --lacp-mode=active flag turns on LACP mode (see above) for the trunking.

dladm create-aggr -P L3,L4 --lacp-mode=active -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1

NOTE: One can also change the properties of an existing aggregated link

dladm modify-aggr --lacp-mode=active 1

One can list the details on the trunked interfaces

dladm show-aggr -L

key: 1 (0x0001) policy: L3,L4   address: 0:14:4f:a6:df:e4 (auto)
                LACP mode: active       LACP timer: short
    device    activity timeout aggregatable sync  coll dist defaulted expired
    e1000g0   active   short   yes          yes   yes  yes  no        no
    e1000g1   active   short   yes          yes   yes  yes  no        no
    e1000g2   active   short   yes          yes   yes  yes  no        no
    e1000g3   active   short   yes          yes   yes  yes  no        no

To select a hardcoded IP/hostname configuration, make sure that /etc/inet/hosts (DANGER: /etc/hosts is just a symbolic link to that!) contains the local node's IP and hostname

echo "MYHOSTNAME" > /etc/hostname.aggr1   # replace MYHOSTNAME with the node's hostname
echo 192.33.123.1 > /etc/defaultrouter

Also put the correct information into /etc/netmasks:

192.33.123.0    255.255.255.0

Activate the interface and restart the service to obtain the correct network settings.

ifconfig aggr1 plumb
svcadm restart svc:/network/physical:default

Our aggr.sh script will take care of correctly trunking on both Solaris 10 and OpenSolaris systems.

TCP tuning

Explanation of the Solaris TCP parameters (older, from 2002): http://developers.sun.com/solaris/articles/tuning_for_streaming.html
Also look at the reference manual: http://docs.sun.com/app/docs/doc/806-4015/6jd4gh8fn?a=view

From the TCP Tuning Guide:

For Solaris create a boot script similar to this (e.g.: /etc/rc2.d/S99ndd) 
#!/bin/sh
# increase max tcp window

# The maximum buffer size in bytes. It controls how large the send and receive buffers
# can be set to by an application using setsockopt(3SOCKET).
ndd -set /dev/tcp tcp_max_buf 4194304

# The maximum value of TCP congestion window (cwnd) in bytes.
ndd -set /dev/tcp tcp_cwnd_max 2097152

# The default (???) send/receive window size in bytes (i.e. what you get if the application does not ask for more)
ndd -set /dev/tcp tcp_xmit_hiwat 65536
ndd -set /dev/tcp tcp_recv_hiwat 65536

Note: The 64 kB hiwat values are too small for WAN transfers over gigabit links.

Here are the default values on one of our Solaris10 Thumpers:

root@t3fs01 $ ndd -get /dev/tcp tcp_max_buf
1048576
root@t3fs01 $ ndd -get /dev/tcp tcp_cwnd_max
1048576
root@t3fs01 $ ndd -get /dev/tcp tcp_xmit_hiwat
49152
root@t3fs01 $ ndd -get /dev/tcp tcp_recv_hiwat
49152
This effectively limits the TCP window to 48 kB!
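
Why such a small window hurts can be estimated with the bandwidth-delay product. The following is a back-of-the-envelope sketch using integer shell arithmetic; the function names and the 20 ms round-trip time in the usage note are assumptions for illustration, not measurements from our network.

```shell
# Window (in bytes) needed to keep a link busy:
#   (Mbit/s * 10^6 / 8) bytes/s * (RTT in ms / 1000) s = Mbit/s * 125 * RTT_ms
# Usage: bdp_bytes LINK_MBIT RTT_MS
bdp_bytes() {
    echo $(( $1 * 125 * $2 ))
}

# Upper bound on throughput for a given window:
#   window * 8 bits / RTT in ms = bits per ms = kbit/s (rounded down)
# Usage: max_kbit_per_s WINDOW_BYTES RTT_MS
max_kbit_per_s() {
    echo $(( $1 * 8 / $2 ))
}
```

For a gigabit link and a hypothetical RTT of 20 ms, bdp_bytes 1000 20 gives 2500000 bytes, while max_kbit_per_s 49152 20 gives 19660 kbit/s. Under these assumptions a single connection with the default window stays below 20 Mbit/s.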

How to fix the recurring t3fs06 links down issue

Sometimes, under unexpected NFS load, the network links of t3fs06 go down, which blocks all activity at the T3. If that happens, Nagios will report it, and on t3fs06 the link status will look like this:
dladm show-dev
e1000g0         link: unknown   speed: 1000  Mbps       duplex: full
e1000g1         link: unknown   speed: 1000  Mbps       duplex: full
e1000g2         link: unknown   speed: 1000  Mbps       duplex: full
e1000g3         link: unknown   speed: 1000  Mbps       duplex: full
To recover try:
# ifconfig aggr1 unplumb
# ifconfig aggr1 plumb
# svcadm restart svc:/network/physical:default
Check that now the link is like:
bash-4.2# dladm show-aggr
key: 1 (0x0001) policy: L3,L4   address: 0:14:4f:a6:e1:8c (auto)
           device       address                 speed           duplex  link    state
           e1000g0      0:14:4f:a6:e1:8c          1000  Mbps    full    up      attached
           e1000g1      0:14:4f:a6:e1:8d          1000  Mbps    full    up      attached
           e1000g2      0:14:4f:a6:e1:8e          1000  Mbps    full    up      attached
           e1000g3      0:14:4f:a6:e1:8f          1000  Mbps    full    up      attached
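
The link check can be scripted, for example for a Nagios probe. This is a minimal sketch that assumes the Solaris 10 dladm show-dev output format shown above and that a healthy link is reported as "link: up"; the function name links_not_up is hypothetical.

```shell
# Hypothetical helper: read "dladm show-dev" output on stdin, print every
# device whose link state is not "up", and return non-zero if there is one.
links_not_up() {
    awk '$2 == "link:" && $3 != "up" { print $1; bad = 1 } END { exit bad + 0 }'
}
```

Used as dladm show-dev | links_not_up, a non-zero return code indicates that the recovery steps above may be needed.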

Syslog

Solaris 10 machines come with a syslog setup where the /etc/syslog.conf file is processed by m4, before it is fed to the syslogd.

If a host loghost is defined inside of /etc/hosts, and this host resolves to one of the IP addresses of the local machine, the LOGHOST variable will get defined, and the various conditionals will resolve to logging to local files.

Cleanly shutting down

Shutdown:

  • shutdown -i5 -g0: This brings the machine to init state 5, in which the system is shut down cleanly and then powered off.

Rebooting:

  • init 6: Using `init 6` gives the cleanest, most orderly reboot (init informs svc.startd of the runlevel change, and the system moves to the appropriate milestone)
  • shutdown -y -g0 -i6 "message" invokes init as well, and additionally offers a grace period and a message to the users

NOTE: halt, reboot, and poweroff do not run any of the shutdown scripts and should be a last resort.

Local Firewall settings

Look at the Tier-2 documentation. At PSI we have a firewall in front of the Tier-3.

Performance monitoring

Memory monitoring

Good article on oralife blog

mdb

Nice overview of memory usage

root@t3fs05 $ echo ::memstat | mdb -k
   Page Summary                Pages                MB  %Tot
   ------------     ----------------  ----------------  ----
   Kernel                     738360              2884   18%
   ZFS File Data               11856                46    0%
   Anon                        19434                75    0%
   Exec and libs                2583                10    0%
   Page cache                    206                 0    0%
   Free (cachelist)             6218                24    0%
   Free (freelist)           3282408             12821   81%

   Total                     4061065             15863
   Physical                  3975282             15528

  • Kernel: kernel pages
  • Anon: anonymous pages (such as stack, heap, shared memory, etc.)
  • Exec and libs: executables and libraries
  • Page cache: file cache
  • Free (cachelist) + Free (freelist) = freemem (the value in the “free” column of vmstat output)
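
The sum of the two Free lines can be extracted directly from the ::memstat output. This is a small sketch that assumes the column layout shown above (MB in the second-to-last column); the function name memstat_free_mb is hypothetical.

```shell
# Hypothetical helper: read "::memstat" output on stdin and print the sum of
# the MB column over the "Free (cachelist)" and "Free (freelist)" lines.
memstat_free_mb() {
    awk '$1 == "Free" { sum += $(NF - 1) } END { print sum + 0 }'
}
```

For the listing above, echo ::memstat | mdb -k | memstat_free_mb would print 12845.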

prstat

Like top.

A nice overview of per-user statistics can be obtained with: prstat -a

ps

Sorts processes according to mem usage

ps -efo pmem,uid,pid,ppid,pcpu,comm | sort -rn

pmap

Like in Linux

/usr/proc/bin/pmap -x PID

Dtrace

Useful dtracetoolkit commands:

  • iotop - display top disk I/O events by process (list for every disk)
  • rwsnoop: measuring reads and writes at the application level
  • dtruss -p pid: truss equivalent
  • bitesize.d: histogram of sizes of disk events per process as sent by the block I/O driver
  • seeksize.d: histograms disk event seeks by process
  • iopattern: differentiates sequential from random i/o
  • iosnoop: lists processes doing IO (limited utility on ZFS because no filenames are given)
  • opensnoop: snoops file open calls

SAR

You must install the SUNWaccu and SUNWaccr (service) packages.

Then, activate the service

svcadm enable svc:/system/sar:default

The data collection is done by cron jobs of the sys user. Uncomment these lines if necessary:

cat /var/spool/cron/crontabs/sys
#ident  "@(#)sys        1.5     92/07/14 SMI"   /* SVr4.0 1.2   */
#
# The sys crontab should be used to do performance collection. See cron
# and performance manual pages for details on startup.
#
0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
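
Uncommenting the sa1/sa2 entries can be done with a filter like the following. It is only a sketch: the function name uncomment_sa_jobs is hypothetical, and it assumes a sed that understands POSIX character classes (such as the gsed package from OpenCSW). Ordinary comment lines are left untouched because only lines whose first field looks like a cron time field are changed.

```shell
# Hypothetical helper: read a crontab on stdin and strip the leading "#" from
# commented-out entries that call /usr/lib/sa/sa1 or /usr/lib/sa/sa2.
uncomment_sa_jobs() {
    sed 's|^#[[:space:]]*\([0-9*].*/usr/lib/sa/sa[12]\)|\1|'
}
```

The filtered text can then be reviewed and installed as the crontab of the sys user.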

Misc

  • prstat
  • intrstat

Links with documentation and Howtos


OpenSolaris 2008.05 OBSOLETE

Basic OS installation

OpenSolaris was installed by attaching a USB DVD drive with the OpenSolaris distribution to the machine and booting from it.

SUN ILOM (works on X4500): cd /SP/console ; start or start /SP/console

On x4500:

cat /etc/release
                        OpenSolaris 2008.05 snv_86_rc3 X86
           Copyright 2008 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 26 April 2008

Where did I get this from? SunOS Release 5.10 Version Generic_127112-10 64-bit Marvell Serial ATA Adapter - Driver Version 3.6.1.0-5

On SunOS unknown 5.10 Generic_127112-10 i86pc i386 i86pc

Booting into single user mode

In the grub screen, edit the kernel options and add a "-s" flag. Then boot with "b".

getting passwordless root access from the admin node

  • first edit /etc/ssh/sshd_config to allow root access (PermitRootLogin yes option) and restart the ssh service.
  • OpenSolaris may require turning off the pam_roles module, because in /etc/user_attr root is defined as a role. This is different from what we find on a standard Solaris 10 installation.

Definitions for the /etc/pam.conf file:

# PSI modification to allow interactive root user remote login
sshd-pubkey     auth requisite          pam_authtok_get.so.1
sshd-pubkey     auth required           pam_dhkeys.so.1
sshd-pubkey     auth required           pam_unix_cred.so.1
sshd-pubkey     auth required           pam_unix_auth.so.1
# sshd-pubkey   account requisite       pam_roles.so.1 allow_remote
#        allow_remote is not enough because auser information is missing
sshd-pubkey     account required        pam_unix_account.so.1
Note: one may also want to define this for the sshd-kbdint service (keyboard-interactive, password-based login).

One could also drop the definition of root as a role.

Package management

installing a standard SUN package from file using the default package manager

pkgadd -d ./SUNWhd-1.01.pkg
pkginfo    # lists all installed packages
pkginfo -l CSWpuppet  # lists detailed package information for this package
pkgchk -l -p /usr/lib/values-Xa.o   # list packages to which this local file belongs and check against package
pkgchk -l CSWpuppet | grep Pathname   # lists files belonging to the named package

Obtaining free additional packages from Blastwave

NOTE: BLASTWAVE SEEMS NOT VERY ACTIVE ANY MORE: MOST DEVELOPERS HAVE GONE TO http://www.opencsw.org/!!!!!

  1. Install the pkg-get tool from blastwave (see blastwave web page)
    pkgadd -d http://www.blastwave.org/pkg_get.pkg
    You may have to download the file by hand according to instructions at the blastwave page, if the weblink fails.
  2. You may need to modify the repository url in /opt/csw/etc/pkg-get.conf to point to a valid mirror. We use
    url=http://blastwave.network.com/csw/stable
  3. add the path /opt/csw/bin in /etc/default/login
    PATH=/opt/csw/bin:/usr/bin:
    SUPATH=/sbin:/usr/sbin:/opt/csw/bin:/usr/bin
    
  4. update the MANPATH in /etc/profile
    MANPATH=/usr/share/man:/opt/csw/share/man
    export MANPATH
    

pkg-get example usage:

pkg-get -D regexp   # searches for a package
pkg-get install top   # installs package "top"
pkginfo CSWtop     # shows info about a package

Special pkg messages are saved in logfiles in /var/sadm/install/logs/

Packages to install

pkg-get -i gsed

the new package manager 'pkg' (not used by us)

A new package manager, pkg, is in development, and it seems that it is used for some packages. It is independent of the other package management, which is a little disconcerting. We do not use it.

Still, here is some example usage:

# pfexec is like sudo and is often used for issuing the following commands as another user
pkg refresh
pkg list [fmri-pattern]   # list packages  (very slow)
pkg info [fmri-pattern]  # list package information (long format)
pkg authority  # displays repository info
pkg search gnu # search for packages with this substring
pkg contents SUNWgnu-coreutils
pkg install SUNWvim
Packages are downloaded from the OpenSolaris servers via HTTP (see http://pkg.opensolaris.org/status).

ZFS file system and disk management

Very nice links

How to determine what ZFS features are available on a system:

zfs upgrade -v

The following filesystem versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and File system unique identifer (FUID)
 4   userquota, groupquota properties

For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/zpl/N

Where 'N' is the version number.

On OpenSolaris the root partition already sits on a zpool:

mount
  / on rpool/ROOT/opensolaris read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90002

zfs list
rpool                               2.92G   450G    55K  /rpool
rpool@install                         16K      -    55K  -
rpool/ROOT                          2.92G   450G    18K  /rpool/ROOT
rpool/ROOT@install                    15K      -    18K  -
rpool/ROOT/opensolaris              2.92G   450G  2.33G  legacy
rpool/ROOT/opensolaris@install       196M      -  2.22G  -
rpool/ROOT/opensolaris/opt           412M   450G   411M  /opt
rpool/ROOT/opensolaris/opt@install   118K      -  3.60M  -
rpool/export                         540K   450G    19K  /export
rpool/export@install                  15K      -    19K  -
rpool/export/home                    506K   450G   488K  /export/home
rpool/export/home@install             18K      -    21K  -

Locating the boot disk

cfgadm|grep sata3/0    # x4500: for determining boot disk

All device files are located under /dev/dsk/

The hd tool is very useful for getting an overview of how devices map to physical disks. It can be installed from the Solaris tools CD with

pkgadd -d ./SUNWhd-1.01.pkg

hd usage:

hd   #  show mappings and ASCII-graphical slot display
hd -q  # sequence in drive slot order. 1 & 2 are bootable disks
hd -l # list in phys order (1 spc separated line). 1 & 2 are bootable disks
hd -j # shows controller device files (PCI) and mapping to c1, c2, c3 ....
hd -w /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@2,0  # maps pci dev path to disk

ZFS command examples

zfs list   # shows file systems and their mountpoints
zpool status -v  [poolname] # status + device. This also will show corrupted files, e.g. after a resilver
zpool status -x  # health of all pools

zpool online tank c1t0d0 # bring a pool device online

zpool iostat
zpool iostat rpool 2
zpool iostat -v  # device specific information
zpool import # lists exported pools ready to be imported
zpool import [-f] zpool1
zpool  upgrade zpool1
zpool destroy zpool1

zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0
zpool create tank raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0
zpool create pool mirror c0d0 c1d0 spare c2d0 c3d0  # spare
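
The health check from the examples above lends itself to scripting (e.g. from cron). The following is a sketch, assuming that zpool status -x prints exactly "all pools are healthy" when nothing is wrong; the function names are hypothetical.

```shell
# Interpret the text printed by `zpool status -x`.
# $1: the command output; succeeds only if it reports that all pools are healthy.
pools_healthy() {
    [ "$1" = "all pools are healthy" ]
}

# Hypothetical wrapper: print a one-line verdict, passing any problem report through.
report_pool_health() {
    if pools_healthy "$1"; then
        echo "OK"
    else
        echo "ATTENTION:"
        echo "$1"
    fi
}
```

It would be called as report_pool_health "$(zpool status -x)". Keeping the zpool call outside the functions makes it possible to try them with canned text on a machine without ZFS.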

Networking

The solarisfs profile which can be synced from the admin node contains a script for setting up an aggregated network device. Since it will detach the current network connections, it should be run either in batch mode (e.g. using the at service with at now) or inside a console session.

bash aggr.sh -f 

for dhcp to work correctly, one needs two files

  • /etc/hostname.[interface] makes the interface persistent and may contain the hostname
  • /etc/dhcp.[interface] is needed to get the dhcp parameters to be used for the local configuration
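
Creating the two files can be sketched as follows. The function name and the optional second argument (an alternate root directory, handy for a dry run) are our own assumptions; touch is used so that an existing hostname entry is not overwritten.

```shell
# Create the two files that make DHCP on an interface persistent.
# $1: interface name, e.g. e1000g0
# $2: optional alternate root directory (hypothetical parameter, for dry runs)
enable_dhcp_files() {
    iface=$1
    root=${2:-}
    # touch leaves existing content (e.g. a hostname) untouched
    touch "$root/etc/hostname.$iface" "$root/etc/dhcp.$iface"
}
```

On a real host one would run enable_dhcp_files e1000g0 as root.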

A little command overview:

ifconfig -a
dladm show-dev e1000g0
dladm show-link [-s] [linkname]
# disable nwam and enable default
svcadm disable svc:/network/physical:nwam
svcadm enable svc:/network/physical:default
# must have a /etc/hostname.aggr1 to make
# aggregate persistent
ifconfig e1000g1 unplumb
dladm create-aggr -l e1000g0 ... aggr1
dladm show-aggr [-L] [-x] [-s]
dladm delete-aggr aggr1
ifconfig aggr1 plumb
# aggr seems to get MAC of first link
ifconfig aggr1 dhcp start
netstat -D  # shows state of dhcp per interface
# dhcp config in /etc/default/dhcpagent
dhcpinfo -i e1000g0 12
dhcpinfo -i e1000g0 Hostname

How to control *inetd* related services:
inetadm          # lists all services and their status
inetadm -l svc:/network/rpc/smserver:default    # shows config details for that service

Network routing table

CAREFUL: The file /etc/defaultrouter can be used to hardcode a default route (just put the IP of the target machine in). If this file contains a setting, it will be added to the settings that are obtained by DHCP, and you may end up with 2 default routes. This can lead to unpredictable results, if the hardcoded IP points to a wrong place.
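
A quick way to detect the situation described above is to count the default entries in the routing table. This is a sketch, assuming that netstat -rn lists such routes with the word "default" in the first column; the function name is hypothetical.

```shell
# Count default routes in `netstat -rn` style output read from stdin.
# (count_default_routes is a hypothetical helper name.)
count_default_routes() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}
```

Usage: netstat -rn | count_default_routes. A result greater than 1 suggests comparing /etc/defaultrouter with what DHCP delivers.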

Some commands to change routes:

route add 192.33.123.0/24  192.33.123.44 -interface      # adding a route with gw being
                                                         # machine's own interface
route add default  192.33.123.1

route delete default 192.33.123.41  # deleting a gateway default route

DNS

Do not forget to modify the /etc/nsswitch.conf file to use DNS information. Here you can define whether only file based information (/etc/hosts) or also the resolver is used.
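
Whether the resolver is consulted can be checked by looking at the hosts line. A minimal sketch, assuming the usual "hosts: files dns" layout of the file; the function name is hypothetical.

```shell
# Succeed if the hosts line of an nsswitch.conf-style file lists dns.
# $1: path to the file (defaults to /etc/nsswitch.conf)
hosts_uses_dns() {
    file=${1:-/etc/nsswitch.conf}
    # pick the uncommented hosts line, then look for dns as a whole word
    grep '^hosts:' "$file" | grep -w dns > /dev/null
}
```

Usage: hosts_uses_dns || echo "DNS is not enabled for host lookups".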

How to enable jumbo frames

Refer to the Solaris driver's e1000g man page.

This is not yet tested!

You can look at the present frame size with
ndd -get /dev/e1000g0 max_frame_size
    1514

ifconfig  e1000g0  mtu  16128   # enable jumbo frames

It seems that one can also add the following line to the /etc/hostname.aggr1 file
mtu 16128

It might be that one has to change by hand the MaxFrameSize line in the driver's configuration file /kernel/drv/e1000g.conf, which by default contains the following.

MaxFrameSize=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
        # 0 is for normal ethernet frames.
        # 1 is for upto 4k size frames.
        # 2 is for upto 8k size frames.
        # 3 is for upto 16k size frames.
        # These are maximum frame limits, not the actual ethernet frame
        # size. Your actual ethernet frame size would be determined by
        # protocol stack configuration (please refer to ndd command man pages)
        # For Jumbo Frame Support (9k ethernet packet)
        # use 3 (upto 16k size frames)
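
If the file does need to be changed by hand, the edit could be scripted roughly as below. Like the rest of this section this is untested on a real system; the function name is hypothetical, and the result is written to stdout for review because the stock Solaris sed has no in-place option.

```shell
# Print an e1000g.conf-style file with every MaxFrameSize instance set to one value.
# $1: path to the config file
# $2: frame size class 0-3 (defaults to 3, i.e. up to 16k frames)
set_max_frame_size() {
    conf=$1
    val=${2:-3}
    # only the line starting with MaxFrameSize= is touched; comment lines stay as they are
    sed "/^MaxFrameSize=/s/[0-9][0-9]*/$val/g" "$conf"
}
```

Usage: set_max_frame_size /kernel/drv/e1000g.conf 3 > /tmp/e1000g.conf.new, then compare the two files with diff before replacing the original.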

Note: On a freshly installed Solaris 10 10/08 s10x_u6wos_07b X86 system, the ndd command did not accept the max_frame_size parameter. We still need to check why it is missing.

ndd -get /dev/e1000g0 max_frame_size
operation failed: Invalid argument

TODO for DerekFeichtinger (priority 5)
Look at jumbo frame support for file servers

Services

Service management is done with the svcadm, svcs and svcprop commands.

You can get a list of service identifiers and their descriptions with

svcs -o FMRI,DESC

Example Usage:

svcs     # lists all services
svcs -xv          # diagnose service startup problems
svcs  -l  gdm   # lists details of the service specified by pattern or _FMRI_ ID
svcs -d gdm    # lists all other services on which this service depends
svcs -D fc-cache   # lists all services depending on the given service
svcs -p gdm    # list all pids associated with the named service

svcadm enable gdm    # enable a service
svcadm enable -t gdm    # temporarily enable a service

svcadm disable gdm   # disable a service
svcadm disable -t gdm   # temporarily disable a service

svcadm restart gdm
svcadm clear gdm   # if service is in maintenance state, signal that we have repaired and can start
 
svcprop gdm   # lists service configuration properties for services framework
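
Services stuck in the maintenance state (the ones that need svcadm clear) can be picked out of the svcs listing. A sketch, assuming the listing is produced with svcs -H -a -o STATE,FMRI so that the state is in the first column and the FMRI in the second; the function name is hypothetical.

```shell
# List the FMRIs of services in maintenance state.
# Reads "STATE FMRI" lines from stdin, one service per line.
services_in_maintenance() {
    awk '$1 == "maintenance" { print $2 }'
}
```

Usage: svcs -H -a -o STATE,FMRI | services_in_maintenance. For each FMRI printed, svcs -xv usually explains what went wrong.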

Services to disable

Disable these services

# disable each listed service in turn
for n in finger sendmail ppd-cache-update ogl-select gdm; do
    svcadm disable "$n"
done

Patches and updates

OpenSolaris has no real patch service. There is a GUI application, packagemanager, that gives an overview of the packages that can be updated. One may also have a look at the pkg image-update [-nvq] command.

Compiling with gcc

Note: Careful! The many updates needed to get gcc working on this installation rendered the system unbootable. This derived from an update to ZFS, where the bootloader could no longer read the old zfs root partition.

Make sure that package CSWgcc3 is installed

Set up the environment

export PATH=$PATH:/opt/csw/gcc3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/csw/gcc3/lib

I needed to install the SUNWarc package to get gcc working for OpenSolaris. This requires using the new package manager. It is not very nice that we need to rely on two different package management systems. (tip from http://wiki.csclub.uwaterloo.ca/OpenSolaris)

pkg install SUNWarc SUNWsfwhea SUNWhea SUNWtoo

This took quite some time, since a number of packages needed to be fetched from the opensolaris web site. I wonder whether we can mirror such packages locally.

The error had been:

gcc test.c
ld: fatal: file values-Xa.o: open failed: No such file or directory
ld: fatal: File processing errors. No output written to a.out
collect2: ld returned 1 exit status

OpenSolaris boot problem after update

After some update (I do not know whether it was an automatic update or one I had made some days before), the system was no longer able to boot. The root file system had been installed on ZFS by the OS installation process, and there was no obvious place where one could have selected a different file system for the boot partition. One of the updates seemingly introduced an incompatibility between the bootloader and that version of ZFS. This is the related OpenSolaris Bug 6534519. I was not able to rescue the system as described on that page (but I did not manage to follow the instructions completely).

I reinstalled OpenSolaris. A further drawback is that I cannot get the ILOM serial console redirection to work with OpenSolaris. It works for the BIOS startup, but fails before grub comes up. After the reinstallation, I did a reboot, and the machine (using the graphical ILOM redirection) always came back with a black screen and a "0037" BIOS message in the lower right corner. I found a forum thread on a similar problem, and indeed, powering the machine off for some time and rebooting brought back the OpenSolaris login prompt.

-- DerekFeichtinger - 05 Aug 2008

Topic revision: r89 - 2013-09-13 - DerekFeichtinger
 