Solaris Installations
We use Solaris 10 as our default. We tested OpenSolaris installations in 2008 (OpenSolaris 2008.05), but decided to stay with the standard systems.
System installation
We use a JumpStart server to install raw Solaris 10 from the network (look here for instructions).
Some basic OS configuration tasks, like ZFS setup and basic puppet configuration, need to be done manually as described in the next chapter. Most other configuration tasks are then handled by the puppet system.
Solaris 10
Look at the preinstalled system's version
cat /etc/release
Solaris 10 11/06 s10x_u3wos_10 X86
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 14 November 2006
Use the showrev command:
showrev
Hostname: t3fs04
Hostid: 6c2b90b
Release: 5.10
Kernel architecture: i86pc
Application architecture: i386
Hardware provider:
Domain:
Kernel version: SunOS 5.10 Generic_127112-10
Create a root home directory
Solaris 10 regrettably uses "/" as root's home directory.
Edit the root entry in /etc/passwd so that it reads:
root:x:0:0:Super-User:/root:/sbin/sh
Create the directory and move some files to it
mkdir /root
chmod 700 /root
chown root:root /root
cd /
mv .Xauthority .bash_history .ssh /root/
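The manual edit of the root entry can also be scripted. The following is only a minimal sketch and not part of our standard procedure: the function name set_root_home is made up for illustration, and it just prints the modified file so that the result can be inspected before /etc/passwd is replaced. It assumes an awk that understands -v (on Solaris 10 that would be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
# Print a passwd file with root's home directory changed to a new path.
# Arguments: $1 = passwd file to read, $2 = new home directory (e.g. /root)
# Hypothetical helper: it never modifies the input file.
set_root_home() {
    awk -F: -v home="$2" '
        BEGIN { OFS = ":" }
        $1 == "root" { $6 = home }   # field 6 is the home directory
        { print }
    ' "$1"
}
```

For example, set_root_home /etc/passwd /root > /tmp/passwd.new writes a candidate file that can be compared with the original using diff before it is put in place.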
getting ssh access as root and passwordless root access from the admin node
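Default installations usually do not allow remote SSH access for root. You need to enable it first (PermitRootLogin yes option in /etc/ssh/sshd_config, then restart the ssh service).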
From the admin node, copy the SSH key to the authorized keys
ssh-copy-id -i /root/.ssh/id_dsa root@NEWMACHINE
Log out and in again to the new server to have the correct environment for the next steps
getting remote access to the console via the ILOM
On some X4500 systems this worked out of the box, on others there were problems. Frequently, the console works only up to the point where the grub boot prompt appears. Remote access to the console through the ILOM (start /SP/console) is very desirable, because one can see important messages, such as the ones printed when patches are applied after a system reboot.
How to get the console to work
- Comment out the splashimage line in the /boot/grub/menu.lst file.
- Edit the /boot/solaris/bootenv.rc file. This file contains the boot environment settings. Some of the values can also be seen by using the eeprom command as root. On all Solaris machines where the redirection works, this file contains the setting setprop console 'ttya'. A machine with failing redirection had this property set to 'text'.
Note: The issue is much more complicated than this. Various hints to the problem can be gained from an ELOM on X4150 discussion on this OpenSolaris weblink and from this nice wiki from uwaterloo.
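To compare the console setting across several machines, the relevant line can be pulled out of the boot environment file. This is a small sketch under the assumption that the file uses the setprop console 'value' form quoted above; the function name console_setting is made up for illustration.

```shell
# Print the value of the "console" property from a bootenv.rc-style file.
# Argument: $1 = path to the file (on a real system /boot/solaris/bootenv.rc)
# Assumes lines of the form: setprop console 'ttya'
console_setting() {
    awk '$1 == "setprop" && $2 == "console" { print $3 }' "$1" | tr -d "'"
}
```

On a machine with working redirection, console_setting /boot/solaris/bootenv.rc would be expected to print ttya, and text on a machine where the redirection fails.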
Solaris 10 Services
Solaris 10 Services to disable
for n in telnet ftp rlogin svc:/network/shell:kshell svc:/network/shell:default \
         finger sendmail smtp; do
    svcadm disable "$n"   # one service per svcadm call
done
Disable the font server through inetd:
/usr/sbin/inetadm -d svc:/application/x11/xfs:default
Check the open ports with netstat! There should be nothing left except for SSH:
netstat -a
TCP: IPv6
Local Address Remote Address Swind Send-Q Rwind Recv-Q State If
--------------------------------- --------------------------------- ----- ------ ----- ------ ----------- -----
*.* *.* 0 0 49152 0 IDLE
*.ssh *.* 0 0 49152 0 LISTEN
SCTP:
Local Address Remote Address Swind Send-Q Rwind Recv-Q StrsI/O State
------------------------------- ------------------------------- ------ ------ ------ ------ ------- -----------
0.0.0.0 0.0.0.0 0 0 102400 0 32/32 CLOSED
Active UNIX domain sockets
Address Type Vnode Conn Local Addr Remote Addr
ffffffffb0ad23a8 stream-ord 00000000 00000000
ffffffffb0c76c88 stream-ord 00000000 00000000
ffffffffaf41bc80 stream-ord ffffffffae9a89c0 00000000 /var/run/.inetd.uds
Mounting our minimal shared SW area (mounting a Linux NFS share on Solaris)
mkdir /mnt/master
mount -o vers=3 t3admin01:/master /mnt/master
Note: The -o vers=3 option is needed for Solaris to use the older NFS protocol version offered by the Linux server.
Getting Solaris packages
Basic Solaris Package management
This is a list of examples for the basic Solaris package management commands, i.e. using the native Solaris commands
pkgadd -d ./SUNWhd-1.01.pkg # install a package given by a file
pkginfo # lists all installed packages
pkginfo -l CSWpuppet # lists detailed package information for this package
pkgchk -l -p /usr/lib/values-Xa.o # list packages to which this local file belongs and check against package
pkgchk -l CSWpuppet | grep Pathname # lists files belonging to the named package
pkgrm CSWpuppet # remove a package. Dependencies will be checked
This follows instructions from http://www.opencsw.org/
- Get the pkg-get utility and transfer it to the new node
wget http://www.opencsw.org/pkg_get.pkg
wget ftp://ftp.ibiblio.org/pub/mirrors/opencsw/wget-i386.bin
- Install the package
pkgadd -d pkg_get.pkg
- Edit the pkg-get configuration /opt/csw/etc/pkg-get.conf to select our nearby mirror
url=http://mirror.switch.ch/ftp/mirror/opencsw/current
- If pkg-get is not able to download wget by itself, copy a static one from ftp://ftp.ibiblio.org/pub/mirrors/opencsw/wget-i386.bin to the node
scp wget-i386.bin newnode:/opt/csw/bin/wget
- Install the gnupg suite. This will take some time
/opt/csw/bin/pkg-get install gnupg textutils
- Import the opencsw key and set the trust on it
/opt/csw/bin/gpg --keyserver pgp.mit.edu --recv-keys E12E9D2F
/opt/csw/bin/gpg --edit-key E12E9D2F
trust # in the command line interface of gpg
# choose "trust ultimately"
From here on you can go to the puppet installation.
from Blastwave
Note: I follow here the instructions from the Blastwave main page:
pkgadd -d http://blastwave.network.com/csw/pkgutil_`/sbin/uname -p`.pkg
/opt/csw/bin/pkgutil --catalog
/opt/csw/bin/pkgutil --install gnupg textutils
Import the blastwave gpg key to check the packages that you download!
/opt/csw/bin/gpg --keyserver pgp.mit.edu --recv-keys A1999E90
/opt/csw/bin/gpg --list-keys
//.gnupg/pubring.gpg
--------------------
pub 1024D/A1999E90 2008-08-17 [expires: 2011-08-17]
uid Blastwave Software (Blastwave.org Inc.)
sub 2048g/E4845389 2008-08-17 [expires: 2011-08-17]
/opt/csw/bin/gpg --edit-key A1999E90
...
Command> trust
...
Your decision? 5 (trust ultimately)
Do you really want to set this key to ultimate trust? (y/N) y
...
Edit the pkgutil configuration file /etc/opt/csw/pkgutil.conf. We are using the default mirror with the unstable release (access to newer packages).
...
# To enable use of gpg or md5, uncomment these
# NOTE: it doesn't make sense to use md5 but not gpg so your options should be:
# 1. both disabled, 2. gpg enabled, 3. both enabled
# Default: false, false
use_gpg=true
use_md5=true
...
Fetch the catalog again. It will now be checked with gpg:
/opt/csw/bin/pkgutil --catalog
Now we need to enter the paths to these tools into the configuration:
- Add the path /opt/csw/bin in /etc/default/login
PATH=/opt/csw/bin:/usr/bin:
SUPATH=/sbin:/usr/sbin:/opt/csw/bin:/usr/bin
- Update the MANPATH in /etc/profile by adding these lines
MANPATH=/usr/share/man:/opt/csw/share/man
export MANPATH
Log out and in again to test whether you now have the pkgutil utility in the path and can read its man page.
How to install packages with pkgutil
Searching for a package using a substring (the -a flag shows available packages)
pkgutil -a sed
gsed CSWgsed 4.2.1,REV=2009.07.01 432.6 KB
As a minimum we need to fetch gsed and rsync
pkgutil -i gsed rsync
Also useful:
- CSWshutils - GNU coreutils. They will be installed with a "g" prefix to all commands
- top
Install Puppet
Note: It is necessary to pick a Puppet client that works with our puppet server version. Behavior sometimes changes erratically even between minor versions, and this can break the current functionality and manifests. As of this writing I recommend retrieving puppet 0.24.8 from the /master/Solaris area of t3admin01.
If you want to install facter and puppet from our local area, you first need to use pkg-get to satisfy the dependencies:
/opt/csw/bin/pkg-get -i ruby
# now install our standard puppet version
pkgadd -d /mnt/master/Solaris/facter-1.5.7\,REV\=2009.11.16-SunOS5.9-all-CSW.pkg
pkgadd -d /mnt/master/Solaris/puppet-0.24.8,REV=2009.03.23-SunOS5.10-all-CSW.pkg
Immediately disable the puppet service that is started up automatically by the package
svcadm disable puppetd
Put this into the /etc/puppet/puppet.conf file
[main]
environment = DerekDevelopment
[puppetd]
factsync = true
#pluginsync = true
server = psi-puppet1.psi.ch
#evaltrace = true
Put a correct config into an /etc/sysconfig/psi file
ZONE=Tier3
SET=Solaris
ROLE=dcachefs
Be sure that you have entered the new node into the main puppet node manifest for the Tier-3.
Run puppet (from version 0.25 puppetd resides in sbin)
/opt/csw/sbin/puppetd -v -t
Patching
- For finding information on patches: http://sunsolve.sun.com/patchfinder/
- Do this only from the system console, or you may not see important messages, especially if a system restart is necessary.
- Some patches are dangerous (see below) and can render the system unbootable (seldom, but it happened!).
System Registration
Copy the registration template /usr/lib/breg/data/RegistrationProfile.properties and edit it (you need to get yourself a SUN account at http://sunsolve.sun.com/). By convention, I install one in the /root/ folder, so you may instead use my existing one from the cluster:
scp -p /root/clusteradmin/profiles/solarisfs/common/root/RegistrationProfile.properties NEWNODE:/root/
Register the machine:
cd /root
sconadm register -a -r ./RegistrationProfile.properties
sconadm is running
Authenticating user ...
finish registration!
This can fail with a long Java backtrace if there is a problem with the hostname or domain name. The Java framework apparently wants to connect back to a local socket, and this fails.
More information may be gathered from a log file written at /tmp/basicreg${DATE}*.log
The patching policy can be seen and changed like this:
smpatch get
...
patchpro.install.types - rebootafter:reconfigafter:standard
...
smpatch set patchpro.install.types=rebootafter:reconfigafter:standard
patch information gathering
You can list the revision information of already applied patches using the showrev command:
showrev -p |head -3
Patch: 118344-14 Obsoletes: 122397-01 Requires: Incompatibles: Packages: SUNWcsu, SUNWcsl, SUNWckr, SUNWhea, SUNWarc, SUNWfmd
Patch: 118368-04 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu
This command analyzes which patches are available for this machine
smpatch analyze
...
138110-01 SunOS 5.10_x86: ata driver patch
...
One can also analyze which dependencies need to be installed for a specific patch or for multiple patches. Multiple -i options are supported, or one can pass a file with a list.
smpatch analyze -i 138110-01
You have new messages. To retrieve: smpatch messages [-a]
127128-11 SunOS 5.10_x86: kernel patch
138110-01 SunOS 5.10_x86: ata driver patch
# with a listfile
smpatch analyze -x idlist=patch-list-20081215.lst
125556-01 SunOS 5.10_x86: patch behavior patch
138270-02 SunOS 5.10_x86: devfs patch
127128-11 SunOS 5.10_x86: kernel patch
137140-06 SunOS 5.10_x86: aggr patch
119255-59 SunOS 5.10_x86: Install and Patch Utilities Patch
137122-03 SunOS 5.10_x86: e1000g driver patch
138110-01 SunOS 5.10_x86: ata driver patch
You can see the README of a specific patch by executing
smpatch download -t -i 139580-01
patch installation
- Patches are downloaded to /var/sadm/spool by default.
- Logs from install attempts and READMEs are available in /var/sadm/patch/PATCHNUMBER/ directories.
smpatch update # without additional argument applies all available patches
smpatch update -x idlist=patch-list-20081215.lst # updates specified by list
smpatch update -i patch_id1 -i patch_id2 # updates by single patch IDs
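A list file for the -x idlist= option can be derived from the output of smpatch analyze. The following is a sketch only: the function name patch_ids is made up, and it assumes that the list file simply holds one patch ID per line and that patch IDs always have the six-digit, two-digit form seen in the examples above. Informational lines such as "You have new messages" are dropped.

```shell
# Read "smpatch analyze" style output on stdin and print only the patch IDs.
# Assumption: an ID is the first word of a line and looks like 138110-01.
patch_ids() {
    awk '$1 ~ /^[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9]$/ { print $1 }'
}
```

A possible use is smpatch analyze | patch_ids > patch-list.lst, after which the file should be reviewed by hand (kernel patches in particular) before it is passed to smpatch update.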
The patches may require a system restart, or at least going to maintenance mode, to be installed. It is easiest to log in on the remote console and initiate an init 0 there, so one can still follow the progress, e.g.
init 0
root@t3fs03 # svc.startd: The system is coming down. Please wait.
svc.startd: 94 system services are now being stopped.
Dec 15 11:35:06 t3fs03 syslogd: going down on signal 15
Dec 15 11:35:07 rpc.metad: Terminated
Installing updates
Installing update 138270-02 Succeeded
Installing update 127128-11 Succeeded
........
Note: Careful with kernel patches. The patch 137138-09 from SUN rendered two systems unbootable (q.v. this Issue report).
Solaris Filesystem configuration (UFS, ZFS), disk partitioning, RAID
The hd and hdadm tool
The hd tool is very useful to get an overview of the device mappings to physical disks.
One can get it from the X4500 Solaris tools CD and install it by
pkgadd -d ./SUNWhd-1.01.pkg
hd usage:
hd # show mappings and ASCII-graphical slot display
hd -q # sequence in drive slot order. 1 & 2 are bootable disks
hd -l # list in phys order (1 spc separated line). 1 & 2 are bootable disks
hd -j # shows controller device files (PCI) and mapping to c1, c2, c3 ....
hd -w /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@2,0 # maps pci dev path to disk
hd -r and hd -R: read out SMART information
In a very interesting blog entry, Ben Rockwood points to the possibility of reading out the SMART data of all disks with hd -r (detailed) or hd -R (terse). In his experience the (1.) Raw read error rate and the (5.) Reallocated sector count are the best indicators for disks to be replaced. He also points to an interesting paper on disk failure statistics by Google.
The hdadm tool can be used to manage disks and bring them online/offline.
hdadm display # displays info on all drives
hdadm offline slot 11
hdadm offline disk c0t0
hdadm offline row3
hdadm offline col3
hdadm online all
Disk exchange
The cfgadm command can be used to manage the hardware configuration (e.g. the connected state of devices). This may be necessary when a disk is exchanged and ends up in the disconnected state. It may require something like this (from http://osdir.com/mlos.solaris.opensolaris.storage.general/2008-07/msg00122.html):
# cfgadm -c configure sata1/3
A lot of examples can be found on our FileserverDiskProblems page.
ZFS pool setup
Delete the original empty zpool:
zpool destroy zpool1
Run our "setup_zpool.sh" script, which identifies the boot disks and prints the commands needed to create our required zpool structure. Example:
setup_zpool.sh
# system contains 48 disks
# boot disks: c5t0 c5t4
# spare disks: c0t3 c0t7
# num of pool disks: 44
zpool create -f data1 raidz1 c4t0d0 c4t4d0 c7t0d0 c7t4d0 c6t0d0 c6t4d0 c1t0d0 c1t4d0 c0t0d0
zpool add -f data1 raidz c0t4d0 c5t1d0 c5t5d0 c4t1d0 c4t5d0 c7t1d0 c7t5d0 c6t1d0 c6t5d0
zpool add -f data1 raidz c1t1d0 c1t5d0 c0t1d0 c0t5d0 c5t2d0 c5t6d0 c4t2d0 c4t6d0 c7t2d0
zpool add -f data1 raidz c7t6d0 c6t2d0 c6t6d0 c1t2d0 c1t6d0 c0t2d0 c0t6d0 c5t3d0 c5t7d0
zpool add -f data1 raidz c4t3d0 c4t7d0 c7t3d0 c7t7d0 c6t3d0 c6t7d0 c1t3d0 c1t7d0
zpool add -f data1 spare c0t3d0 c0t7d0
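The grouping logic behind such output can be illustrated with a short sketch. This is not the actual setup_zpool.sh, whose source is not shown here: the function name zpool_commands and its arguments are hypothetical, boot and spare disks are assumed to have been filtered out already, and the commands are only printed, never executed. It needs a shell with $(( )) arithmetic (bash or ksh rather than the Solaris /sbin/sh).

```shell
# Print zpool commands that spread the given disks over raidz groups.
# Arguments: $1 = pool name, $2 = disks per raidz group, rest = disk names
# Hypothetical illustration; it only echoes the commands.
zpool_commands() {
    pool=$1; size=$2; shift 2
    verb="create"; group=""; count=0
    for disk in "$@"; do
        group="$group $disk"
        count=$((count + 1))
        if [ "$count" -eq "$size" ]; then
            echo "zpool $verb -f $pool raidz$group"
            verb="add"; group=""; count=0   # later groups are added to the pool
        fi
    done
    # A final, shorter group is emitted as well.
    if [ "$count" -gt 0 ]; then
        echo "zpool $verb -f $pool raidz$group"
    fi
}
```

Called with 44 pool disks and a group size of 9, this would print four full raidz groups and one group of 8, which is the layout shown in the example above.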
monitoring disk I/O
You can use the following commands to monitor I/O broken down to single disks:
zpool iostat -v
iostat -nxz 5
soon obsolete: Custom UFS partitioning and RAID setup for X4500
How to partition an X4500 and set up RAID mirroring can be found on the InstallationSolarisPartitioning page.
soon obsolete: UFS Boot partition information of preinstalled Solaris10 X4500
The boot image in the preinstalled X4500 Solaris 10 installations of this version is not on a ZFS file system (and I think that ZFS would not add much value here). This is the list of mounted partitions as seen in /etc/vfstab (and /etc/mnttab), among them the root partition, which sits on a ufs file system:
Contents of /etc/vfstab on a SUN Solaris 10 preinstalled X4500:
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
fd - /dev/fd fd - no -
/proc - /proc proc - no -
/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -
/devices - /devices devfs - no -
ctfs - /system/contract ctfs - no -
objfs - /system/object objfs - no -
/dev/md/dsk/d20 - - swap - no -
/dev/md/dsk/d30 /dev/md/rdsk/d30 /var ufs 1 no -
Here is a good document on the so-called soft partitions (a kind of logical volume): http://sysunconfig.net/unixtips/soft-partitions.html
To find out more about these so-called meta devices, one can use the metastat command:
metastat
d10: Mirror
Submirror 0: d11
State: Okay
Submirror 1: d12
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 22539195 blocks (10 GB)
d11: Submirror of d10
State: Okay
Size: 22539195 blocks (10 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c5t0d0s0 0 No Okay Yes
d12: Submirror of d10
State: Okay
Size: 22539195 blocks (10 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c5t4d0s0 0 No Okay Yes
Device Relocation Information:
Device Reloc Device ID
c5t0d0 Yes id1,sd@SATA_____HITACHI_HUA7250S______GTF402P6G3NM6F
c5t4d0 Yes id1,sd@SATA_____HITACHI_HUA7250S______GTF402P6G3KPYF
The information in the last few lines points to the physical disks on which the file systems reside. This can be correlated with the information given by the hd utility, and we can see that (naturally) these two devices are the boot disks (the same is found for the d30 meta device).
Changing the hostname
If the hostname has been configured wrongly, you may need to make changes to all of these files. This is regrettably much harder than on Linux.
- /etc/hosts
- /etc/nodename
- /etc/hostname.*
- /etc/inet/ipnodes
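To see which of these files still mention the old name, a small loop over the list is enough. This is only a sketch: the function name files_mentioning_host is made up for illustration, and the match is a plain whole-word grep, so a name that is part of a longer name separated by dots will also be reported.

```shell
# List the files that still mention a given hostname as a whole word.
# Arguments: $1 = hostname to look for, rest = files to check
# Hypothetical helper: missing files are skipped silently.
files_mentioning_host() {
    name=$1; shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        if grep -w "$name" "$f" >/dev/null 2>&1; then
            echo "$f"
        fi
    done
}
```

A call such as files_mentioning_host OLDNAME /etc/hosts /etc/nodename /etc/hostname.* /etc/inet/ipnodes prints the files that still need editing (OLDNAME is a placeholder for the wrongly configured name).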
Networking
Network trunking
Note: Trunking requires the switches to know about it. Otherwise only outgoing transfers will profit from it (because there the OS decides which interface to use).
If the switch is set to do static trunking, one can run into problems during the boot phase, because at that point the system is not yet able to support the trunking. So it can happen that the system cannot receive answers from DHCP, because the switch sends them in on a different interface than the one the request came from. This problem can be solved by using LACP (Link Aggregation Control Protocol), where the client tells the switch when to trunk the interfaces. The switch must be configured for this!
This service should be online:
svc:/network/physical:default
detach and erase the configuration for the single interface
dladm show-dev
ifconfig nge0 unplumb
rm -f /etc/hostname.nge0
rm -f /etc/dhcp.nge0
Create the aggregated network interface:
There is a bug in dladm on our nodes that persists even after many system updates. With the L2,L3,L4 policy the command crashes:
dladm create-aggr -P L2,L3,L4 --lacp-mode=active -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1
Segmentation Fault (core dumped)
The trunking works if the hash function policy only uses L3,L4 or L2,L4. The --lacp-mode=active flag turns on LACP mode (see above) for the trunking.
dladm create-aggr -P L3,L4 --lacp-mode=active -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1
NOTE: One can also change the properties of an existing aggregated link
dladm modify-aggr --lacp-mode=active 1
One can list the details on the trunked interfaces
dladm show-aggr -L
key: 1 (0x0001) policy: L3,L4 address: 0:14:4f:a6:df:e4 (auto)
LACP mode: active LACP timer: short
device activity timeout aggregatable sync coll dist defaulted expired
e1000g0 active short yes yes yes yes no no
e1000g1 active short yes yes yes yes no no
e1000g2 active short yes yes yes yes no no
e1000g3 active short yes yes yes yes no no
To select a hardcoded IP/hostname configuration, make sure that /etc/inet/hosts (DANGER: /etc/hosts is just a symbolic link to that!) contains the local node's IP and hostname
echo "[my hostname]" > /etc/hostname.aggr1
echo 192.33.123.1 > /etc/defaultrouter
Also put the correct information into /etc/netmasks:
192.33.123.0 255.255.255.0
Activate the interface and restart the service to obtain the correct network settings.
ifconfig aggr1 plumb
svcadm restart svc:/network/physical:default
Our aggr.sh script will take care of correctly trunking on both Solaris 10 and OpenSolaris systems.
TCP tuning
Explanation of the Solaris TCP parameters (older, from 2002): http://developers.sun.com/solaris/articles/tuning_for_streaming.html
Also look at the reference manual: http://docs.sun.com/app/docs/doc/806-4015/6jd4gh8fn?a=view
From the TCP Tuning Guide: For Solaris, create a boot script similar to this (e.g. /etc/rc2.d/S99ndd)
#!/bin/sh
# increase max tcp window
# The maximum buffer size in bytes. It controls how large the send and receive buffers
# can be set to by an application using setsockopt(3SOCKET).
ndd -set /dev/tcp tcp_max_buf 4194304
# The maximum value of TCP congestion window (cwnd) in bytes.
ndd -set /dev/tcp tcp_cwnd_max 2097152
# The default (???) send/receive window size in bytes (i.e. what you get if the application does not ask for more)
ndd -set /dev/tcp tcp_xmit_hiwat 65536
ndd -set /dev/tcp tcp_recv_hiwat 65536
Note: The 64 kB hiwat values are too small for WAN transfers over gigabit links.
Here are the default values on one of our Solaris 10 Thumpers:
root@t3fs01 $ ndd -get /dev/tcp tcp_max_buf
1048576
root@t3fs01 $ ndd -get /dev/tcp tcp_cwnd_max
1048576
root@t3fs01 $ ndd -get /dev/tcp tcp_xmit_hiwat
49152
root@t3fs01 $ ndd -get /dev/tcp tcp_recv_hiwat
49152
This effectively limits the TCP window to 48 kB!
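Why such a window is too small can be seen from simple arithmetic: a single TCP stream cannot send more than one window per round trip, and the window needed to fill a link is the bandwidth-delay product. The sketch below uses made-up function names, and the 20 ms round-trip time in the usage note is only an assumed example value, not a measurement from our network. It needs a shell with $(( )) arithmetic (bash or ksh).

```shell
# Upper bound on single-stream TCP throughput in bit/s for a given window.
# Arguments: $1 = window size in bytes, $2 = round-trip time in milliseconds
max_throughput_bps() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

# Window in bytes needed to fill a link (bandwidth-delay product).
# Arguments: $1 = link speed in Mbit/s, $2 = round-trip time in milliseconds
# 1 Mbit/s is 125000 bytes/s and 1 ms is 1/1000 s, hence the factor 125.
needed_window_bytes() {
    echo $(( $1 * 125 * $2 ))
}
```

With the default window of 49152 bytes and an assumed RTT of 20 ms, max_throughput_bps 49152 20 gives 19660800 bit/s, i.e. roughly 20 Mbit/s per stream. Filling a gigabit link at the same RTT would need needed_window_bytes 1000 20 = 2500000 bytes, which fits under the tcp_max_buf of 4194304 set in the boot script above.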
How to fix the recurring t3fs06 links down issue
Sometimes, under an unexpected NFS load on t3fs06, its links go down, and that blocks all activities at the T3. If that happens, Nagios will report it, and the link status on t3fs06 will look like this:
dladm show-dev
e1000g0 link: unknown speed: 1000 Mbps duplex: full
e1000g1 link: unknown speed: 1000 Mbps duplex: full
e1000g2 link: unknown speed: 1000 Mbps duplex: full
e1000g3 link: unknown speed: 1000 Mbps duplex: full
To recover try:
# ifconfig aggr1 unplumb
# ifconfig aggr1 plumb
# svcadm restart svc:/network/physical:default
Check that the links are now up and attached:
bash-4.2# dladm show-aggr
key: 1 (0x0001) policy: L3,L4 address: 0:14:4f:a6:e1:8c (auto)
device address speed duplex link state
e1000g0 0:14:4f:a6:e1:8c 1000 Mbps full up attached
e1000g1 0:14:4f:a6:e1:8d 1000 Mbps full up attached
e1000g2 0:14:4f:a6:e1:8e 1000 Mbps full up attached
e1000g3 0:14:4f:a6:e1:8f 1000 Mbps full up attached
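The check for the failure state can be automated, e.g. for a monitoring probe. This is a minimal sketch with a made-up function name, links_not_up; it assumes that dladm show-dev prints one device per line in the "NAME link: STATE speed: ..." form shown in the failure example above.

```shell
# Print the names of devices whose link state is not "up".
# Reads "dladm show-dev" style output on stdin.
# Assumes lines like: e1000g0 link: unknown speed: 1000 Mbps duplex: full
links_not_up() {
    awk '{
        for (i = 2; i < NF; i++)
            if ($i == "link:" && $(i + 1) != "up")
                print $1
    }'
}
```

Running dladm show-dev | links_not_up should print nothing on a healthy node and the affected e1000g devices when the problem has occurred.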
Syslog
Solaris 10 machines come with a syslog setup where the /etc/syslog.conf file is processed by m4 before it is fed to syslogd.
If a host named loghost is defined in /etc/hosts, and this host resolves to one of the IP addresses of the local machine, the LOGHOST variable gets defined, and the various conditionals resolve to logging to local files.
Cleanly shutting down
Shutdown:
- shutdown -i5 -g0: This brings the machine into run level 5, a state in which it can be turned off, and then turns it off.
Rebooting:
- init 6: Use of `init 6` gives the cleanest and most orderly reboot (init informs svc.startd of the run level change, and the system moves to the appropriate milestone).
- shutdown -y -g0 -i6 "message" invokes init as well, and additionally gives you a grace period and messages to the users.
NOTE: halt, reboot and poweroff will not run any of the shutdown scripts and should be a last resort.
Local Firewall settings
Look at the Tier-2 documentation.
At PSI we have a firewall in front of the Tier-3.
Performance monitoring
Dtrace
Useful dtracetoolkit commands:
- iotop - display top disk I/O events by process (list for every disk)
- rwsnoop: measuring reads and writes at the application level
- dtruss -p pid: truss equivalent
- bitesize.d: histogram of sizes of disk events per process as sent by the block I/O driver
- seeksize.d: histograms disk event seeks by process
- iopattern: differentiates sequential from random i/o
- iosnoop: lists processes doing IO (limited utility on ZFS because no filenames are given)
- opensnoop: snoops file open calls
SAR
You must install the SUNWaccu and SUNWaccr (service) packages.
Then, activate the service
svcadm enable svc:/system/sar:default
Data collection is done by a cron job running as the sys user. Uncomment these lines if necessary:
cat /var/spool/cron/crontabs/sys
#ident "@(#)sys 1.5 92/07/14 SMI" /* SVr4.0 1.2 */
#
# The sys crontab should be used to do performance collection. See cron
# and performance manual pages for details on startup.
#
0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
Misc
Links with documentation and Howtos
OpenSolaris 2008.05 OBSOLETE
Basic OS installation
OpenSolaris was installed by attaching a USB DVD drive with the OpenSolaris distribution to the machine and booting from it.
SUN ILOM (works on X4500):
cd SP/console ; start
or
start SP/console
On x4500:
cat /etc/release
OpenSolaris 2008.05 snv_86_rc3 X86
Copyright 2008 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 26 April 2008
Where did I get this from?
SunOS Release 5.10 Version Generic_127112-10 64-bit
Marvell Serial ATA Adapter - Driver Version 3.6.1.0-5
On
SunOS unknown 5.10 Generic_127112-10 i86pc i386 i86pc
Booting into single user mode
In the grub screen, edit the kernel options and add a "-s" flag. Then boot with "b".
getting passwordless root access from the admin node
- First edit /etc/ssh/sshd_config to allow root access (PermitRootLogin yes option) and restart the ssh service.
- OpenSolaris may require turning off the pam_roles module, because in /etc/user_attr root is defined as a role. This is different from what we find on a standard Solaris 10 installation.
Definitions for the /etc/pam.conf file:
# PSI modification to allow interactive root user remote login
sshd-pubkey auth requisite pam_authtok_get.so.1
sshd-pubkey auth required pam_dhkeys.so.1
sshd-pubkey auth required pam_unix_cred.so.1
sshd-pubkey auth required pam_unix_auth.so.1
# sshd-pubkey account requisite pam_roles.so.1 allow_remote
# allow_remote is not enough because auser information is missing
sshd-pubkey account required pam_unix_account.so.1
Note: one may also want to define this for the sshd-kbdint module (keyboard password based login).
One could also drop the definition of root as a role.
Package management
installing a standard SUN package from file using the default package manager
pkgadd -d ./SUNWhd-1.01.pkg
pkginfo # lists all installed packages
pkginfo -l CSWpuppet # lists detailed package information for this package
pkgchk -l -p /usr/lib/values-Xa.o # list packages to which this local file belongs and check against package
pkgchk -l CSWpuppet | grep Pathname # lists files belonging to the named package
Obtaining free additional packages from Blastwave
NOTE: BLASTWAVE SEEMS NOT VERY ACTIVE ANY MORE: MOST DEVELOPERS HAVE GONE TO http://www.opencsw.org/!!!!!
- Install the pkg-get tool from Blastwave (see the Blastwave web page)
pkgadd -d http://www.blastwave.org/pkg_get.pkg
If the weblink fails, you may have to download the file by hand according to the instructions on the Blastwave page.
- You may need to modify the repository url in /opt/csw/etc/pkg-get.conf to point to a valid mirror. We use url=http://blastwave.network.com/csw/stable
- Add the path /opt/csw/bin in /etc/default/login
PATH=/opt/csw/bin:/usr/bin:
SUPATH=/sbin:/usr/sbin:/opt/csw/bin:/usr/bin
- Update the MANPATH in /etc/profile
MANPATH=/usr/share/man:/opt/csw/share/man
export MANPATH
pkg-get example usage:
pkg-get -D regexp # searches for a package
pkg-get install top # installs package "top"
pkginfo CSWtop # shows info about a package
Special pkg messages are saved in logfiles in /var/sadm/install/logs/
Packages to install
pkg-get -i gsed
the new package manager 'pkg' (not used by us)
A new package manager, pkg, is in development, and it seems that it is already used for some packages. It is orthogonal to the other package management, which is a little disconcerting. We do not use it.
Still, here is some example usage:
# pfexec is like sudo and is often used for issuing the following commands as another user
pkg refresh
pkg list [fmri-pattern] # list packages (very slow)
pkg info [fmri-pattern] # list package information (long format)
pkg authority # displays repository info
pkg search gnu # search for packages with this substring
pkg contents SUNWgnu-coreutils
pkg install SUNWvim
Packages get downloaded from OpenSolaris servers via http (look at http://pkg.opensolaris.org/status).
ZFS file system and disk management
Very nice links
How to determine what ZFS features are available on a system:
zfs upgrade -v
The following filesystem versions are supported:
VER DESCRIPTION
--- --------------------------------------------------------
1 Initial ZFS filesystem version
2 Enhanced directory entries
3 Case insensitive and File system unique identifer (FUID)
4 userquota, groupquota properties
For more information on a particular version, including supported releases, see:
http://www.opensolaris.org/os/community/zfs/version/zpl/N
Where 'N' is the version number.
On OpenSolaris the root partition already sits on a zpool:
mount
/ on rpool/ROOT/opensolaris read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90002
zfs list
rpool 2.92G 450G 55K /rpool
rpool@install 16K - 55K -
rpool/ROOT 2.92G 450G 18K /rpool/ROOT
rpool/ROOT@install 15K - 18K -
rpool/ROOT/opensolaris 2.92G 450G 2.33G legacy
rpool/ROOT/opensolaris@install 196M - 2.22G -
rpool/ROOT/opensolaris/opt 412M 450G 411M /opt
rpool/ROOT/opensolaris/opt@install 118K - 3.60M -
rpool/export 540K 450G 19K /export
rpool/export@install 15K - 19K -
rpool/export/home 506K 450G 488K /export/home
rpool/export/home@install 18K - 21K -
Locating the boot disk
cfgadm|grep sata3/0 # x4500: for determining boot disk
All device files are located under /dev/dsk/
The hd tool is very useful to get an overview of the device mappings to physical disks.
One can get it from the Solaris tools CD and install it by using
pkgadd -d ./SUNWhd-1.01.pkg
hd usage:
hd # show mappings and ASCII-graphical slot display
hd -q # sequence in drive slot order. 1 & 2 are bootable disks
hd -l # list in phys order (1 spc separated line). 1 & 2 are bootable disks
hd -j # shows controller device files (PCI) and mapping to c1, c2, c3 ....
hd -w /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@2,0 # maps pci dev path to disk
ZFS command examples
zfs list # shows mountpoints
zpool status -v [poolname] # status + device. This also will show corrupted files, e.g. after a resilver
zpool status -x # health of all pools
zpool online tank c1t0d0 # bring a pool device online
zpool iostat
zpool iostat rpool 2
zpool iostat -v # device specific information
zpool import # lists exported pools ready to be imported
zpool import [-f] zpool1
zpool upgrade zpool1
zpool destroy zpool1
zpool create tank mirror c1d0 c2d0 mirror c3d0 c4d0
zpool create tank raidz c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0
zpool create pool mirror c0d0 c1d0 spare c2d0 c3d0 # spare
Networking
The solarisfs profile, which can be synced from the admin node, contains a script for setting up an aggregated network device.
Since it will detach the current network connections, it should be run either in batch mode (e.g. using the at service with at now) or inside a console session.
bash aggr.sh -f
For DHCP to work correctly, one needs two files:
- /etc/hostname.[interface] makes the interface persistent and may contain the hostname
- /etc/dhcp.[interface] is needed to get the DHCP parameters to be used for the local configuration
A little command overview:
ifconfig -a
dladm show-dev e1000g0
dladm show-link [-s] [linkname]
# disable nwam and enable default
svcadm disable svc:/network/physical:nwam
svcadm enable svc:/network/physical:default
# must have a /etc/hostname.aggr1 to make
# aggregate persistent
ifconfig e1000g1 unplumb
dladm create-aggr -l e1000g0 ... aggr1
dladm show-aggr [-L] [-x] [-s]
dladm delete-aggr aggr1
ifconfig aggr1 plumb
# aggr seems to get MAC of first link
ifconfig aggr1 dhcp start
netstat -D # shows state of dhcp per interface
# dhcp config in /etc/default/dhcpagent
dhcpinfo -i e1000g0 12
dhcpinfo -i e1000g0 Hostname
How to control *inetd* related services:
inetadm # lists all services and their status
inetadm -l svc:/network/rpc/smserver:default # shows config details for that service
Network routing table
CAREFUL: The file
/etc/defaultrouter
can be used to hardcode a default route (just put the IP of the gateway in). If this file contains an entry, that route is added on top of the settings obtained by DHCP, and
you may end up with two default routes. This can lead to unpredictable results if the hardcoded IP points to the wrong place.
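A quick way to spot this situation is to count the default entries in the routing table. A minimal sketch, assuming that `netstat -rn` lists such routes with the word "default" in the first column; `count_default_routes` is a hypothetical helper name.

```shell
# Hypothetical helper: counts "default" entries in a routing table listing
# read from stdin.
count_default_routes() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}

# Intended use (not run here), restricted to the IPv4 table:
#   netstat -rn -f inet | count_default_routes
```

Any result larger than 1 is worth a look at `/etc/defaultrouter`.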
Some commands to change routes:
route add 192.33.123.0/24 192.33.123.44 -interface # adding a route with gw being
# machine's own interface
route add default 192.33.123.1
route delete default 192.33.123.41 # deleting a gateway default route
DNS
Do not forget to modify the
/etc/nsswitch.conf
file to use DNS information. Here you can define whether only file based information (
/etc/hosts
) or also the resolver is used.
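Whether the resolver is actually consulted can be checked by looking at the `hosts:` line. A sketch that has only been tried against plain text files, not on a Solaris host; `hosts_uses_dns` is a made-up name.

```shell
# Hypothetical helper: succeeds if the "hosts:" line of the given
# nsswitch.conf-style file lists "dns" as one of its sources.
hosts_uses_dns() {
    awk '$1 == "hosts:" { for (i = 2; i <= NF; i++) if ($i == "dns") found = 1 }
         END { exit !found }' "$1"
}

# Intended use (not run here):
#   hosts_uses_dns /etc/nsswitch.conf || echo "hosts: does not use dns"
```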
How to enable jumbo frames
Refer to the Solaris driver's
e1000g man page.
This is not yet tested!
You can look at the present frame size with
ndd -get /dev/e1000g0 max_frame_size
1514
ifconfig e1000g0 mtu 16128 # enable jumbo frames
It seems that one can also add the following line to the /etc/hostname.aggr1 file
mtu 16128
It might be that one has to change by hand the
MaxFrameSize
line in the driver's configuration file
/kernel/drv/e1000g.conf
, which by default contains the following.
MaxFrameSize=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
# 0 is for normal ethernet frames.
# 1 is for upto 4k size frames.
# 2 is for upto 8k size frames.
# 3 is for upto 16k size frames.
# These are maximum frame limits, not the actual ethernet frame
# size. Your actual ethernet frame size would be determined by
# protocol stack configuration (please refer to ndd command man pages)
# For Jumbo Frame Support (9k ethernet packet)
# use 3 (upto 16k size frames)
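To see at a glance what is currently configured, the `MaxFrameSize` line can be decoded with the meanings given in the comments above. This is an untested sketch in the same spirit as the rest of this section: `show_max_frame_size` is a hypothetical helper, and how the entries map to the individual driver instances is not covered here.

```shell
# Hypothetical helper: prints each entry of the MaxFrameSize line together
# with the meaning given in the comments of the configuration file.
# $1: path to the driver configuration file
show_max_frame_size() {
    sed -n 's/^MaxFrameSize=\(.*\);.*$/\1/p' "$1" | tr ',' '\n' |
    awk '{
        if ($1 == 0) d = "normal ethernet frames"
        else if ($1 == 1) d = "up to 4k size frames"
        else if ($1 == 2) d = "up to 8k size frames"
        else if ($1 == 3) d = "up to 16k size frames"
        else d = "unknown value"
        printf "entry %d: %s (%s)\n", NR - 1, $1, d
    }'
}

# Intended use (not run here):
#   show_max_frame_size /kernel/drv/e1000g.conf
```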
Note: On a freshly installed
Solaris 10 10/08 s10x_u6wos_07b X86
system the ndd command did not accept the max_frame_size parameter. It still needs to be checked why this parameter is missing.
ndd -get /dev/e1000g0 max_frame_size
operation failed: Invalid argument
Services
Service management is done with the
svcadm
,
svcs
and
svcprop
commands
You can get a list of service identifiers and descriptions by doing
svcs -o FMRI,DESC
Example Usage:
svcs # lists all services
svcs -xv # diagnose service startup problems
svcs -l gdm # lists details of service specified by pattern or FMRI
svcs -d gdm # lists all other services on which this service depends
svcs -D fc-cache # lists all services depending on the given service
svcs -p gdm # list all pids associated with the named service
svcadm enable gdm # enable a service
svcadm enable -t gdm # temporarily enable a service
svcadm disable gdm # disable a service
svcadm disable -t gdm # temporarily disable a service
svcadm restart gdm
svcadm clear gdm # if service is in maintenance state, signal that we have repaired and can start
svcprop gdm # lists service configuration properties for services framework
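To find candidates for `svcadm clear`, the service listing can be filtered for the maintenance state. A minimal sketch, assuming the usual three-column `svcs` output (STATE, STIME, FMRI); `services_in_maintenance` is a made-up name.

```shell
# Hypothetical helper: prints the FMRI of every service whose state is
# "maintenance", given `svcs -a` output (STATE STIME FMRI) on stdin.
services_in_maintenance() {
    awk '$1 == "maintenance" { print $3 }'
}

# Intended use (not run here):
#   svcs -a | services_in_maintenance
```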
Services to disable
Disable these services
for n in finger sendmail ppd-cache-update ogl-select gdm; do
svcadm disable "$n"
done
Patches and updates
OpenSolaris has no real patch service. There is a GUI application that can be used to get an overview of packages that can be updated:
packagemanager
.
One may also have a look at the
pkg image-update [-nvq]
command.
Compiling with gcc
Note: Careful! The many updates needed to get gcc working on this installation rendered the system unbootable. This resulted from an update to ZFS, after which the bootloader could no longer read the old ZFS root partition.
Make sure that package CSWgcc3 is installed
Set up the environment
export PATH=$PATH:/opt/csw/gcc3/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/csw/gcc3/lib
I needed to install the SUNWarc package to get gcc working for OpenSolaris. This
requires using the new package manager. It is not very nice that we need to rely on two different package management systems.
(tip from
http://wiki.csclub.uwaterloo.ca/OpenSolaris)
pkg install SUNWarc SUNWsfwhea SUNWhea SUNWtoo
This took quite some time, since a number of packages needed to be fetched from the opensolaris web site. I wonder whether we can
mirror such packages locally.
The error had been:
gcc test.c
ld: fatal: file values-Xa.o: open failed: No such file or directory
ld: fatal: File processing errors. No output written to a.out
collect2: ld returned 1 exit status
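Before and after installing the packages, it can help to check whether the object file named in the linker error is actually present. A minimal sketch: the default directory `/usr/lib` is an assumption on my part (adjust it if the linker searches elsewhere), and `have_values_xa` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if values-Xa.o exists in the given
# directory. The default of /usr/lib is an assumption.
have_values_xa() {
    [ -f "${1:-/usr/lib}/values-Xa.o" ]
}

# Intended use (not run here):
#   have_values_xa || echo "values-Xa.o is missing"
```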
OpenSolaris boot problem after update
After some update (I do not know whether it was an automatic update or one I had made some days earlier), the system was no longer able to boot. The root file system had been installed on ZFS by the OS installation process, and there was no obvious place where one could have selected a different file system for the boot partition. One of the updates seemingly introduced an incompatibility between the bootloader and that version of ZFS. This is the related
OpenSolaris Bug 6534519. I was not able to rescue the system as described on that page (but I did not manage to follow the instructions completely).
I reinstalled OpenSolaris. A further drawback is that I cannot get the ILOM serial console redirection to work with OpenSolaris. It works for the BIOS startup, but fails before grub comes up. After the reinstallation, I did a reboot, and the machine (using the graphical ILOM redirection) always came back with a black screen and a "0037" BIOS message in the lower right corner. I found a
Forum thread on a similar problem. And indeed, powering the machine off for some time and rebooting brought back the OpenSolaris login prompt.
--
DerekFeichtinger - 05 Aug 2008