SGE 6.2u5 plus ARCO MySQL on SL6 64bit powered by ZFS
Revision 4, 2011-03-06 17:26:37
Sun Grid Engine project home page:
http://gridengine.sunsource.net/
This document describes the experience gained while upgrading our SGE installation from 6.1 to 6.2u5, the last free version of this batch system. Besides the SGE upgrade itself, which introduced several new features in the batch system, we also migrated the O.S., changed the way accounting is managed by introducing a DB, and installed the ZFS driver to use this advanced filesystem in our Linux context.
HW installation
For our installation we detached t3ui07 from the cluster and converted it into t3ce02, our new SGE master. Because this machine is critical, we configured a HW RAID1 in the LSI BIOS at boot time. The final layout is a 140GB LSI Virtual Volume that we partitioned during the SL6 installation, as shown by the output of these commands:
[root@t3ce02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 9.7G 2.3G 6.9G 25% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sda1 485M 34M 426M 8% /boot
[root@t3ce02 ~]# mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
Because there are 4 Gigabit NICs in the server, it is worth connecting as many of them as possible to the switch and later configuring Linux bonding in mode 6 (balance-alb) to improve the server bandwidth and availability. For the time being we skipped this step.
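A minimal sketch of what the skipped bonding step might look like on SL6, assuming the usual ifcfg network-scripts layout. The interface names eth0..eth3, the address 192.0.2.10, the netmask and the miimon value are placeholders, not values from our setup. The function only writes files into the directory it is given, so it can be tried in a scratch directory first.

```shell
# Sketch: generate ifcfg files for a mode 6 (balance-alb) bond.
# ASSUMPTIONS: NIC names, IP address, netmask and miimon are placeholders.
write_bond_config() {
    dir=$1; shift            # target dir, e.g. /etc/sysconfig/network-scripts
    cat > "$dir/ifcfg-bond0" <<EOF
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.0.2.10
NETMASK=255.255.255.0
BONDING_OPTS="mode=6 miimon=100"
EOF
    for nic in "$@"; do      # one slave file per physical NIC
        cat > "$dir/ifcfg-$nic" <<EOF
DEVICE=$nic
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
EOF
    done
}

# Demo: write the files into a scratch directory and list them.
scratch=$(mktemp -d)
write_bond_config "$scratch" eth0 eth1 eth2 eth3
ls "$scratch"
```

Review the generated files before copying them into place; restarting the network on a remote machine with a wrong bond definition can cut the connection.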
SL6 64bit Installation
So far we have just one server of this kind, and this will probably not change in the future, so we simply pointed the Virtual CD of t3ce02 to an SL6 DVD iso file that we saved in t3admin01:/home/ and made a "Basic Server" installation. That is enough to get utilities like SSH and yum installed, so the other RPMs can be selected at run time. The "Basic Server" installation turns SELinux ON by default; to disable it, edit this file and reboot the system afterwards:
[root@t3ce02 ~]# grep -v \# /etc/sysconfig/selinux
SELINUX=disabled
SELINUXTYPE=targeted
[root@t3ce02 ~]#
Also turn OFF the automatic yum updates run by cron by editing this file:
/etc/sysconfig/yum-autoupdate
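The two edits above can also be scripted. This is only a sketch: the helper name is ours, and the variable name ENABLED for the yum-autoupdate file is an assumption that should be checked against the file on your system. The helper rewrites the file contents in place instead of replacing the file, so a symbolic link such as /etc/sysconfig/selinux keeps pointing to its target.

```shell
# Sketch: set KEY=VALUE in a sysconfig-style file, rewriting its contents
# in place so that a symlink to the real file is preserved.
set_sysconfig_value() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Intended use on the real host (as root):
#   set_sysconfig_value /etc/sysconfig/selinux SELINUX disabled
#   set_sysconfig_value /etc/sysconfig/yum-autoupdate ENABLED '"false"'  # ASSUMPTION: variable name
# Demo on a scratch file:
demo=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
set_sysconfig_value "$demo" SELINUX disabled
cat "$demo"
```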
Install these i686 RPMs; they are needed later by the Sun Web Console and also by the LSI RAID utility mpt-status:
[root@t3ce02 ~]# yum install glibc.i686
...
Dependencies Resolved
================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================
Installing:
glibc i686 2.12-1.7.el6_0.3 sl-security 4.3 M
Installing for dependencies:
nss-softokn-freebl i686 3.12.8-1.el6_0 sl-security 108 k
Updating for dependencies:
glibc x86_64 2.12-1.7.el6_0.3 sl-security 3.7 M
glibc-common x86_64 2.12-1.7.el6_0.3 sl-security 14 M
nss-softokn-freebl x86_64 3.12.8-1.el6_0 sl-security 114 k
Transaction Summary
================================================================================================================================
Install 2 Package(s)
Upgrade 3 Package(s)
Total size: 22 M
Total download size: 4.4 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): glibc-2.12-1.7.el6_0.3.i686.rpm | 4.3 MB 00:09
(2/2): nss-softokn-freebl-3.12.8-1.el6_0.i686.rpm | 108 kB 00:00
--------------------------------------------------------------------------------------------------------------------------------
...
Complete!
[root@t3ce02 ~]#
now you can install the LSI RAID checker "mpt-status":
[root@t3ce02 ~]# rpm -Uv http://www.drugphish.ch/~ratz/mpt-status/RPMS/1.2.0_RC7/mpt-status-1.2.0_RC7-3.i386.rpm
Retrieving http://www.drugphish.ch/~ratz/mpt-status/RPMS/1.2.0_RC7/mpt-status-1.2.0_RC7-3.i386.rpm
Preparing packages for installation...
mpt-status-1.2.0_RC7-3
[root@t3ce02 ~]#
load the driver and verify the RAID1 status:
[root@t3ce02 ~]# modprobe mptctl
[root@t3ce02 ~]# mpt-status
ioc0 vol_id 0 type IM, 2 phy, 135 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 2 SEAGATE ST914602SSUN146G 0603, 136 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 1 SEAGATE ST914602SSUN146G 0603, 136 GB, state ONLINE, flags NONE
[root@t3ce02 ~]#
Curiously I couldn't find /etc/modprobe.conf, so to load the driver at boot I just ran:
[root@t3ce02 etc]# echo modprobe mptctl >> /etc/rc.local
If you still have to reboot, now is the time to do it.
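Since the RAID1 protects a critical machine, the mpt-status output can be checked periodically, for instance from cron. The following is a sketch under the assumption that a healthy volume line always contains "state OPTIMAL", as in the output shown in this section; the function name is ours.

```shell
# Sketch: succeed only if at least one volume line is present and every
# volume line reports "state OPTIMAL". Reads mpt-status output on stdin.
raid_is_optimal() {
    awk '/vol_id/ { seen = 1; if ($0 !~ /state OPTIMAL/) bad = 1 }
         END { exit (seen && !bad) ? 0 : 1 }'
}

# On the real host:
#   mpt-status | raid_is_optimal || echo "RAID problem on $(hostname)"
# Demo with the output captured in this section:
sample='ioc0 vol_id 0 type IM, 2 phy, 135 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 2 SEAGATE ST914602SSUN146G 0603, 136 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 1 SEAGATE ST914602SSUN146G 0603, 136 GB, state ONLINE, flags NONE'
if printf '%s\n' "$sample" | raid_is_optimal; then
    echo "RAID1 volume is healthy"
fi
```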
ZFS on SL6 64bit.
A new O.S. release always brings some news; for the SL6 kernel one of them is the interesting opportunity to run ZFS filesystems. So far we have used the ZFS driver version zfs-linux-20110214.tar.bz2. The ZFS build can produce RPMs, which are always appreciated by Sys Admins, so be sure to have the rpm-build RPM installed in your O.S. before trying to compile ZFS.
Once you have downloaded zfs-linux-20110214.tar.bz2, create the directory /opt/zfs-build to build the ZFS RPMs, copy the file there, unpack it with tar -xjvf zfs-linux-20110214.tar.bz2, and then follow these macro steps:
[root@t3ce02 zfs-build]# ll
total 19680
drwxr-xr-x 9 root root 4096 Mar 3 14:53 lzfs
drwxr-xr-x 4 root root 4096 Mar 3 14:28 misc-scripts
drwxr-xr-x 11 root root 4096 Mar 3 14:35 spl
drwxr-xr-x 14 root root 4096 Mar 3 14:32 zfs
-rw-r--r-- 1 root root 20132179 Feb 14 15:28 zfs-linux-20110214.tar.bz2
cd /opt/zfs-build/lzfs
./configure && make rpm
cd /opt/zfs-build/spl
./configure && make rpm
cd /opt/zfs-build/zfs
./configure && make rpm
yum install /opt/zfs-build/spl/*.rpm
yum install /opt/zfs-build/zfs/*.rpm
yum install /opt/zfs-build/lzfs/*.rpm
Here you can see the RPMs installed so far with the O.S. plus the ZFS RPMs just produced and installed: t3ce02.RPMs.list.after.ZFS.installation.txt.
Here is the md5sums list of the ZFS RPMs produced; all the RPMs are available at the bottom of this Wiki page:
[root@t3ce02 zfs-build]# find . | grep \\.rpm | xargs -iI md5sum I
e6b0b62d710689586ee9cbbe8f6defdd ./spl/spl-0.5.2-1.x86_64.rpm
a36c6797ba234f3935ea351c07002c61 ./spl/spl-modules-0.5.2-1_2.6.32_71.18.1.el6.x86_64.rpm
f462f15ab6c5a38db10290b38fcede8c ./spl/spl-modules-devel-0.5.2-1_2.6.32_71.18.1.el6.x86_64.rpm
9397f335a0d33196a652e37b3a52b6ba ./spl/spl-modules-0.5.2-1.src.rpm
e05f6da1226dd47b171b9764a15f488b ./spl/spl-0.5.2-1.src.rpm
afe350394b3e9edd833dd15f1506e675 ./lzfs/lzfs-1.0-1.src.rpm
d3f9f6b6f0344bf95620c01b5fad3b2e ./lzfs/lzfs-1.0-1_2.6.32_71.18.1.el6.x86_64.rpm
fecce1786206c71c20701d7872a5ca87 ./zfs/zfs-modules-0.5.1-1.src.rpm
e646e0ea853f8ce8c4166fa388dd1ecd ./zfs/zfs-test-0.5.1-1.x86_64.rpm
1c7f4d7b34e4a8b92981b6b2bce875e4 ./zfs/zfs-0.5.1-1.x86_64.rpm
547d3680339b99ac99217f2f43e2b544 ./zfs/zfs-devel-0.5.1-1.x86_64.rpm
01368ff2a044612573481b3cb154ab58 ./zfs/zfs-0.5.1-1.src.rpm
23892b48ed147a166ac7d1b0ff3fb9ee ./zfs/zfs-modules-devel-0.5.1-1_2.6.32_71.18.1.el6.x86_64.rpm
5e767ad12087ed18c7d72b05b39530d1 ./zfs/zfs-modules-0.5.1-1_2.6.32_71.18.1.el6.x86_64.rpm
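If you download the RPMs instead of building them, the checksums can be verified with md5sum -c. A small sketch follows; it assumes the list is stored in md5sum's own format ("<md5>  <path>", two spaces, paths relative to one directory), and the file names in the demo are placeholders, not our RPMs.

```shell
# Sketch: verify files against a checksum list.
# $1 = directory the listed paths are relative to, $2 = checksum list file
verify_md5_list() {
    ( cd "$1" && md5sum -c "$2" )
}

# Demo on a scratch file (placeholder name, not one of the RPMs above).
work=$(mktemp -d)
echo "example payload" > "$work/example.rpm"
( cd "$work" && md5sum example.rpm > MD5SUMS )
verify_md5_list "$work" MD5SUMS
```

The function returns a non-zero status as soon as one file does not match, so it can be used as a guard before running yum install.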
We partitioned the rest of the disk as sda4, to be used as a ZFS pool in which to create ZFS filesystems:
[root@t3ce02 ~]# fdisk -l
Disk /dev/sda: 146.0 GB, 145999527936 bytes
255 heads, 63 sectors/track, 17750 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000d12bc
Device Boot Start End Blocks Id System
/dev/sda1 * 1 64 512000 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 64 1339 10240000 82 Linux swap / Solaris
/dev/sda3 1339 2614 10240000 83 Linux
/dev/sda4 2614 17751 121584640 83 Linux
this is the command we ran to create the pool:
[root@t3ce02 ~]# zpool create -f zfspool -m /mnt/zfs sda4
[root@t3ce02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 9.7G 2.9G 6.3G 31% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sda1 485M 57M 403M 13% /boot
zfspool 114G 21K 114G 1% /mnt/zfs
[root@t3ce02 ~]#
MySQL ZFS filesystem
On the official MySQL website we read about the good performance of MySQL on ZFS, so we applied that procedure to create the ZFS filesystem that stores our MySQL data; this DB is going to be used by the SGE ARCO tool.
[root@t3ce02 zfs]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 9.7G 2.9G 6.3G 32% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sda1 485M 57M 403M 13% /boot
zfspool 114G 5.9G 108G 6% /mnt/zfs
[root@t3ce02 zfs]# zfs create zfspool/mysql
[root@t3ce02 zfs]# zfs set recordsize=16K zfspool/mysql
Now that we have prepared a ZFS filesystem for MySQL, let's continue by installing mysql-server and relocating its files to ZFS; please follow these macro steps:
yum install mysql-server
/etc/init.d/mysqld stop
cd /var/lib
mv mysql /mnt/zfs/mysql && ln -s /mnt/zfs/mysql/mysql .
/etc/init.d/mysqld start
chkconfig mysqld on
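The move-and-link step above is easy to get wrong when repeated by hand, so here is a hedged sketch of a small helper with two safety checks. The function name is ours; it assumes the service using the directory has already been stopped.

```shell
# Sketch: move a directory under a new parent and leave a symlink behind.
# Refuses to run if the source is not a plain directory or the target exists.
relocate_with_symlink() {
    src=$1 dest_parent=$2
    name=$(basename "$src")
    [ -d "$src" ] && [ ! -L "$src" ] || { echo "not a plain directory: $src" >&2; return 1; }
    [ ! -e "$dest_parent/$name" ]    || { echo "target exists: $dest_parent/$name" >&2; return 1; }
    mv "$src" "$dest_parent/" && ln -s "$dest_parent/$name" "$src"
}

# On the real host, with mysqld stopped:
#   relocate_with_symlink /var/lib/mysql /mnt/zfs/mysql
# Demo in a scratch tree that mimics the layout:
root=$(mktemp -d)
mkdir -p "$root/var/lib/mysql" "$root/mnt/zfs/mysql"
relocate_with_symlink "$root/var/lib/mysql" "$root/mnt/zfs/mysql"
ls -l "$root/var/lib"
```

Running the helper a second time on the same path fails on purpose, because the source is by then a symbolic link.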
To manage MySQL you can use several tools; probably the most common choices are mysql-workbench or phpmyadmin. We liked and installed phpmyadmin, available at https://t3ce02.psi.ch/phpmyadmin/.
Now we can prepare the sge_arco DB and the 2 related MySQL users: 'arco_read', which is used by the ARCO Web application to run queries, and 'arco_write', which is used by the reporting module to parse the SGE reporting file /gridware/sge/default/common/reporting and insert new rows in the DB sge_arco. We followed the ARCO procedure for the MySQL case.
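For readers without the ARCO procedure at hand, the following sketch prints SQL that would lead to the global privileges reported in the permissions layout of this section. It is reconstructed from that layout, not copied from the official procedure; the passwords are placeholders, and the syntax assumes MySQL 5.1. Any database-level privileges of 'arco_read' are not visible in that layout and are left out here.

```shell
# Sketch: print the SQL for the sge_arco database and its two users.
# ASSUMPTIONS: statements reconstructed from our privileges table,
# placeholder passwords, MySQL 5.1 GRANT ... IDENTIFIED BY syntax.
arco_setup_sql() {
    write_pw=$1 read_pw=$2
    cat <<EOF
CREATE DATABASE IF NOT EXISTS sge_arco;
GRANT ALL PRIVILEGES ON *.* TO 'arco_write'@'localhost' IDENTIFIED BY '$write_pw' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'arco_write'@'%' IDENTIFIED BY '$write_pw' WITH GRANT OPTION;
GRANT USAGE ON *.* TO 'arco_read'@'localhost' IDENTIFIED BY '$read_pw';
GRANT USAGE ON *.* TO 'arco_read'@'%' IDENTIFIED BY '$read_pw';
FLUSH PRIVILEGES;
EOF
}

# Review the statements first, then pipe them to the client as root:
#   arco_setup_sql 'write-secret' 'read-secret' | mysql -u root -p
arco_setup_sql 'CHANGE_ME_W' 'CHANGE_ME_R'
```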
This is our final permissions layout in MySQL:
User        Host           Password  Global privileges  Grant
arco_read   %              Yes       USAGE              No
arco_read   localhost      Yes       USAGE              No
arco_write  %              Yes       ALL PRIVILEGES     Yes
arco_write  localhost      Yes       ALL PRIVILEGES     Yes
root        127.0.0.1      Yes       ALL PRIVILEGES     Yes
root        localhost      Yes       ALL PRIVILEGES     Yes
root        t3ce02         Yes       ALL PRIVILEGES     Yes
root        t3ce02.psi.ch  Yes       ALL PRIVILEGES     Yes
MySQL Query logging
To debug what is happening in our DB it is worth enabling the query logging feature of MySQL. This is our /etc/my.cnf; please look at the 'log' option:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
log=/var/lib/mysql/general.log
# Disabling symbolic-links is recommended to prevent assorted security risks
# symbolic-links=0
#
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
at run time we can use the command 'tail' to debug the queries:
[root@t3ce02 sun]# tail -f /mnt/zfs/mysql/mysql/general.log
/usr/libexec/mysqld, Version: 5.1.52-log (Source distribution). started with:
Tcp port: 0 Unix socket: /var/lib/mysql/mysql.sock
Time Id Command Argument
110303 17:48:13 1 Connect Access denied for user 'UNKNOWN_MYSQL_USER'@'localhost' (using password: NO)
Sun Web Console installation
The first step in installing SGE and SGE ARCO is to deploy the Sun Web Console, basically a Java framework developed by Sun to host their Java web applications. There is an installation procedure online, but we preferred to report the steps here.
We started from these SGE6.2u5 files in /opt:
[root@t3ce02 SGE6.2u5]# ll
total 221396
-rw-r--r-- 1 root root 3865332 Feb 24 10:20 sdm10u5_core_rpm.zip
-rw-r--r-- 1 root root 3868219 Feb 24 10:20 sdm10u5_core_targz.zip
-rw-r--r-- 1 root root 10271047 Feb 24 10:20 sge62u5_arco_rpm.zip
-rw-r--r-- 1 root root 10305829 Feb 24 10:20 sge62u5_arco_targz.zip
-rw-r--r-- 1 root root 18839411 Feb 24 10:21 sge62u5_inspect_rpm.zip
-rw-r--r-- 1 root root 18899376 Feb 24 10:21 sge62u5_inspect_targz.zip
-rw-r--r-- 1 root root 29514366 Feb 24 11:17 sge62u5_linux24-i586_rpm.zip
-rw-r--r-- 1 root root 29533073 Feb 24 10:20 sge62u5_linux24-x64_rpm.zip
-rw-r--r-- 1 root root 34009465 Feb 24 10:20 sge62u5_sources+gpl-code_targz.zip
-rw-r--r-- 1 root root 67576445 Feb 24 10:21 webconsole3.0.2-linux.targz.zip
[root@t3ce02 SGE6.2u5]# md5sum *
c89ab2b3db585a5df092ac3399bcdb21 sdm10u5_core_rpm.zip
0bbccb40251dd189c22496d5f945c4f6 sdm10u5_core_targz.zip
188d3e28313b629f19dae761a8b6522b sge62u5_arco_rpm.zip
e24d3b8e7e11447312771c3cdaf03687 sge62u5_arco_targz.zip
fe8f85829bb57938e8edc09186a93afa sge62u5_inspect_rpm.zip
d40484210cde65a880e3eab86651ab9e sge62u5_inspect_targz.zip
68f232beeb66a94c12f286860f07185e sge62u5_linux24-i586_rpm.zip
23a81889b532253f1a1573ac3145111b sge62u5_linux24-x64_rpm.zip
0d1fd15da1aee3bb159eb0b5dccae0cb sge62u5_sources+gpl-code_targz.zip
b931ec2bde0137ebaeae4c4669a65df1 webconsole3.0.2-linux.targz.zip
[root@t3ce02 SGE6.2u5]#
Let's unzip the webconsole package:
[root@t3ce02 SGE6.2u5]# unzip webconsole3.0.2-linux.targz.zip
Archive: webconsole3.0.2-linux.targz.zip
inflating: sge6_2u5/webconsole3.0.2-linux.tar.gz
[root@t3ce02 SGE6.2u5]# cd sge6_2u5/
[root@t3ce02 sge6_2u5]# tar -xzvf webconsole3.0.2-linux.tar.gz
SUNWjato-2.1.5.i386.rpm
SUNWjatodmo-2.1.5.i386.rpm
SUNWjatodoc-2.1.5.i386.rpm
SUNWmcon-3.0.2-5.i386.rpm
SUNWmconr-3.0.2-5.i386.rpm
SUNWmcos-3.0.2-5.i386.rpm
SUNWmcosx-3.0.2-5.i386.rpm
SUNWmctag-3.0.2-5.i386.rpm
config_properties.tpl
jdk-1_5_0_04-linux-i586.rpm
setup
sun-javahelp-2.0_01-fcs.i586.rpm
.pkgrc
.setup_default
[root@t3ce02 sge6_2u5]#
Be sure to install the pam.i686 RPM first, because the Sun Web Console is 32bit software, and then install the framework:
[root@t3ce02 sge6_2u5]# ./setup
Preparing packages for installation...
jdk-1.5.0_04-fcs
Preparing packages for installation...
sun-javahelp-2.0-fcs
Linking JavaHelp to /usr/java/jdk1.5.0_04 ...
Preparing packages for installation...
SUNWjato-2.1.5-9
Preparing packages for installation...
SUNWjatodoc-2.1.5-9
Preparing packages for installation...
SUNWjatodmo-2.1.5-9
Preparing packages for installation...
SUNWmctag-3.0.2-5
Preparing packages for installation...
SUNWmconr-3.0.2-5
Preparing packages for installation...
SUNWmcon-3.0.2-5
Preparing packages for installation...
SUNWmcos-3.0.2-5
Preparing packages for installation...
SUNWmcosx-3.0.2-5
Installation complete.
Starting Sun Java(TM) Web Console Version 3.0.2 ...
The console is running.
[root@t3ce02 sge6_2u5]#
The Sun Web Console is listening on TCP 6789:
[root@t3ce02 sge6_2u5]# netstat -tpln |grep java
tcp 0 0 ::ffff:127.0.0.1:41086 :::* LISTEN 7013/java
tcp 0 0 :::6788 :::* LISTEN 7013/java
tcp 0 0 :::6789 :::* LISTEN 7013/java
and you can log in with your Linux credentials (root/pwd) by pointing your browser to https://t3ce02.psi.ch:6789/
Here you can see the Sun Web Console logs:
[root@t3ce02 sun]# tail /var/log/webconsole/console/console_debug_log
==============================================================
Java Web Console Version 3.0.2 started on Thu Mar 3 17:17:05 CET 2011
==============================================================
[root@t3ce02 sun]#
SGE QMASTER 6.2u5 installation
Now we can install SGE; please have a look at the following steps:
[root@t3ce02 SGE6.2u5]# unzip sge62u5_linux24-x64_rpm.zip
Archive: sge62u5_linux24-x64_rpm.zip
inflating: sge6_2u5/sun-sge-bin-linux24-x64-6.2-5.x86_64.rpm
inflating: sge6_2u5/sun-sge-common-6.2-5.noarch.rpm
[root@t3ce02 SGE6.2u5]# cd sge6_2u5/
[root@t3ce02 sge6_2u5]# ll
total 161640
-r--r--r-- 1 root bin 1235 Dec 9 2006 config_properties.tpl
-rw-r--r-- 1 102852 wheel 47286234 Jul 27 2005 jdk-1_5_0_04-linux-i586.rpm
-r-xr-xr-x 1 root bin 48781 Dec 9 2006 setup
-rw-r--r-- 1 5074 wheel 6340876 May 11 2004 sun-javahelp-2.0_01-fcs.i586.rpm
-rw-r--r-- 1 root root 25583219 Dec 15 2009 sun-sge-bin-linux24-x64-6.2-5.x86_64.rpm
-rw-r--r-- 1 root root 4161238 Dec 15 2009 sun-sge-common-6.2-5.noarch.rpm
-r--r--r-- 1 root bin 731610 Nov 8 2005 SUNWjato-2.1.5.i386.rpm
-r--r--r-- 1 root bin 1216562 Nov 8 2005 SUNWjatodmo-2.1.5.i386.rpm
-r--r--r-- 1 root bin 1049729 Nov 8 2005 SUNWjatodoc-2.1.5.i386.rpm
-rw-rw-r-- 1 root bin 10504152 Dec 9 2006 SUNWmcon-3.0.2-5.i386.rpm
-rw-rw-r-- 1 root bin 29130 Dec 9 2006 SUNWmconr-3.0.2-5.i386.rpm
-rw-rw-r-- 1 root bin 46593 Dec 9 2006 SUNWmcos-3.0.2-5.i386.rpm
-rw-rw-r-- 1 root bin 3803 Dec 9 2006 SUNWmcosx-3.0.2-5.i386.rpm
-rw-rw-r-- 1 root bin 919212 Dec 9 2006 SUNWmctag-3.0.2-5.i386.rpm
-rw-r--r-- 1 root root 67566632 Dec 15 2009 webconsole3.0.2-linux.tar.gz
[root@t3ce02 sge6_2u5]# yum install sun-sge-bin-linux24-x64-6.2-5.x86_64.rpm sun-sge-common-6.2-5.noarch.rpm
...
Dependencies Resolved
================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================
Installing:
sun-sge-bin-linux24-x64 x86_64 6.2-5 /sun-sge-bin-linux24-x64-6.2-5.x86_64 61 M
sun-sge-common noarch 6.2-5 /sun-sge-common-6.2-5.noarch 11 M
Installing for dependencies:
ksh x86_64 20100621-2.el6 sl 655 k
libXp x86_64 1.0.0-15.1.el6 sl 22 k
libXpm x86_64 3.5.8-2.el6 sl 58 k
openmotif22 x86_64 2.2.3-19.el6 sl 1.2 M
tcl x86_64 1:8.5.7-6.el6 sl 1.9 M
...
Complete!
Move the SGE installation to the ZFS filesystem:
[root@t3ce02 /]# mv gridware/ /mnt/zfs/sge/ && ln -s /mnt/zfs/sge/gridware .
[root@t3ce02 /]# ll gridware
lrwxrwxrwx 1 root root 21 Mar 3 17:58 gridware -> /mnt/zfs/sge/gridware
Let's install SGE by running the script start_gui_installer; this is the final configuration we chose:
Grid Engine cluster configuration
Grid Engine root directory ($SGE_ROOT): /mnt/zfs/sge/gridware/sge
Cell name ($SGE_CELL): default
Cluster name ($SGE_CLUSTER_NAME): p6444
Qmaster port ($SGE_QMASTER_PORT): 6444
Execd port ($SGE_EXECD_PORT): 6445
Group id range ($SGE_GID_RANGE): 20000-20100
Qmaster spool directory: /mnt/zfs/sge/gridware/sge/default/spool/qmaster
Global execd spool directory: /mnt/zfs/sge/gridware/sge/default/spool
Spooling method: berkeleydb
Spooling directory: /mnt/zfs/sge/gridware/sge/default/spool/spooldb
JMX port: 6446
JVM library path: /usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server/libjvm.so
JMX SSL server keystore path: /var/sgeCA/port6444/default/private/keystore
Administrator mail: fabio.martinelli@psi.ch
                     Succeded        Failed
Qmaster host         t3ce02.psi.ch
Execution host(s)    t3ce02.psi.ch
Shadow host(s)
Berkeley db host
Admin host(s)        t3ce02.psi.ch
Submit host(s)       t3ce02.psi.ch
How to start with Grid Engine
Set the environment...
... if you are a csh/tcsh user: source /mnt/zfs/sge/gridware/sge/default/common/settings.csh
... if you are a sh/ksh user: . /mnt/zfs/sge/gridware/sge/default/common/settings.sh
This will set or expand the following environment variables:
$SGE_ROOT (always necessary)
$SGE_CELL (if you are using a cell other than default)
$SGE_CLUSTER_NAME (always necessary)
$SGE_QMASTER_PORT (if you haven't added the service sge_qmaster)
$SGE_EXECD_PORT (if you haven't added the service sge_execd)
$PATH/$path (to find the Grid Engine binaries)
$MANPATH (to access the manual pages)
Submit one of the sample scripts contained in the /mnt/zfs/sge/gridware/sge/examples/jobs directory:
qsub /mnt/zfs/sge/gridware/sge/examples/jobs/simple.sh
or
qsub /mnt/zfs/sge/gridware/sge/examples/jobs/sleeper.sh
Use the qstat command to monitor the job's behavior:
qstat -f
After the job finishes executing, check your home directory for the redirected stdout/stderr files script-name.ejob-id and script-name.ojob-id. The job-id is a consecutive unique integer number assigned to each job.
Administering Grid Engine
Grid Engine startup scripts can be found at:
Qmaster: /mnt/zfs/sge/gridware/sge/default/common/sgemaster start/stop
Exec daemon: /mnt/zfs/sge/gridware/sge/default/common/sgeexecd start/stop
After startup the daemons log their messages in their spool directories.
Qmaster: /mnt/zfs/sge/gridware/sge/default/spool/qmaster/messages
Exec daemon: //messages
Useful links
Sun Grid Engine Information Center
http://wikis.sun.com/display/SunGridEngine/Home
Grid Engine project home
http://gridengine.sunsource.net
SGE setting scripts
Please create these symbolic links:
[root@t3ce02 profile.d]# pwd
/etc/profile.d
[root@t3ce02 profile.d]# ll se*
lrwxrwxrwx 1 root root 53 Mar 3 18:03 settings.csh -> /mnt/zfs/sge/gridware/sge/default/common/settings.csh
lrwxrwxrwx 1 root root 52 Mar 3 18:03 settings.sh -> /mnt/zfs/sge/gridware/sge/default/common/settings.sh
[root@t3ce02 profile.d]#
Then log out and log in again via SSH.
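The two links listed above can be created with a couple of ln -s calls; a sketch follows, with a function name of our own choosing. The demo writes into a scratch directory, so the links it creates are dangling unless SGE is installed under the same path.

```shell
# Sketch: link the SGE settings scripts into a profile.d directory.
# $1 = SGE "common" directory, $2 = profile.d directory
link_sge_settings() {
    common_dir=$1 profile_dir=$2
    for f in settings.sh settings.csh; do
        ln -s "$common_dir/$f" "$profile_dir/$f" || return 1
    done
}

# On the real host (as root):
#   link_sge_settings /mnt/zfs/sge/gridware/sge/default/common /etc/profile.d
# Demo in a scratch directory:
demo_dir=$(mktemp -d)
link_sge_settings /mnt/zfs/sge/gridware/sge/default/common "$demo_dir"
ls -l "$demo_dir"
```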
SGE configuration tuning
To enable the SGE reporting file and to keep the job logs on the server where the job ran, we tuned the SGE configuration with the 'qconf -mconf' command, as in this fragment:
...
execd_params KEEP_ACTIVE=1 ENABLE_ADDGRP_KILL=TRUE \
H_MEMORYLOCKED=infinity
reporting_params accounting=true reporting=true \
flush_time=00:00:15 joblog=true sharelog=00:00:00
...
and
[root@t3ce02 sge6_2u5]# qconf -se global
hostname global
load_scaling NONE
complex_values NONE
load_values NONE
processors 0
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables cpu,np_load_avg,mem_free,virtual_free
[root@t3ce02 sge6_2u5]#
and on the SGE scheduler
[root@t3ce02 ~]# qconf -ssconf
...
schedd_job_info true
...
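After editing the configuration it is handy to check that the reporting_params fragment shown in this section is really active. The sketch below extracts one parameter from output in the style of 'qconf -sconf', joining the lines that end with a backslash; the function name is ours, and the sample text is the fragment from this section.

```shell
# Sketch: print the value of one parameter from `qconf -sconf` style output,
# joining backslash-continued lines and squeezing whitespace.
conf_param() {
    awk -v key="$1" '
        { line = buf $0; buf = "" }
        line ~ /\\$/ { sub(/[ \t]*\\$/, " ", line); buf = line; next }
        { gsub(/[ \t]+/, " ", line); sub(/^ /, "", line)
          if (index(line, key " ") == 1) print substr(line, length(key) + 2) }
    '
}

# On the real host:
#   qconf -sconf | conf_param reporting_params
# Demo with the fragment from this section:
sample='execd_params KEEP_ACTIVE=1 ENABLE_ADDGRP_KILL=TRUE \
H_MEMORYLOCKED=infinity
reporting_params accounting=true reporting=true \
flush_time=00:00:15 joblog=true sharelog=00:00:00'
value=$(printf '%s\n' "$sample" | conf_param reporting_params)
case " $value " in
    *" reporting=true "*) echo "reporting file is enabled" ;;
    *)                    echo "reporting file is NOT enabled" ;;
esac
```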
SGE dbwriter
Once the SGE master is working properly we can install the dbwriter tool, which uses the MySQL user 'arco_write', and the ARCO reporting software, which uses the MySQL user 'arco_read' and the Sun Web Console. Please have a look at the official SGE documentation.
RPMs
This was our installation experience:
[root@t3ce02 SGE6.2u5]# unzip sge62u5_arco_rpm.zip
Archive: sge62u5_arco_rpm.zip
inflating: sge6_2u5/sun-sge-arco-6.2-5.noarch.rpm
[root@t3ce02 SGE6.2u5]# cd sge6_2u5
[root@t3ce02 sge6_2u5]# yum install sun-sge-arco-6.2-5.noarch.rpm
Setting up Install Process
Examining sun-sge-arco-6.2-5.noarch.rpm: sun-sge-arco-6.2-5.noarch
Marking sun-sge-arco-6.2-5.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package sun-sge-arco.noarch 0:6.2-5 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================
Installing:
sun-sge-arco noarch 6.2-5 /sun-sge-arco-6.2-5.noarch 19 M
Transaction Summary
================================================================================================================================
Install 1 Package(s)
Upgrade 0 Package(s)
Total size: 19 M
Installed size: 19 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : sun-sge-arco-6.2-5.noarch 1/1
Installed:
sun-sge-arco.noarch 0:6.2-5
Complete!
[root@t3ce02 sge6_2u5]#
MySQL JDBC driver
Be sure that you have a MySQL JDBC driver file, and link that file inside the SGE dir:
[root@t3ce02 sge6_2u5]# yum install mysql-connector-java.x86_64
...
[root@t3ce02 lib]# pwd
/mnt/zfs/sge/gridware/sge/dbwriter/lib
[root@t3ce02 lib]# ln -s /usr/share/java/mysql-connector-java.jar
Installation ./inst_dbwriter
The dbwriter installation itself is well documented on the official SGE site. During it we were prompted for several things; one is which Java to use, and there we specified '/etc/alternatives/jre/' so that the setup is not broken by a system Java update. So we ran:
cd $SGE_ROOT/dbwriter && ./inst_dbwriter
...
All parameters are now collected
--------------------------------
SGE_ROOT=/mnt/zfs/sge/gridware/sge
SGE_CELL=default
JAVA_HOME=/etc/alternatives/jre (1.6.0_17)
DB_URL=jdbc:mysql://localhost:3306/sge_arco
DB_USER=arco_write
READ_USER=arco_read
INTERVAL=120
SPOOL_DIR=/mnt/zfs/sge/gridware/sge/default/spool/dbwriter
DERIVED_FILE=/mnt/zfs/sge/gridware/sge/dbwriter/database/mysql/dbwriter.xml
DEBUG_LEVEL=FINE
Are these settings correct? (y/n) [y] >>
Please note the creation phase of the MySQL sge_arco Tables and Views:
Update version table
commiting changes
Version 6.1u3 (id=6) successfully installed
Install version 6.1u4 (id=7) -------
Create table sge_version
Insert first value in the checkpoint table
Update version table
commiting changes
Version 6.1u4 (id=7) successfully installed
Install version 6.2 (id=8) -------
Drop primary key constraint on sge_version table
Create compound primary key for sge_version
Create table sge_ar
Create index sge_ar_idx0 on column ar_number
Create index sge_ar_idx1 on column ar_owner
Create table sge_ar_attribute
Create index sge_ar_attribute_idx0 on column ara_end_time
Create table sge_ar_usage
Create table sge_ar_log
Create index sge_ar_log_idx0 on column arl_event
Create table sge_ar_resource_usage
Add the column ju_ar_parent to sge_job_usage table
Create index sge_job_usage_idx2 on column ju_ar_parent
Drop view view_job_times
Drop view view_accounting
Drop view view_jobs_completed
Update view view_accounting
Create view view_job_times_subquery
Update view view_job_times
Update view view_jobs_completed
Create view view_ar_attribute
Create view view_ar_log
Create view view_ar_usage
Create view view_ar_resource_usage
Create view view_ar_time_usage
Drop the column ju_state from sge_job_usage table
Drop the column j_open from sge_job table
Updating derived host values variable h_jobs to h_jobs_finished
Update version table
commiting changes
Version 6.2 (id=8) successfully installed
Install version 6.1u6 (id=9) -------
Extend too small integer field sge_department_values.dv_id,
drop temporarily constraint for foreign key sge_department_values.dv_parent
and extend too small integer field sge_department_values.dv_parent
Extend too small integer field sge_department.d_id
Recreate foreign key sge_department_values.dv_parent
Extend too small integer field sge_group_values.gv_id,
drop temporarily constraint for foreign key sge_group_values.gv_parent
and extend too small integer field sge_group_values.gv_parent
Extend too small integer field sge_group.g_id
Recreate foreign key sge_group_values.gv_parent
Extend too small integer field sge_host_values.hv_id,
drop temporarily constraint for foreign key sge_host_values.hv_parent
and extend too small integer field sge_host_values.hv_parent
Extend too small integer field sge_host.h_id
Recreate foreign key sge_host_values.hv_parent
Extend too small integer field sge_job_log.jl_id,
drop temporarily constraint for foreign key sge_job_log.jl_parent
and extend too small integer field sge_job_log.jl_parent
Extend too small integer field sge_job_request.jr_id,
drop temporarily constraint for foreign key sge_job_request.jr_parent
and extend too small integer field sge_job_request.jr_parent
Extend too small integer field sge_job_usage.ju_id,
drop temporarily constraint for foreign key sge_job_usage.ju_parent
and extend too small integer field sge_job_usage.ju_parent
Extend too small integer field sge_job.j_id
Recreate foreign key sge_job_log.jl_parent
Recreate foreign key sge_job_request.jr_parent
Recreate foreign key sge_job_usage.ju_parent
Extend too small integer field sge_project_values.pv_id,
drop temporarily constraint for foreign key sge_project_values.pv_parent
and extend too small integer field sge_project_values.pv_parent
Extend too small integer field sge_project.p_id
Recreate foreign key sge_project_values.pv_parent
Extend too small integer field sge_queue_values.qv_id,
drop temporarily constraint for foreign key sge_queue_values.qv_parent
and extend too small integer field sge_queue_values.qv_parent
Extend too small integer field sge_queue.q_id
Recreate foreign key sge_queue_values.qv_parent
Extend too small integer field sge_share_log.sl_id
Extend too small integer field sge_statistic_values.sv_id,
drop temporarily constraint for foreign key sge_statistic_values.sv_parent
and extend too small integer field sge_statistic_values.sv_parent
Extend too small integer field sge_statistic.s_id
Recreate foreign key sge_statistic_values.sv_parent
Extend too small integer field sge_user_values.uv_id,
drop temporarily constraint for foreign key sge_user_values.uv_parent
and extend too small integer field sge_user_values.uv_parent
Extend too small integer field sge_user.u_id
Recreate foreign key sge_user_values.uv_parent
Update version table
commiting changes
Version 6.1u6 (id=9) successfully installed
Install version 6.2u1 (id=10) -------
Extend too small integer field sge_ar_attribute.ara_id,
drop temporarily constraint for foreign key sge_ar_attribute.ara_parent
and extend too small integer field sge_ar_attribute.ara_parent
Extend too small integer field sge_ar_log.arl_id,
drop temporarily constraint for foreign key sge_ar_log.arl_parent
and extend too small integer field sge_ar_log.arl_parent
Extend too small integer field sge_ar_resource_usage.arru_id,
drop temporarily constraint for foreign key sge_ar_resource_usage.arru_parent
and extend too small integer field sge_ar_resource_usage.arru_parent
Extend too small integer field sge_ar_usage.aru_id,
drop temporarily constraint for foreign key sge_ar_usage.aru_parent
and extend too small integer field sge_ar_usage.aru_parent
Extend too small integer field sge_ar.ar_id
Extend too small integer field sge_job_usage.ju_parent and sge_job_usage.ju_ar_parent
Recreate foreign key sge_ar_attribute.ara_parent
Recreate foreign key sge_ar_log.arl_parent
Recreate foreign key sge_ar_resource_usage.arru_parent
Recreate foreign key sge_ar_usage.aru_parent
Drop primary key constraint on sge_version table
Create compound primary key for sge_version
Update version table
commiting changes
Version 6.2u1 (id=10) successfully installed
OK
Create start script sgedbwriter in /mnt/zfs/sge/gridware/sge/default/common
Create configuration file for dbwriter in /mnt/zfs/sge/gridware/sge/default/common
Hit <RETURN> to continue >>
When the dbwriter installation completed we got:
dbwriter startup script
-----------------------
We can install the startup script that will
start dbwriter at machine boot (y/n) [y] >>
cp /mnt/zfs/sge/gridware/sge/default/common/sgedbwriter /etc/init.d/sgedbwriter.p6444
/usr/lib/lsb/install_initd /etc/init.d/sgedbwriter.p6444
Creating dbwriter spool directory /mnt/zfs/sge/gridware/sge/default/spool/dbwriter
starting dbwriter
dbwriter started (pid=11098)
Installation of dbwriter completed
[root@t3ce02 dbwriter]#
Checking dbwriter logs
The program dbwriter is now a service in your system; you can start/stop it with:
/etc/init.d/sgedbwriter.p6444
You can double check what's going on with a tail command on these 2 log files:
[root@t3ce02 ~]# tail -f /mnt/zfs/sge/gridware/sge/default/spool/dbwriter/dbwriter.log
06/03/2011 16:24:51|t3ce02.psi.ch|ivedValueThread.commitExecuted|D|new object received, timestampOfLastRowData is 1,299,428,609,000
06/03/2011 16:24:51|t3ce02.psi.ch|iter.file.FileParser.parseFile|I|Deleting file reporting.processing
06/03/2011 16:24:51|t3ce02.psi.ch|.RecordCache.getStoredDBRecord|D|Object for key 'dbwriter' = [sge_statistic, id=1, parent=0, key=['dbwriter'], addr=0x7f712b3a]
06/03/2011 16:24:51|t3ce02.psi.ch|le.FileParser.createStatistics|I|Processed 6 lines in 0s (1500 lines/s)
06/03/2011 16:24:51|t3ce02.psi.ch|ter.RecordManager.executeBatch|D|Batch success. Number of statements executed: 0 table: 'sge_host_values'
06/03/2011 16:24:51|t3ce02.psi.ch|ter.RecordManager.executeBatch|D|Batch success. Number of statements executed: 1 table: 'sge_statistic_values'
06/03/2011 16:24:51|t3ce02.psi.ch|r.Controller.flushBatchesAtEnd|D|All Batches flushed and commited
06/03/2011 16:24:51|t3ce02.psi.ch|ng.dbwriter.db.Database.commit|D|Thread dbwriter commits Connection 3 (null@jdbc:mysql://localhost:3306/sge_arco)
06/03/2011 16:24:51|t3ce02.psi.ch|g.dbwriter.db.Database.release|D|Thread dbwriter releases Connection 3 (null@jdbc:mysql://localhost:3306/sge_arco)
06/03/2011 16:24:51|t3ce02.psi.ch|ter.ReportingDBWriter.mainLoop|C|Sleeping for 119,992 milli seconds
[root@t3ce02 ~]# tail -f /mnt/zfs/mysql/mysql/general.log
3 Query INSERT INTO sge_host_values (hv_id, hv_parent, hv_time_start, hv_time_end, hv_variable, hv_svalue, hv_dvalue, hv_dconfig) VALUES (97031, 37, '2011-03-06 16:25:29', '2011-03-06 16:25:29', 'mem_free', '1805.183594M', 1.892872192262144E+9, 0.0)
3 Query INSERT INTO sge_host_values (hv_id, hv_parent, hv_time_start, hv_time_end, hv_variable, hv_svalue, hv_dvalue, hv_dconfig) VALUES (97032, 37, '2011-03-06 16:25:29', '2011-03-06 16:25:29', 'virtual_free', '4127.070312M', 4.327546879475712E+9, 0.0)
3 Query INSERT INTO sge_host_values (hv_id, hv_parent, hv_time_start, hv_time_end, hv_variable, hv_svalue, hv_dvalue, hv_dconfig) VALUES (97033, 37, '2011-03-06 16:26:09', '2011-03-06 16:26:09', 'cpu', '0.000000', 0.0, 0.0)
3 Query INSERT INTO sge_host_values (hv_id, hv_parent, hv_time_start, hv_time_end, hv_variable, hv_svalue, hv_dvalue, hv_dconfig) VALUES (97034, 37, '2011-03-06 16:26:09', '2011-03-06 16:26:09', 'np_load_avg', '0.000000', 0.0, 0.0)
3 Query INSERT INTO sge_host_values (hv_id, hv_parent, hv_time_start, hv_time_end, hv_variable, hv_svalue, hv_dvalue, hv_dconfig) VALUES (97035, 37, '2011-03-06 16:26:09', '2011-03-06 16:26:09', 'mem_free', '1805.183594M', 1.892872192262144E+9, 0.0)
3 Query INSERT INTO sge_host_values (hv_id, hv_parent, hv_time_start, hv_time_end, hv_variable, hv_svalue, hv_dvalue, hv_dconfig) VALUES (97036, 37, '2011-03-06 16:26:09', '2011-03-06 16:26:09', 'virtual_free', '4127.070312M', 4.327546879475712E+9, 0.0)
3 Query UPDATE sge_checkpoint SET ch_line = 0, ch_time = '2011-03-06 16:26:51' WHERE ch_id=1
3 Query commit
3 Query INSERT INTO sge_statistic_values (sv_id, sv_parent, sv_time_start, sv_time_end, sv_variable, sv_dvalue) VALUES (2357, 1, '2011-03-06 16:26:51', '2011-03-06 16:26:51', 'lines_per_second', 1166.6666666666667)
3 Query commit
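The dbwriter.log records shown above appear to be pipe-separated (timestamp, host, source, level, message). As a minimal sketch under that assumption, the helper below prints only the records of one level. The function name dbwriter_log_level is ours, and the meaning of the single-letter levels (D, I and C appear above) is inferred from the excerpt, not taken from the SGE documentation.

```shell
# Print the dbwriter.log records whose 4th pipe-separated field (the level)
# equals the requested letter. The field layout is inferred from the excerpt.
dbwriter_log_level() {
    local level="$1" logfile="$2"
    awk -F'|' -v lvl="$level" '$4 == lvl' "$logfile"
}
```

For example, `dbwriter_log_level I /mnt/zfs/sge/gridware/sge/default/spool/dbwriter/dbwriter.log` would show only the informational lines, which is easier to read than the full debug stream.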
SGE ARCO
Now it's time to install the reporting layer; please have a look at the
Official ARCO documentation.
Our installation experience follows:
MySQL JDBC driver
ARCO is a Java application that needs to communicate with
MySQL, so we created another symbolic link, as in the dbwriter case:
[root@t3ce02 ~]# ll /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar
lrwxrwxrwx 1 root root 40 Mar 3 20:30 /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar -> /usr/share/java/mysql-connector-java.jar
The ARCO installation procedure recognized this link properly:
...
Searching for the jdbc driver com.mysql.jdbc.Driver
in directory /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib
OK, jdbc driver found
Should the connection to the database be tested? (y/n) [y] >>
Test database connection to 'jdbc:mysql://localhost:3306/sge_arco' ... OK
Hit <RETURN> to continue >>
DB parameters are now collected
-------------------------------
CLUSTER_NAME=T3_PSI_CH
DB_URL=jdbc:mysql://localhost:3306/sge_arco
DB_USER=arco_read
Are these settings correct? (y/n) [y] >>
Do you want to add another cluster? (y/n) [n] >>n
Configure users with write access
---------------------------------
Users: default
Enter a user login name. (Hit <RETURN> to finish) >> root
Users: default root
Enter a user login name. (Hit <RETURN> to finish) >> martinelli_f
Users: default root martinelli_f
Enter a user login name. (Hit <RETURN> to finish) >>
All parameters are now collected
--------------------------------
SPOOL_DIR=/var/spool/arco
APPL_USERS=default root martinelli_f
Are these settings correct? (y/n) [y] >>
found incorrect permissions lrwxrwxrwx for /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar
Correcting file permissions ... done
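The symbolic link to the JDBC driver listed above can be created along these lines. This is a minimal sketch: the function name link_jdbc_driver is ours, and it only assumes that the driver jar already exists (on our system under /usr/share/java) and that the target WEB-INF/lib directory is present.

```shell
# Create (or refresh) a symlink that exposes a JDBC driver jar inside a
# web application's lib directory. Refuses to create a dangling link.
link_jdbc_driver() {
    local driver="$1" libdir="$2"
    [ -f "$driver" ] || { echo "driver not found: $driver" >&2; return 1; }
    [ -d "$libdir" ] || { echo "lib dir not found: $libdir" >&2; return 1; }
    ln -sfn "$driver" "$libdir/$(basename "$driver")"
}
```

With the paths used in this document the call would be `link_jdbc_driver /usr/share/java/mysql-connector-java.jar /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib`.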
Standard ARCO Queries
The SGE engineers designed some standard queries that are useful for any kind of SGE cluster:
....
Install predefined queries
--------------------------
Directory /var/spool/arco does not exist, create it? (y/n) [y] >> y
Create directory /var/spool/arco
Create directory /var/spool/arco/queries
Copy examples queries into /var/spool/arco/queries
Copy query Accounting_per_AR.xml ... OK
Copy query Accounting_per_Department.xml ... OK
Copy query Accounting_per_Project.xml ... OK
Copy query Accounting_per_User.xml ... OK
Copy query AR_Attributes.xml ... OK
Copy query AR_by_User.xml ... OK
Copy query AR_Log.xml ... OK
Copy query AR_Reserved_Time_Usage.xml ... OK
Copy query Average_Job_Turnaround_Time.xml ... OK
Copy query Average_Job_Wait_Time.xml ... OK
Copy query DBWriter_Performance.xml ... OK
Copy query Host_Load.xml ... OK
Copy query Job_Log.xml ... OK
Copy query Number_of_Jobs_Completed_per_AR.xml ... OK
Copy query Number_of_Jobs_completed.xml ... OK
Copy query Queue_Consumables.xml ... OK
Copy query Statistic_History.xml ... OK
Copy query Statistics.xml ... OK
Copy query Wallclock_time.xml ... OK
Create directory /var/spool/arco/results
Hit <RETURN> to continue >>
ARCo reporting module setup
---------------------------
Copying ARCo reporting file into /mnt/zfs/sge/gridware/sge/default/arco/reporting
Setting up ARCo reporting configuration file. After registration of
the ARCo reporting module in the Sun Java Web Console you can find
this file at
/mnt/zfs/sge/gridware/sge/default/arco/reporting/config.xml
Hit <RETURN> to continue >>
Importing Sun Java Web Console 3.0 or 3.1 files
-----------------------------------------------
Imported files to /mnt/zfs/sge/gridware/sge/default/arco/reporting
Created product images in /mnt/zfs/sge/gridware/sge/default/arco/reporting/com_sun_web_ui/images
Hit <RETURN> to continue >>
Registering the SGE reporting module in the Sun Java Web Console
----------------------------------------------------------------
The reporting web application has been successfully deployed.
Set 1 properties for the com.sun.grid.arco_6.2u5 application.
Set 1 properties for the com.sun.grid.arco_6.2u5 application.
Set 1 properties for the com.sun.grid.arco_6.2u5 application.
Creating the TOC file ... OK
Restarting Sun Java Web Console
-------------------------------
Shutting down Sun Java(TM) Web Console Version 3.0.2 ...
Starting Sun Java(TM) Web Console Version 3.0.2 ...
The console is running.
SGE ARCo reporting successfully installed
Finally, the ARCO web access
At the end of the installation script we were able to access ARCO at:
https://t3ce02.psi.ch:6789/
SGE, importing a previous reporting file
It's possible to ingest a previous reporting file coming from another SGE installation; since our old cluster had one, we ingested more than 1.5 years of statistics in this way:
[root@t3ce02 common]# ll /root/reporting
-rw-r--r--. 1 root root 740664403 Feb 28 23:08 /root/reporting
[root@t3ce02 common]# pwd
/mnt/zfs/sge/gridware/sge/default/common
[root@t3ce02 common]# cp -p /root/reporting .
cp: overwrite `./reporting'? y
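Because the copy above overwrites the reporting file that is already in the common directory, it may be worth keeping a copy of the existing file first. The helper below is a sketch of that idea, not part of the procedure we actually followed: the function name import_reporting and the separate backup directory are our own assumptions.

```shell
# Copy an old reporting file into the cell's common directory, first saving
# any reporting file already there into a separate backup directory.
import_reporting() {
    local src="$1" common_dir="$2" backup_dir="$3"
    local dest="$common_dir/reporting"
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    if [ -e "$dest" ]; then
        # Timestamped name so repeated imports never clobber a backup.
        cp -p "$dest" "$backup_dir/reporting.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp -p "$src" "$dest"
}
```

On our master this could be called as `import_reporting /root/reporting /mnt/zfs/sge/gridware/sge/default/common /root/reporting_backup`, where the last directory is a placeholder that must exist beforehand.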
SGE, inspect tool
It's possible to graphically monitor several SGE clusters and their queues by using the
Java tool Inspect, which we installed in the following way:
[root@t3ce02 SGE6.2u5]# unzip sge62u5_inspect_rpm.zip
Archive: sge62u5_inspect_rpm.zip
inflating: sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm
[root@t3ce02 SGE6.2u5]# yum install sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm
Setting up Install Process
Examining sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm: sun-sge-inspect-6.2-5.noarch
Marking sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package sun-sge-inspect.noarch 0:6.2-5 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================
Installing:
sun-sge-inspect noarch 6.2-5 /sun-sge-inspect-6.2-5.noarch 40 M
Transaction Summary
================================================================================================================================
Install 1 Package(s)
Upgrade 0 Package(s)
Total size: 40 M
Installed size: 40 M
Is this ok [y/N]: y
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : sun-sge-inspect-6.2-5.noarch 1/1
Installed:
sun-sge-inspect.noarch 0:6.2-5
Complete!
[root@t3ce02 SGE6.2u5]#
Install the JDK development package by using yum:
...
================================================================================================================================
Package Arch Version Repository Size
================================================================================================================================
Installing:
java-1.6.0-openjdk-devel x86_64 1:1.6.0.0-1.39.b17.el6_0 sl-security 8.5 M
Transaction Summary
================================================================================================================================
...
You need to create users and keys (it is not really clear to us why):
[root@t3ce02 bin]# cat /opt/SGE6.2u5/myusers.txt
root:iamroot:fabio.martinelli@psi.ch
[root@t3ce02 bin]#
[root@t3ce02 bin]# /mnt/zfs/sge/gridware/sge/util/sgeCA/sge_ca -usercert /opt/SGE6.2u5/myusers.txt
Generating user certificate and key for 'root' ('iamroot','fabio.martinelli@psi.ch').
Creating 'user' certificate and key for iamroot
-----------------------------------------------
Generating a 1024 bit RSA private key
......++++++
...++++++
writing new private key to '/var/sgeCA/port6444/default/userkeys/root/key.pem'
-----
Using configuration from /tmp/sge_ca115195.tmp
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
countryName :PRINTABLE:'DE'
stateOrProvinceName :PRINTABLE:'GERMANY'
localityName :PRINTABLE:'Building'
organizationName :PRINTABLE:'Organisation'
organizationalUnitName:T61STRING:'Organisation_unit'
userId :PRINTABLE:'root'
commonName :PRINTABLE:'iamroot'
emailAddress :IA5STRING:'fabio.martinelli@psi.ch'
Certificate is to be certified until Mar 2 22:35:05 2012 GMT (365 days)
Write out database with 1 new entries
Data Base Updated
created and signed certificate for user 'root' in '/var/sgeCA/port6444/default/userkeys/root'
[root@t3ce02 bin]#
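Judging from the example file and from the sge_ca output above, each line of the users file carries three colon-separated fields: login, name and e-mail address. As a sketch under that assumption, the helper below checks a file before it is handed to sge_ca. The function name check_usercert_file is ours, and the check is deliberately strict (blank lines are reported too), which may be stricter than sge_ca itself.

```shell
# Verify that every line has exactly three non-empty, colon-separated
# fields (login:name:email). Prints the offending line numbers and
# returns non-zero if any line does not match.
check_usercert_file() {
    awk -F: '
        NF != 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: expected login:name:email\n", NR
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

A possible use is `check_usercert_file /opt/SGE6.2u5/myusers.txt && /mnt/zfs/sge/gridware/sge/util/sgeCA/sge_ca -usercert /opt/SGE6.2u5/myusers.txt`.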
Create and use passwords:
[root@t3ce02 bin]# /mnt/zfs/sge/gridware/sge/util/sgeCA/sge_ca -userks -kspwf /tmp/mysecret.txt
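The file passed with -kspwf holds a secret, so it should not be readable by other users. The sketch below assumes that the file simply contains the password on one line (we did not verify the exact format expected by sge_ca); the function name write_password_file is ours.

```shell
# Write the password read from standard input into a file that only the
# owner can read or write.
write_password_file() {
    local file="$1"
    (
        umask 077          # new files are created without group/other access
        cat > "$file"
    ) && chmod 600 "$file" # also tighten a file that already existed
}
```

For example: `printf '%s\n' 'my-keystore-password' | write_password_file /tmp/mysecret.txt`, where the password shown is a placeholder.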
We wrote a script that sets JAVA_HOME and runs Inspect:
[root@t3ce02 ~]# ll /usr/local/bin/sgeinspect.sh
lrwxrwxrwx 1 root root 42 Mar 3 23:51 /usr/local/bin/sgeinspect.sh -> /gridware/sge/sgeinspect/bin/sgeinspect.sh
[root@t3ce02 ~]# cat /usr/local/bin/sgeinspect.sh
export JAVA_HOME=/etc/alternatives/java_sdk
cd /gridware/sge/sgeinspect/bin
./sgeinspect
cd -
[root@t3ce02 ~]#
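The wrapper above works, but it changes directory without checking the result and exports JAVA_HOME into the calling environment if it is sourced. A slightly more defensive variant could look like the sketch below; the function name run_sgeinspect and its two optional parameters are our own additions, with defaults taken from the script above.

```shell
# Run sgeinspect from its own bin directory with JAVA_HOME set.
# The subshell keeps the caller's working directory and environment intact.
run_sgeinspect() {
    local inspect_bin="${1:-/gridware/sge/sgeinspect/bin}"
    local java_home="${2:-/etc/alternatives/java_sdk}"
    (
        cd "$inspect_bin" || exit 1
        JAVA_HOME="$java_home" ./sgeinspect
    )
}
```

Called without arguments it behaves like the original script; a different JDK can be tried with, for instance, `run_sgeinspect /gridware/sge/sgeinspect/bin /usr/lib/jvm/some-other-jdk` (a hypothetical path).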
SGE EXECD 6.2u5 installation
Installing the SGE execution side was easier than installing the qmaster, but it requires some steps that are well described in the official SGE
How to Install Execution Hosts. We did an installation without any NFS dependency; this should avoid global job crashes when an NFS server becomes unreachable, and it improves I/O performance.
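To verify on an execution host that a directory really has no NFS dependency, one can look at the filesystem type reported by df. The helpers below are a sketch assuming GNU df on Linux (the -T option prints the type in the second column); all three function names are ours.

```shell
# Succeed if the given filesystem type name denotes NFS (nfs, nfs3, nfs4...).
is_nfs_type() {
    case "$1" in
        nfs|nfs[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the filesystem type of the mount that holds the given path.
fs_type_of() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Report whether a directory lives on NFS; returns 1 if it does.
check_not_on_nfs() {
    local dir="$1" fstype
    fstype=$(fs_type_of "$dir")
    if is_nfs_type "$fstype"; then
        echo "$dir is on NFS ($fstype)"
        return 1
    fi
    echo "$dir is local ($fstype)"
}
```

For example, `check_not_on_nfs /path/to/execd/spool` could be run on each execution host, with the path replaced by the local SGE directories chosen during the installation.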
--
FabioMartinelli - 2011-03-03