Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
Changed: | ||||||||
< < |
SGE 6.2u5 and ARCO MySQL hosted on ZFS
Revision 12, 2013-07-16 13:26:52
Sun Grid Engine project home page: http://gridengine.sunsource.net/ | |||||||
> > | 16th July 2013 - OUTDATED, BUT KEPT FOR REFERENCE | |||||||
This document describes our experience upgrading the SGE installation from 6.1 to 6.2u5, the last free version of this batch system. Besides the SGE upgrade itself, which introduced several new features in the batch system, we also migrated the O.S., moved the accounting into a DB, and introduced the ZFS Linux driver to use this advanced filesystem in our context. | ||||||||
Deleted: | ||||||||
< < | HW installation
For our installation we detached t3ui07 from the cluster and converted it into t3ce02, which is the new SGE master in this document. To protect its data we made an HW RAID1 configuration by using the LSI Bios at boot time; the final disk layout is a 140GB LSI Virtual Volume that we partitioned during the SL6 installation in this way:
[root@t3ce02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 9.7G 2.3G 6.9G 25% /
tmpfs 7.8G 0 7.8G 0% /dev/shm
/dev/sda1 485M 34M 426M 8% /boot
[root@t3ce02 ~]# mount
/dev/sda3 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
/dev/sda1 on /boot type ext4 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
Because there are 4 Gigabit NICs in the server, it's worth connecting as many NICs as possible to the switch and later configuring a Linux Bonding configuration type 6
SL6 64bit Installation
We simply pointed the Virtual CD of t3ce02 to a SL6 DVD iso file we saved in t3admin01:/home/ and made a "Basic Server" installation, which is enough to install utilities like SSH, yum, ..; later we selected the other RPMs at run time. The "Basic Server" installation turns ON selinux by default; to disable it edit this file and then reboot the system:
[root@t3ce02 ~]# grep -v \# /etc/sysconfig/selinux
SELINUX=disabled
SELINUXTYPE=targeted
[root@t3ce02 ~]#
Also turn OFF the yum cron updates by editing this file: /etc/sysconfig/yum-autoupdate
And install these i686 RPMs because later they are needed by the Sun Web Console and by the LSI RAID utility mpt-status:
[root@t3ce02 ~]# yum install glibc.i686
...
Dependencies Resolved
================================================================================================================================
 Package              Arch    Version           Repository   Size
================================================================================================================================
Installing:
 glibc                i686    2.12-1.7.el6_0.3  sl-security  4.3 M
Installing for dependencies:
 nss-softokn-freebl   i686    3.12.8-1.el6_0    sl-security  108 k
Updating for dependencies:
 glibc                x86_64  2.12-1.7.el6_0.3  sl-security  3.7 M
 glibc-common         x86_64  2.12-1.7.el6_0.3  sl-security  14 M
 nss-softokn-freebl   x86_64  3.12.8-1.el6_0    sl-security  114 k
Transaction Summary
================================================================================================================================
Install 2 Package(s)
Upgrade 3 Package(s)
Total size: 22 M
Total download size: 4.4 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): glibc-2.12-1.7.el6_0.3.i686.rpm | 4.3 MB 00:09
(2/2): nss-softokn-freebl-3.12.8-1.el6_0.i686.rpm | 108 kB 00:00
--------------------------------------------------------------------------------------------------------------------------------
...
Complete!
[root@t3ce02 ~]#
Now you can install the LSI RAID checker "mpt-status" to monitor the HW RAID status:
[root@t3ce02 ~]# rpm -Uv http://www.drugphish.ch/~ratz/mpt-status/RPMS/1.2.0_RC7/mpt-status-1.2.0_RC7-3.i386.rpm
Retrieving http://www.drugphish.ch/~ratz/mpt-status/RPMS/1.2.0_RC7/mpt-status-1.2.0_RC7-3.i386.rpm
Preparing packages for installation...
mpt-status-1.2.0_RC7-3
[root@t3ce02 ~]#
Load the driver and verify the HW RAID1:
[root@t3ce02 ~]# modprobe mptctl
[root@t3ce02 ~]# mpt-status
ioc0 vol_id 0 type IM, 2 phy, 135 GB, state OPTIMAL, flags ENABLED
ioc0 phy 1 scsi_id 2 SEAGATE ST914602SSUN146G 0603, 136 GB, state ONLINE, flags NONE
ioc0 phy 0 scsi_id 1 SEAGATE ST914602SSUN146G 0603, 136 GB, state ONLINE, flags NONE
[root@t3ce02 ~]#
Curiously I couldn't find /etc/modprobe.conf; probably SL6 uses another mechanism for that, so I just appended the command:
[root@t3ce02 etc]# echo modprobe mptctl >> /etc/rc.local
Did you reboot the system? If not, let's do it now. | |||||||
ZFS on SL6 64bit. | ||||||||
Added: | ||||||||
> > | OUTDATED, now there is ZFS on Linux | |||||||
We found it interesting to run ZFS filesystems | ||||||||
Line: 216 to 118 | ||||||||
To manage MySQL you can use several tools; probably the most common choice is to deploy mysql-workbench | ||||||||
Deleted: | ||||||||
< < | MySQL phpMyAdmin
We liked and installed https://t3ce02.psi.ch/phpmyadmin/
Features
Something worth reporting about phpMyAdmin: with version 3.4 you can:
* browse and drop databases, tables, views, columns and indexes
* create, copy, drop, rename and alter databases, tables, columns and indexes
* maintain server, databases and tables, with proposals on server configuration
* execute, edit and bookmark any SQL-statement, even batch-queries
* load text files into tables
* create and read dumps of tables
* export data to various formats: CSV, XML, PDF, ISO/IEC 26300 - OpenDocument Text and Spreadsheet, Word, Excel and LATEX formats
* import data and MySQL structures from Microsoft Excel and OpenDocument spreadsheets, as well as XML, CSV, and SQL files
* administer multiple servers
* manage MySQL users and privileges
* check referential integrity in MyISAM tables
* using Query-by-example (QBE), create complex queries automatically connecting required tables
* create PDF graphics of your Database layout
* search globally in a database or a subset of it
* transform stored data into any format using a set of predefined functions, like displaying BLOB-data as image or download-link
* track changes on databases, tables and views
* support InnoDB tables and foreign keys (see FAQ 3.6)
* support mysqli, the improved MySQL extension (see FAQ 1.17)
* communicate in 62 different languages
* synchronize two databases residing on the same as well as remote servers (see FAQ 9.1)
Dedicated READONLY ZFS filesystem
With ZFS you can create as many filesystems as your ZFS pool can hold, so we created a ZFS filesystem for phpmyadmin and, once the configuration was done, we set its readonly property to on:
[root@t3ce02 ~]# cd /var/www/html/
[root@t3ce02 html]# ll
lrwxrwxrwx 1 root root 20 Mar 6 19:54 phpmyadmin -> /mnt/zfs/phpmyadmin/
[root@t3ce02 html]# df -h /mnt/zfs/phpmyadmin/
Filesystem Size Used Avail Use% Mounted on
zfspool/phpmyadmin 53G 18M 53G 1% /mnt/zfs/phpmyadmin
[root@t3ce02 html]# zfs get readonly zfspool/phpmyadmin
NAME PROPERTY VALUE SOURCE
zfspool/phpmyadmin readonly on local | |||||||
MySQL ARCO DB
Now we can prepare the sge_arco DB and the two related MySQL users: 'arco_read', used by the ARCO Web application to run queries, and 'arco_write', used by the reporting module to parse the SGE reporting file /gridware/sge/default/common/reporting and insert new rows into the sge_arco DB. | ||||||||
Line: 303 to 160 | ||||||||
110303 17:48:13 1 Connect Access denied for user 'UNKNOWN_MYSQL_USER'@'localhost' (using password: NO) | ||||||||
Changed: | ||||||||
< < | Sun Web Console installation | |||||||
> > | Sun ARCO Web Console installation | |||||||
The first step in installing SGE and SGE ARCO is to deploy the Sun Web Console | ||||||||
Line: 563 to 420 | ||||||||
SGE configuration tuning | ||||||||
Changed: | ||||||||
< < | Now to enable the SGE reporting file and to save job logs on the server where the job ran we tuned the SGE conf with the 'qconf -mconf' command taking into account this fragment : | |||||||
> > | To enable the SGE reporting file and to save job logs on the server where the job ran, we tuned the SGE configuration with the qconf -mconf command, taking this fragment into account: | |||||||
...
execd_params KEEP_ACTIVE=1 ENABLE_ADDGRP_KILL=TRUE H_MEMORYLOCKED=infinity
reporting_params accounting=true reporting=true flush_time=00:00:15 joblog=true sharelog=00:00:00 | ||||||||
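A quick way to confirm that the fragment above took effect is to inspect the live configuration. The sketch below is only an illustration: the function name `reporting_enabled` is hypothetical (it is not part of SGE), and it assumes that `qconf -sconf` prints `reporting_params` on a single line with space-separated settings.

```shell
# Hypothetical helper, not shipped with SGE.
# Reads configuration text on stdin (for example the output of `qconf -sconf`)
# and succeeds only if the reporting_params line contains reporting=true.
# Assumption: all reporting_params settings sit on one line, space-separated.
reporting_enabled() {
    awk '
        $1 == "reporting_params" {
            for (i = 2; i <= NF; i++)
                if ($i == "reporting=true") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the SGE master (commented out because it needs a live qmaster):
# qconf -sconf | reporting_enabled && echo "reporting file is enabled"
```

Reading from stdin keeps the check independent of a running qmaster, so it can also be run against a saved copy of the configuration.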
Line: 593 to 450 | ||||||||
... | ||||||||
Changed: | ||||||||
< < | SGE dbwriter | |||||||
> > | sgedbwriter | |||||||
Changed: | ||||||||
< < | Once the SGE master is properly working we can install the dbwriter tool, that involves the MySQL user 'arco_write', and the ARCO reporting software, that involves the MySQL user 'arco_read' and the Sun Web Console. Please have a look to the official SGE documentation![]() | |||||||
> > | Once the SGE master is working properly we can install the sgedbwriter tool, which involves the MySQL user arco_write, and the ARCO reporting software, which involves the MySQL user arco_read plus the Sun Web Console.
Please have a look at the official SGE documentation | |||||||
reporting file duplication for security reasons | ||||||||
Changed: | ||||||||
< < | Once up and running dbwriter parses and eventually drops the SGE reporting file, even if it's the nominal behaviour of this SW we disagreed on collateral effect, so we've started this command to preserve the reporting file content inside an other file, the tail terminates when sge_qmaster terminates:
nohup tail --pid=$(pidof sge_qmaster) -n 0 -F /gridware/sge/default/common/reporting >> /gridware/sge/default/common/reporting.not.deleted.by.dbwriter & | |||||||
> > | Once up and running, sgedbwriter parses and then deletes the SGE reporting file. Even though this is the nominal behaviour of this SW, we disagreed with its side effect ( we lose that file ), so we started this command to copy the reporting file content into another file; the tail terminates when sge_qmaster terminates, so hopefully never:
nohup tail --pid=$(pidof sge_qmaster) -n 0 -F /gridware/sge/default/common/reporting >> /gridware/sge/default/common/reporting.not.deleted.by.dbwriter & | |||||||
RPM installation |
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
| ||||||||
Changed: | ||||||||
< < | SGE 6.2u5 plus ARCO MySQL on SL6 64bit powered by ZFS | |||||||
> > | SGE 6.2u5 and ARCO MySQL hosted on ZFS | |||||||
Revision 12, 2013-07-16 13:26:52 | ||||||||
Line: 1180 to 1183 | ||||||||
Customize SGE ARCO
Once the installation was completed and the old SGE reporting file was ingested into MySQL, we started to design some SQL queries in ARCO and to produce graphs, which is the most interesting part. So we produced: | ||||||||
Changed: | ||||||||
< < | 1 day CPU usage
1 day MEM usage | |||||||
> > | 1 day CPU usage
1 day MEM usage | |||||||
-- FabioMartinelli - 2011-03-03
| ||||||||
Line: 1244 to 1247 | ||||||||
| ||||||||
Added: | ||||||||
> > |
|
Line: 1 to 1 | ||||||||
---|---|---|---|---|---|---|---|---|
---> Package sun-sge-arco.noarch 0:6.2-5 set to be updated
--> Finished Dependency Resolution | ||||||||
> > | --> Running transaction check
---> Package sun-sge-arco.noarch 0:6.2-5 set to be updated
--> Finished Dependency Resolution | |||||||
Dependencies Resolved | ||||||||
Line: 654 to 648 | ||||||||
MySQL JDBC driver | ||||||||
Added: | ||||||||
> > | ||||||||
Be sure that you have a MySQL JDBC driver file and link that file inside the SGE dir: | ||||||||
Changed: | ||||||||
< < | [root@t3ce02 sge6_2u5]# yum install mysql-connector-java.x86_64 | |||||||
> > | [root@t3ce02 sge6_2u5]# yum install mysql-connector-java.x86_64 | |||||||
... [root@t3ce02 lib]# pwd /mnt/zfs/sge/gridware/sge/dbwriter/lib | ||||||||
Line: 664 to 658 | ||||||||
[root@t3ce02 lib]# ln -s /usr/share/java/mysql-connector-java.jar
Installation /inst_dbwriter | ||||||||
Changed: | ||||||||
< < | During the dbwriter installation itself, that's well reported on the official SGE site, we were prompted for several things, one is which Java to use, there we specified '/etc/alternatives/jre/' to be protected by a System java update. So we ran:
cd $SGE_ROOT/dbwriter && /inst_dbwriter | |||||||
> > |
The dbwriter installation itself is well documented on the official SGE website. It prompted us for several things, one being which Java to use; there we specified '/etc/alternatives/jre/' so that the setting survives system Java updates. So we ran:
cd $SGE_ROOT/dbwriter && ./inst_dbwriter | |||||||
...
All parameters are now collected
| ||||||||
Line: 682 to 676 | ||||||||
DERIVED_FILE=/mnt/zfs/sge/gridware/sge/dbwriter/database/mysql/dbwriter.xml
DEBUG_LEVEL=FINE | ||||||||
Changed: | ||||||||
< < | Are these settings correct? (y/n) [y] >> | |||||||
> > | Are these settings correct? (y/n) [y] >> | |||||||
Please note the MySQL sge_arco Tables and Views creation phase: | ||||||||
Changed: | ||||||||
< < | Update version table | |||||||
> > | Update version table | |||||||
commiting changes Version 6.1u3 (id=6) successfully installed Install version 6.1u4 (id=7) ------- | ||||||||
Line: 814 to 807 | ||||||||
Create configuration file for dbwriter in /mnt/zfs/sge/gridware/sge/default/common | ||||||||
Changed: | ||||||||
< < | Hit | |||||||
> > | Hit <RETURN> to continue >> | |||||||
When the dbwriter installation is completed we got: | ||||||||
Changed: | ||||||||
< < | dbwriter startup script | |||||||
> > | dbwriter startup script | |||||||
We can install the startup script that will | ||||||||
Changed: | ||||||||
< < | start dbwriter at machine boot (y/n) [y] >> | |||||||
> > | start dbwriter at machine boot (y/n) [y] >> | |||||||
cp /mnt/zfs/sge/gridware/sge/default/common/sgedbwriter /etc/init.d/sgedbwriter.p6444
/usr/lib/lsb/install_initd /etc/init.d/sgedbwriter.p6444 | ||||||||
Line: 833 to 825 | ||||||||
Installation of dbwriter completed [root@t3ce02 dbwriter]# | ||||||||
Added: | ||||||||
> > | ||||||||
Checking dbwriter logs | ||||||||
Added: | ||||||||
> > | ||||||||
The program dbwriter is now a service in your system, you can start/stop it with: | ||||||||
Changed: | ||||||||
< < | /etc/init.d/sgedbwriter.p6444 | |||||||
> > | /etc/init.d/sgedbwriter.p6444 | |||||||
Added: | ||||||||
> > | ||||||||
You can double-check what is going on with a tail command on these 2 log files: | ||||||||
Changed: | ||||||||
< < | [root@t3ce02 ~]# tail -f /mnt/zfs/sge/gridware/sge/default/spool/dbwriter/dbwriter.log | |||||||
> > | [root@t3ce02 ~]# tail -f /mnt/zfs/sge/gridware/sge/default/spool/dbwriter/dbwriter.log | |||||||
06/03/2011 16:24:51|t3ce02.psi.ch|ivedValueThread.commitExecuted|D|new object received, timestampOfLastRowData is 1,299,428,609,000
06/03/2011 16:24:51|t3ce02.psi.ch|iter.file.FileParser.parseFile|I|Deleting file reporting.processing
06/03/2011 16:24:51|t3ce02.psi.ch|.RecordCache.getStoredDBRecord|D|Object for key 'dbwriter' = [sge_statistic, id=1, parent=0, key=['dbwriter'], addr=0x7f712b3a] | ||||||||
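With DEBUG_LEVEL=FINE the log is verbose, so filtering by level helps. The following is a minimal sketch, assuming the pipe-separated layout seen in the excerpt above (timestamp, host, source, level, message); the function name `dbwriter_log_by_level` is hypothetical, and only the level letters D and I are confirmed by the excerpt.

```shell
# Hypothetical helper: print only the dbwriter.log lines of one level.
#   $1 = level letter as it appears in the 4th pipe-separated field
#        (D and I are the ones visible in the log excerpt)
# The log text is read from stdin.
dbwriter_log_by_level() {
    awk -F'|' -v lvl="$1" '$4 == lvl'
}

# Usage (log path taken from the tail command above):
# dbwriter_log_by_level I < /mnt/zfs/sge/gridware/sge/default/spool/dbwriter/dbwriter.log
```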
Line: 866 to 859 | ||||||||
Added: | ||||||||
> > | dbwriter RE-importing a reporting file
If for whatever reason you need to ingest the reporting file into the MySQL DB sge_arco again, run this sequence of truncates:
TRUNCATE `sge_checkpoint`;
TRUNCATE `sge_department`;
TRUNCATE `sge_group`;
TRUNCATE `sge_host`;
TRUNCATE `sge_host_values`;
TRUNCATE `sge_job`;
TRUNCATE `sge_job_log`;
TRUNCATE `sge_job_request`;
TRUNCATE `sge_job_usage`;
TRUNCATE `sge_project`;
TRUNCATE `sge_project_values`;
TRUNCATE `sge_queue`;
TRUNCATE `sge_queue_values`;
TRUNCATE `sge_statistic`;
TRUNCATE `sge_statistic_values`;
TRUNCATE `sge_user`;
TRUNCATE `sge_user_values`; | |||||||
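The truncate sequence can also be generated rather than typed. This is a minimal sketch: the table names are copied from the list above, while the function name `arco_truncate_sql` and the MySQL account in the usage comment are assumptions to adapt to your setup.

```shell
# Table names copied from the TRUNCATE sequence above.
ARCO_TABLES="sge_checkpoint sge_department sge_group sge_host sge_host_values
sge_job sge_job_log sge_job_request sge_job_usage sge_project sge_project_values
sge_queue sge_queue_values sge_statistic sge_statistic_values sge_user sge_user_values"

# Hypothetical helper: print one TRUNCATE statement per ARCO table.
arco_truncate_sql() {
    for table in $ARCO_TABLES; do
        printf 'TRUNCATE `%s`;\n' "$table"
    done
}

# Usage (the MySQL account is an assumption; review the output before piping it):
# arco_truncate_sql | mysql -u root -p sge_arco
```

Printing the statements first, instead of executing them directly, leaves room to inspect them before any data is dropped.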
SGE ARCO | ||||||||
Added: | ||||||||
> > | ||||||||
Now it's time to install the reporting layer; please have a look at the Official ARCO documentation | ||||||||
Line: 872 to 888 | ||||||||
Here follows our installation experience:
MySQL JDBC driver | ||||||||
Added: | ||||||||
> > | ||||||||
ARCO is a Java application that needs to communicate with MySQL, so we created another symbolic link, as in the dbwriter case: | ||||||||
Changed: | ||||||||
< < | [root@t3ce02 ~]# ll /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar lrwxrwxrwx 1 root root 40 Mar 3 20:30 /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar -> /usr/share/java/mysql-connector-java.jar | |||||||
> > | [root@t3ce02 ~]# ll /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar lrwxrwxrwx 1 root root 40 Mar 3 20:30 /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar -> /usr/share/java/mysql-connector-java.jar | |||||||
Added: | ||||||||
> > | ||||||||
This link was properly recognized by the ARCO installation procedure: | ||||||||
Changed: | ||||||||
< < | ... | |||||||
> > | ... | |||||||
Searching for the jdbc driver com.mysql.jdbc.Driver in directory /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib OK, jdbc driver found | ||||||||
Changed: | ||||||||
< < | Should the connection to the database be tested? (y/n) [y] >> | |||||||
> > | Should the connection to the database be tested? (y/n) [y] >> | |||||||
Test database connection to 'jdbc:mysql://localhost:3306/sge_arco' ... OK | ||||||||
Changed: | ||||||||
< < | Hit | |||||||
> > | Hit <RETURN> to continue >> | |||||||
DB parameters are now collected
| ||||||||
Line: 897 to 913 | ||||||||
DB_URL=jdbc:mysql://localhost:3306/sge_arco
DB_USER=arco_read | ||||||||
Changed: | ||||||||
< < | Are these settings correct? (y/n) [y] >> | |||||||
> > | Are these settings correct? (y/n) [y] >> | |||||||
Changed: | ||||||||
< < | Do you want to add another cluster? (y/n) [n] >>n | |||||||
> > | Do you want to add another cluster? (y/n) [n] >>n | |||||||
Configure users with write access
Users: default | ||||||||
Changed: | ||||||||
< < | Enter a user login name. (Hit | |||||||
> > | Enter a user login name. (Hit <RETURN> to finish) >> root | |||||||
Users: default root | ||||||||
Changed: | ||||||||
< < | Enter a user login name. (Hit | |||||||
> > | Enter a user login name. (Hit <RETURN> to finish) >> martinelli_f | |||||||
Users: default root martinelli_f | ||||||||
Changed: | ||||||||
< < | Enter a user login name. (Hit | |||||||
> > | Enter a user login name. (Hit <RETURN> to finish) >> | |||||||
All parameters are now collected
SPOOL_DIR=/var/spool/arco
APPL_USERS=default root martinelli_f | ||||||||
Changed: | ||||||||
< < | Are these settings correct? (y/n) [y] >> | |||||||
> > | Are these settings correct? (y/n) [y] >> | |||||||
found incorrect permissions lrwxrwxrwx for /mnt/zfs/sge/gridware/sge/reporting/WEB-INF/lib/mysql-connector-java.jar Correcting file permissions ... done | ||||||||
Line: 924 to 940 | ||||||||
Correcting file permissions ... done
Standard ARCO Queries | ||||||||
Added: | ||||||||
> > | ||||||||
SGE Engineers designed some standard queries useful for any kind of SGE cluster: | ||||||||
Changed: | ||||||||
< < | .... | |||||||
> > | .... | |||||||
Install predefined queries
| ||||||||
Changed: | ||||||||
< < | Directory /var/spool/arco does not exist, create it? (y/n) [y] >> y | |||||||
> > | Directory /var/spool/arco does not exist, create it? (y/n) [y] >> y | |||||||
Create directory /var/spool/arco Create directory /var/spool/arco/queries | ||||||||
Line: 956 to 972 | ||||||||
Copy query Wallclock_time.xml ... OK Create directory /var/spool/arco/results | ||||||||
Changed: | ||||||||
< < | Hit | |||||||
> > | Hit <RETURN> to continue >> | |||||||
ARCo reporting module setup
| ||||||||
Line: 968 to 984 | ||||||||
/mnt/zfs/sge/gridware/sge/default/arco/reporting/config.xml | ||||||||
Changed: | ||||||||
< < | Hit | |||||||
> > | Hit <RETURN> to continue >> | |||||||
Importing Sun Java Web Console 3.0 or 3.1 files
Imported files to /mnt/zfs/sge/gridware/sge/default/arco/reporting Created product images in /mnt/zfs/sge/gridware/sge/default/arco/reporting/com_sun_web_ui/images | ||||||||
Changed: | ||||||||
< < | Hit | |||||||
> > | Hit <RETURN> to continue >> | |||||||
Registering the SGE reporting module in the Sun Java Web Console
| ||||||||
Line: 994 to 1010 | ||||||||
Eventually the ARCO web access | ||||||||
Changed: | ||||||||
< < | At the end of the installation script we were able to access into ARCO;
https://t3ce02.psi.ch:6789/![]() | |||||||
> > |
At the end of the installation script we were able to access ARCO: https://t3ce02.psi.ch:6789/ | |||||||
SGE, importing a previous reporting file | ||||||||
Changed: | ||||||||
< < | It's possible to ingest an previous reporting file coming from an other SGE installation; because in our old cluster we had one we ingested > 1.5 year of statistics in this way:
[root@t3ce02 common]# ll /root/reporting | |||||||
> > |
It's possible to ingest a previous reporting file coming from another SGE installation; because we had one from our old cluster, we ingested more than 1.5 years of statistics in this way:
[root@t3ce02 common]# ll /root/reporting | |||||||
-rw-r--r--. 1 root root 740664403 Feb 28 23:08 /root/reporting
[root@t3ce02 common]# pwd
/mnt/zfs/sge/gridware/sge/default/common | ||||||||
Line: 1009 to 1025 | ||||||||
SGE, inspect tool | ||||||||
Added: | ||||||||
> > | ||||||||
It's possible to graphically monitor several SGE clusters and their queues by using the Java tool Inspect | ||||||||
Changed: | ||||||||
< < | [root@t3ce02 SGE6.2u5]# unzip sge62u5_inspect_rpm.zip | |||||||
> > | [root@t3ce02 SGE6.2u5]# unzip sge62u5_inspect_rpm.zip | |||||||
Archive: sge62u5_inspect_rpm.zip inflating: sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm [root@t3ce02 SGE6.2u5]# yum install sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm | ||||||||
Line: 1019 to 1035 | ||||||||
Examining sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm: sun-sge-inspect-6.2-5.noarch Marking sge6_2u5/sun-sge-inspect-6.2-5.noarch.rpm to be installed Resolving Dependencies | ||||||||
Changed: | ||||||||
< < | --> Running transaction check
---> Package sun-sge-inspect.noarch 0:6.2-5 set to be updated
--> Finished Dependency Resolution | |||||||
> > | --> Running transaction check
---> Package sun-sge-inspect.noarch 0:6.2-5 set to be updated
--> Finished Dependency Resolution | |||||||
Dependencies Resolved | ||||||||
Line: 1054 to 1070 | ||||||||
Install jdk-develop by using yum: | ||||||||
Changed: | ||||||||
< < | ... | |||||||
> > | ... | |||||||
============================================================================================================================
Package Arch Version Repository Size
============================================================================================================================ | ||||||||
Line: 1068 to 1083 | ||||||||
You need to create users and keys ( not really clear why.. ): | ||||||||
Changed: | ||||||||
< < | [root@t3ce02 bin]# cat /opt/SGE6.2u5/myusers.txt | |||||||
> > | [root@t3ce02 bin]# cat /opt/SGE6.2u5/myusers.txt | |||||||
root:iamroot:fabio.martinelli@psi.ch [root@t3ce02 bin]# | ||||||||
Line: 1104 to 1118 | ||||||||
Create and use passwords: | ||||||||
Changed: | ||||||||
< < | [root@t3ce02 bin]# /mnt/zfs/sge/gridware/sge/util/sgeCA/sge_ca -userks -kspwf /tmp/mysecret.txt | |||||||
> > | [root@t3ce02 bin]# /mnt/zfs/sge/gridware/sge/util/sgeCA/sge_ca -userks -kspwf /tmp/mysecret.txt | |||||||
Added: | ||||||||
> > | ||||||||
We wrote a script to set JAVA_HOME and run inspect: | ||||||||
Changed: | ||||||||
< < | [root@t3ce02 ~]# ll /usr/local/bin/sgeinspect.sh lrwxrwxrwx 1 root root 42 Mar 3 23:51 /usr/local/bin/sgeinspect.sh -> /gridware/sge/sgeinspect/bin/sgeinspect.sh | |||||||
> > | [root@t3ce02 ~]# ll /usr/local/bin/sgeinspect.sh lrwxrwxrwx 1 root root 42 Mar 3 23:51 /usr/local/bin/sgeinspect.sh -> /gridware/sge/sgeinspect/bin/sgeinspect.sh | |||||||
[root@t3ce02 ~]# cat /usr/local/bin/sgeinspect.sh
export JAVA_HOME=/etc/alternatives/java_sdk | ||||||||
Line: 1119 to 1132 | ||||||||
cd - [root@t3ce02 ~]# | ||||||||
Added: | ||||||||
> > | ||||||||
SGE EXECD 6.2u5 installation | ||||||||
Added: | ||||||||
> > | ||||||||
Installing the SGE execution side was easier than installing the master, but it requires some steps that are well described in the official SGE How to Install Execution Hosts | ||||||||
Changed: | ||||||||
< < | rsync -av /gridware/sge/default/common EXECUTION_HOST:/gridware/sge/default/common | |||||||
> > | rsync -av /gridware/sge/default/common EXECUTION_HOST:/gridware/sge/default/common | |||||||
Added: | ||||||||
> > | ||||||||
so that ./install_execd receives the configuration files, which state who the master is and the parameters affecting the execution. We did an installation without any NFS dependency; this should avoid global job crashes when an NFS server is unreachable, and it improves I/O performance. | ||||||||
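When several execution hosts are involved, the copy can be looped. Below is a minimal sketch, assuming placeholder host names (`wn01`, `wn02`) and a hypothetical function name `sync_common`. It deliberately adds trailing slashes to both paths: without one on the source, rsync copies the directory itself into the destination and creates a nested `common/common`.

```shell
SGE_COMMON=/gridware/sge/default/common

# Hypothetical helper: mirror the master's common directory to each host given as argument.
# RSYNC can be overridden (e.g. RSYNC=echo) to preview the commands without copying anything.
sync_common() {
    for host in "$@"; do
        # Trailing slashes: copy the directory contents, not the directory itself.
        ${RSYNC:-rsync} -av "$SGE_COMMON/" "$host:$SGE_COMMON/" || return 1
    done
}

# Usage (host names are placeholders):
# sync_common wn01 wn02
```

Stopping at the first failing host makes it obvious which node still lacks the configuration before ./install_execd is run there.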
Line: 1128 to 1143 | ||||||||
We did an installation without any NFS dependency; this should avoid global job crashes when an NFS server is unreachable, and it improves I/O performance.
Dropping old job files | ||||||||
Added: | ||||||||
> > | ||||||||
It's good policy to keep recent SGE job files on the WN to troubleshoot job failures, but after a month or two it makes no sense to preserve them, and the directory may even overflow: when I was at ESA I wrote this cron script to move out and later delete old SGE job dirs ( the script has to be adapted to your cluster ): | ||||||||
Changed: | ||||||||
< < | #!/bin/bash | |||||||
> > | #!/bin/bash | |||||||
# by martinelli @ ESA - 26/05/2010
# /var/sge/spool/$HOST/active_jobs dir was found with 31199 dirs inside and because that dir is hosted on an EXT3 filesystem it can't store more than 32k dirs; | ||||||||
Line: 1148 to 1163 | ||||||||
# prepares move commands
cd $ACTIVE_JOBS | ||||||||
Changed: | ||||||||
< < | /usr/bin/find . -mtime +15 -type d -exec echo mv '{}' /stage/active_jobs_old/ \; > /tmp/$BASENAME-mv.sh | |||||||
> > | /usr/bin/find . -mtime +15 -type d -exec echo mv '{}' /stage/active_jobs_old/ \; > /tmp/$BASENAME-mv.sh | |||||||
# executes move commands
source /tmp/$BASENAME-mv.sh
# later drops old files, prepare the rm commands
cd /stage/active_jobs_old/ | ||||||||
Changed: | ||||||||
< < | /usr/bin/find . -mtime +45 -type d -exec echo rm -rf '{}' \; > /tmp/$BASENAME-rm.sh | |||||||
> > | /usr/bin/find . -mtime +45 -type d -exec echo rm -rf '{}' \; > /tmp/$BASENAME-rm.sh | |||||||
# execute them
source /tmp/$BASENAME-rm.sh && exit 0 | ||||||||
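The move step of the script above can also be written without generating and sourcing intermediate files. This is a sketch rather than a replacement for the original: the function name `move_old_job_dirs` is hypothetical, and unlike the original `find` it only looks at direct children of the source directory, so the directory itself and nested subdirectories are never matched on their own.

```shell
# Hypothetical helper: move job directories older than a number of days.
#   $1 = source dir (e.g. the active_jobs spool dir)
#   $2 = destination dir (e.g. /stage/active_jobs_old)
#   $3 = minimum age in days (find -mtime +N semantics)
# -mindepth/-maxdepth are GNU/BSD find extensions that restrict the match
# to direct children of the source dir.
move_old_job_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$3" -exec mv {} "$2"/ \;
}

# Usage (paths taken from the script above; adapt them to your cluster):
# move_old_job_dirs "/var/sge/spool/$HOST/active_jobs" /stage/active_jobs_old 15
```

Passing each directory straight to `mv` through `-exec` also copes with unusual characters in directory names, which a generated script that is later sourced would not.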
Line: 1202 to 1216 | ||||||||
| ||||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
Changed: | ||||||||
< < |
| |||||||
> > |
| |||||||
|
Line: 1 to 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Revision 8, 2011-05-09 - FabioMartinelli
|