How to manage jobs with SGE Utilities
SGE provides many command-line utilities and a GUI program for interacting with the Sun Grid Engine software.
For information on how to submit general or CMSSW jobs, please consult this page.
Client Commands
qstat - Show job/queue status
- no arguments Show your currently running/pending jobs
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1261 0.55500 chen_z_cra chen_z qw 10/27/2008 10:17:41 1
1262 0.55500 chen_z_cra chen_z qw 10/27/2008 10:17:41 1
- -f Show full listing of all queues and related information.
[chen_z@t3ui01 ~]$ qstat -f
queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
all.q@t3wn01 BIP 8/8 8.00 lx24-amd64
----------------------------------------------------------------------------
all.q@t3wn02 BIP 8/8 8.05 lx24-amd64
----------------------------------------------------------------------------
all.q@t3wn03 BIP 8/8 8.00 lx24-amd64
----------------------------------------------------------------------------
all.q@t3wn04 BIP 8/8 8.00 lx24-amd64
----------------------------------------------------------------------------
all.q@t3wn05 BIP 8/8 8.00 lx24-amd64
----------------------------------------------------------------------------
all.q@t3wn06 BIP 8/8 7.87 lx24-amd64
----------------------------------------------------------------------------
all.q@t3wn07 BIP 8/8 8.00 lx24-amd64 d
----------------------------------------------------------------------------
all.q@t3wn08.psi.ch BIP 0/8 -NA- lx24-amd64 au
############################################################################
- PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
1261 0.55500 chen_z_cra chen_z qw 10/27/2008 10:17:41 1
1262 0.55500 chen_z_cra chen_z qw 10/27/2008 10:17:41 1
- -j Show detailed information on pending/running jobs, including the scheduler's reasons for not starting them
[chen_z@t3ui01 ~]$ qstat -j
scheduling info: queue instance "all.q@t3wn08.psi.ch" dropped because it is temporarily not available
queue instance "all.q@t3wn07" dropped because it is disabled
queue instance "all.q@t3wn05" dropped because it is full
queue instance "all.q@t3wn01" dropped because it is full
queue instance "all.q@t3wn06" dropped because it is full
queue instance "all.q@t3wn03" dropped because it is full
queue instance "all.q@t3wn02" dropped because it is full
queue instance "all.q@t3wn04" dropped because it is full
All queues dropped because of overload or full
- -u USERNAME Show only the jobs that belong to the given user
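The job listings above can be summarised with a small filter. The following is a minimal sketch, not part of SGE: count_job_states is a hypothetical helper name, and it assumes the default qstat column layout shown above, where job lines start with a numeric job ID and the state is the fifth field. The sample data is the two pending jobs from the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE command): count jobs per state from plain
# `qstat` output read on stdin. Assumes the default layout shown above:
# job lines begin with a numeric job ID and the state is field 5.
count_job_states() {
    awk '$1 ~ /^[0-9]+$/ { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demo on the two pending jobs copied from the listing above.
sample='job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------
1261 0.55500 chen_z_cra chen_z qw 10/27/2008 10:17:41 1
1262 0.55500 chen_z_cra chen_z qw 10/27/2008 10:17:41 1'
printf '%s\n' "$sample" | count_job_states
```

On a submit host the same filter could be fed directly, for example with qstat -u USERNAME | count_job_states.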
qhost - Show job/host status
- no arguments Show a table of all execution hosts and information about their configuration
[chen_z@t3ui01 ~]$ qhost
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
t3wn01 lx24-amd64 8 8.00 15.7G 10.2G 1.9G 208.0K
t3wn02 lx24-amd64 8 7.97 15.7G 6.8G 1.9G 208.0K
t3wn03 lx24-amd64 8 8.00 15.7G 8.3G 1.9G 208.0K
t3wn04 lx24-amd64 8 8.05 15.7G 7.0G 1.9G 208.0K
t3wn05 lx24-amd64 8 8.01 15.7G 10.7G 1.9G 208.0K
t3wn06 lx24-amd64 8 8.25 15.7G 7.9G 1.9G 4.3M
t3wn07 lx24-amd64 8 8.00 15.7G 9.0G 1.9G 208.0K
t3wn08 lx24-amd64 8 - 15.7G - 1.9G -
- -j Show detailed information on pending/running jobs, grouped by worker node
[chen_z@t3ui01 ~]$ qhost -j
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
t3wn01 lx24-amd64 8 8.00 15.7G 10.2G 1.9G 208.0K
job-ID prior name user state submit/start at queue master ja-task-ID
----------------------------------------------------------------------------------------------
1218 0.55500 sd_backgro dambach r 10/27/2008 19:23:01 all.q@t3wn MASTER
1220 0.55500 sd_backgro dambach r 10/27/2008 19:37:19 all.q@t3wn MASTER
1222 0.55500 sd_backgro dambach r 10/27/2008 19:44:44 all.q@t3wn MASTER
1224 0.55500 sd_backgro dambach r 10/27/2008 20:11:16 all.q@t3wn MASTER
1225 0.55500 sd_backgro dambach r 10/27/2008 20:16:09 all.q@t3wn MASTER
1227 0.55500 sd_backgro dambach r 10/27/2008 20:24:35 all.q@t3wn MASTER
1229 0.55500 sd_backgro dambach r 10/27/2008 20:30:56 all.q@t3wn MASTER
1230 0.55500 sd_backgro dambach r 10/27/2008 20:32:12 all.q@t3wn MASTER
t3wn02 lx24-amd64 8 7.91 15.7G 7.0G 1.9G 208.0K
1177 0.55500 sd_backgro dambach r 10/27/2008 08:55:35 all.q@t3wn MASTER
1201 0.55500 sd_backgro dambach r 10/27/2008 08:55:35 all.q@t3wn MASTER
1237 0.55500 sd_backgro dambach r 10/27/2008 21:13:36 all.q@t3wn MASTER
... ...
... ...
t3wn07 lx24-amd64 8 8.02 15.7G 9.1G 1.9G 208.0K
1221 0.55500 sd_backgro dambach r 10/27/2008 19:40:54 all.q@t3wn MASTER
1226 0.55500 sd_backgro dambach r 10/27/2008 20:20:15 all.q@t3wn MASTER
1228 0.55500 sd_backgro dambach r 10/27/2008 20:29:10 all.q@t3wn MASTER
1231 0.55500 sd_backgro dambach r 10/27/2008 20:40:06 all.q@t3wn MASTER
1233 0.55500 sd_backgro dambach r 10/27/2008 20:58:59 all.q@t3wn MASTER
1236 0.55500 sd_backgro dambach r 10/27/2008 21:09:43 all.q@t3wn MASTER
1239 0.55500 sd_backgro dambach r 10/27/2008 21:27:22 all.q@t3wn MASTER
1243 0.55500 sd_backgro dambach r 10/27/2008 21:32:47 all.q@t3wn MASTER
t3wn08 lx24-amd64 8 - 15.7G - 1.9G -
- -q Show detailed information on the queues at each host
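The plain qhost table above can also be filtered in a script. The sketch below is an illustration only: hosts_without_load is a hypothetical helper name, and it assumes the default column layout shown above, with two header lines and LOAD as the fourth field. It prints the hosts whose LOAD column is "-", which in the listing above is t3wn08, the node whose queue instance appears in state "au" in the qstat -f output.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE command): print execution hosts that report
# no load value ("-" in the LOAD column) from plain `qhost` output on stdin.
# Assumes the default layout shown above: two header lines, LOAD in field 4.
# The pseudo-host "global" is skipped because it never reports a load.
hosts_without_load() {
    awk 'NR > 2 && $1 != "global" && $4 == "-" { print $1 }'
}

# Demo on a few rows copied from the qhost listing above.
sample='HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
t3wn01 lx24-amd64 8 8.00 15.7G 10.2G 1.9G 208.0K
t3wn07 lx24-amd64 8 8.00 15.7G 9.0G 1.9G 208.0K
t3wn08 lx24-amd64 8 - 15.7G - 1.9G -'
printf '%s\n' "$sample" | hosts_without_load
```

On a submit host the filter could be used as qhost | hosts_without_load. It is meant for the plain table only, not for the mixed host/job output of qhost -j.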
Why Won't My Job Run Correctly?
Does your job stay in the "Eqw" or "qw" state when you run qstat, and never start running? Get more information about what is wrong with it using:
qstat -j JOB_ID
This command prints the scheduler information that explains why your job is still waiting in the queue. For example:
[chen_z@t3ui01 sge]$ qstat -j 1264
==============================================================
job_number: 1264
exec_file: job_scripts/1264
... ...
... ...
script_file: test.job
scheduling info: queue instance "all.q@t3wn08.psi.ch" dropped because it is temporarily not available
queue instance "all.q@t3wn07" dropped because it is disabled
queue instance "all.q@t3wn05" dropped because it is full
queue instance "all.q@t3wn01" dropped because it is full
queue instance "all.q@t3wn03" dropped because it is full
queue instance "all.q@t3wn04" dropped because it is full
(-l h_rt=460000) cannot run in queue "all.q@t3wn06" because it offers only qf:h_rt=4:00:30:00
(-l h_rt=460000) cannot run in queue "all.q@t3wn02" because it offers only qf:h_rt=4:00:30:00
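In this example, the job requests a maximum run time of 460000 seconds (-l h_rt=460000, roughly 5.3 days), which exceeds the run-time limit of 4:00:30:00 (4 days and 30 minutes) offered by the queue instances on t3wn06 and t3wn02. All other queue instances are full, disabled, or temporarily unavailable, so the job cannot be scheduled anywhere.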
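The comparison behind this message can be checked by hand or with a short script. The following is a minimal sketch under stated assumptions: to_seconds and fits_limit are hypothetical helper names, and the four-field limit 4:00:30:00 is read as days:hours:minutes:seconds.

```shell
#!/bin/sh
# Hypothetical helpers (not SGE commands) for comparing a requested h_rt
# with the limit printed by `qstat -j`.

# Convert a time value to seconds. Accepts plain seconds (460000) or up to
# four colon-separated fields, assumed to mean [[days:]hours:]minutes:seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        mult[0] = 1; mult[1] = 60; mult[2] = 3600; mult[3] = 86400
        total = 0
        # Walk the fields from right (seconds) to left (days).
        for (i = NF; i >= 1; i--) total += $i * mult[NF - i]
        print total
    }'
}

# Succeed (exit status 0) if request $1 does not exceed limit $2.
fits_limit() {
    [ "$(to_seconds "$1")" -le "$(to_seconds "$2")" ]
}

# Values taken from the scheduling info above.
requested=460000
limit=4:00:30:00
if fits_limit "$requested" "$limit"; then
    echo "request fits"
else
    echo "request exceeds limit by $(( $(to_seconds "$requested") - $(to_seconds "$limit") )) seconds"
fi
```

Under these assumptions the limit corresponds to 347400 seconds, so any h_rt request at or below that value would pass this particular check.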
QMON, the Grid Engine System's Graphical User Interface
QMON (started with the qmon command) is an X Window System Motif utility. You can use QMON to accomplish most SGE tasks, including submitting jobs, controlling jobs, and gathering important information. The QMON Main Control window is often the starting point for user and administrator functions. Each icon on the Main Control window is a GUI button that you click to start a particular task. To see a button's name, which describes its function, rest the pointer over the button. The following figure shows the QMON Main Control window along with descriptions of each icon.
--
ZhilingChen - 27 Oct 2008