How to manage jobs with SGE Utilities

SGE provides many command line utilities and a GUI program to interact with the Sun Grid Engine software.

For information on how to submit jobs, please consult this page.

Command Line Client Commands

qstat - show job/queue status

  • with no arguments, the command shows your currently running/pending jobs
    qstat
    
    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
    -----------------------------------------------------------------------------------------------------------------
       1261 0.55500 chen_z_cra chen_z       qw    10/27/2008 10:17:41                                    1        
       1262 0.55500 chen_z_cra chen_z       qw    10/27/2008 10:17:41                                    1    
        
  • the -u flag can be used to look at other users' jobs. You can use the wildcard '*' to specify all users
    qstat -u '*'

  • the -f flag shows a full listing of all queues and related information.
    [chen_z@t3ui01 ~]$ qstat -f
    queuename                      qtype used/tot. load_avg arch          states
    ----------------------------------------------------------------------------
    all.q@t3wn01                   BIP   8/8       8.00     lx24-amd64    
    ----------------------------------------------------------------------------
    all.q@t3wn02                   BIP   8/8       8.05     lx24-amd64    
    ----------------------------------------------------------------------------
    all.q@t3wn03                   BIP   8/8       8.00     lx24-amd64    
    ----------------------------------------------------------------------------
    all.q@t3wn04                   BIP   8/8       8.00     lx24-amd64    
    ----------------------------------------------------------------------------
    all.q@t3wn05                   BIP   8/8       8.00     lx24-amd64    
    ----------------------------------------------------------------------------
    all.q@t3wn06                   BIP   8/8       7.87     lx24-amd64    
    ----------------------------------------------------------------------------
    all.q@t3wn07                   BIP   8/8       8.00     lx24-amd64    d
    ----------------------------------------------------------------------------
    all.q@t3wn08.psi.ch            BIP   0/8       -NA-     lx24-amd64    au
    
    ############################################################################
     - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
    ############################################################################
       1261 0.55500 chen_z_cra chen_z       qw    10/27/2008 10:17:41     1        
       1262 0.55500 chen_z_cra chen_z       qw    10/27/2008 10:17:41     1        
       

  • the -j flag shows detailed information on pending/running jobs; without a job ID it prints the general scheduling information
         [chen_z@t3ui01 ~]$ qstat -j
         scheduling info:            queue instance "all.q@t3wn08.psi.ch" dropped because it is temporarily not available
                                queue instance "all.q@t3wn07" dropped because it is disabled
                                queue instance "all.q@t3wn05" dropped because it is full
                                queue instance "all.q@t3wn01" dropped because it is full
                                queue instance "all.q@t3wn06" dropped because it is full
                                queue instance "all.q@t3wn03" dropped because it is full
                                queue instance "all.q@t3wn02" dropped because it is full
                                queue instance "all.q@t3wn04" dropped because it is full
                                All queues dropped because of overload or full
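
The job listing printed by plain qstat is whitespace-separated text, so it can be post-processed with a small script. The following is only a sketch: the helper names parse_qstat and count_by_state are made up for this example, the sample text is copied from the listing above, and it assumes job names contain no spaces.

```python
from collections import Counter

# Sample text copied from the plain qstat listing on this page.
# On a real system it would come from running the qstat command.
SAMPLE_QSTAT = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
   1261 0.55500 chen_z_cra chen_z       qw    10/27/2008 10:17:41                                    1
   1262 0.55500 chen_z_cra chen_z       qw    10/27/2008 10:17:41                                    1
"""


def parse_qstat(text):
    """Return a list of (job_id, user, state) tuples from plain qstat output."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; header and separator lines do not.
        if len(fields) >= 5 and fields[0].isdigit():
            jobs.append((int(fields[0]), fields[3], fields[4]))
    return jobs


def count_by_state(jobs):
    """Count jobs per state code, for example 'qw' (pending) or 'r' (running)."""
    return Counter(state for _, _, state in jobs)


if __name__ == "__main__":
    jobs = parse_qstat(SAMPLE_QSTAT)
    for state, count in count_by_state(jobs).items():
        print(state, count)
```

For the sample above, both jobs are counted under the qw state.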
        

qdel - delete a job from the queue

Syntax:
qdel JOB_ID
Replace JOB_ID with the numeric ID of the job. Use the qstat command to find a job's ID.
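
When several jobs need to be removed, the job IDs found with qstat can be turned into one qdel call per job. The sketch below only builds and prints the commands (a dry run). The function name qdel_commands and the tuple layout (job_id, user, state) are assumptions made for this example; the job data is taken from the listings on this page.

```python
def qdel_commands(jobs, user, state="qw"):
    """Build one ['qdel', job_id] argument list per matching job.

    jobs is a list of (job_id, user, state) tuples, e.g. collected from qstat.
    Only jobs owned by `user` and currently in `state` are selected.
    """
    return [
        ["qdel", str(job_id)]
        for job_id, job_user, job_state in jobs
        if job_user == user and job_state == state
    ]


if __name__ == "__main__":
    # Job data taken from the qstat and qhost listings on this page.
    jobs = [
        (1261, "chen_z", "qw"),
        (1262, "chen_z", "qw"),
        (1218, "dambach", "r"),
    ]
    for cmd in qdel_commands(jobs, "chen_z"):
        # Printing keeps this a dry run. Passing cmd to
        # subprocess.run(cmd, check=True) would actually delete the job.
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check the selection before anything is deleted.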

qhost - show job/host status

  • with no arguments, the command shows a table of all execution hosts and information about their configuration
    [chen_z@t3ui01 ~]$ qhost
    HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    -------------------------------------------------------------------------------
    global                  -               -     -       -       -       -       -
    t3wn01                  lx24-amd64      8  8.00   15.7G   10.2G    1.9G  208.0K
    t3wn02                  lx24-amd64      8  7.97   15.7G    6.8G    1.9G  208.0K
    t3wn03                  lx24-amd64      8  8.00   15.7G    8.3G    1.9G  208.0K
    t3wn04                  lx24-amd64      8  8.05   15.7G    7.0G    1.9G  208.0K
    t3wn05                  lx24-amd64      8  8.01   15.7G   10.7G    1.9G  208.0K
    t3wn06                  lx24-amd64      8  8.25   15.7G    7.9G    1.9G    4.3M
    t3wn07                  lx24-amd64      8  8.00   15.7G    9.0G    1.9G  208.0K
    t3wn08                  lx24-amd64      8     -   15.7G       -    1.9G       -
       
  • the -j flag shows detailed information on the pending/running jobs, grouped by worker node
    [chen_z@t3ui01 ~]$ qhost -j
    HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    -------------------------------------------------------------------------------
    global                  -               -     -       -       -       -       -
    t3wn01                  lx24-amd64      8  8.00   15.7G   10.2G    1.9G  208.0K
       job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID 
       ----------------------------------------------------------------------------------------------
          1218 0.55500 sd_backgro dambach      r     10/27/2008 19:23:01 all.q@t3wn MASTER        
          1220 0.55500 sd_backgro dambach      r     10/27/2008 19:37:19 all.q@t3wn MASTER        
          1222 0.55500 sd_backgro dambach      r     10/27/2008 19:44:44 all.q@t3wn MASTER        
          1224 0.55500 sd_backgro dambach      r     10/27/2008 20:11:16 all.q@t3wn MASTER        
          1225 0.55500 sd_backgro dambach      r     10/27/2008 20:16:09 all.q@t3wn MASTER        
          1227 0.55500 sd_backgro dambach      r     10/27/2008 20:24:35 all.q@t3wn MASTER        
          1229 0.55500 sd_backgro dambach      r     10/27/2008 20:30:56 all.q@t3wn MASTER        
          1230 0.55500 sd_backgro dambach      r     10/27/2008 20:32:12 all.q@t3wn MASTER        
    t3wn02                  lx24-amd64      8  7.91   15.7G    7.0G    1.9G  208.0K
          1177 0.55500 sd_backgro dambach      r     10/27/2008 08:55:35 all.q@t3wn MASTER        
          1201 0.55500 sd_backgro dambach      r     10/27/2008 08:55:35 all.q@t3wn MASTER        
          1237 0.55500 sd_backgro dambach      r     10/27/2008 21:13:36 all.q@t3wn MASTER        
    ... ... 
    ... ...       
    t3wn07                  lx24-amd64      8  8.02   15.7G    9.1G    1.9G  208.0K
          1221 0.55500 sd_backgro dambach      r     10/27/2008 19:40:54 all.q@t3wn MASTER        
          1226 0.55500 sd_backgro dambach      r     10/27/2008 20:20:15 all.q@t3wn MASTER        
          1228 0.55500 sd_backgro dambach      r     10/27/2008 20:29:10 all.q@t3wn MASTER        
          1231 0.55500 sd_backgro dambach      r     10/27/2008 20:40:06 all.q@t3wn MASTER        
          1233 0.55500 sd_backgro dambach      r     10/27/2008 20:58:59 all.q@t3wn MASTER        
          1236 0.55500 sd_backgro dambach      r     10/27/2008 21:09:43 all.q@t3wn MASTER        
          1239 0.55500 sd_backgro dambach      r     10/27/2008 21:27:22 all.q@t3wn MASTER        
          1243 0.55500 sd_backgro dambach      r     10/27/2008 21:32:47 all.q@t3wn MASTER        
    t3wn08                  lx24-amd64      8     -   15.7G       -    1.9G       -
       
  • the -q flag shows detailed information on the queues at each host
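
The host table printed by plain qhost can be read in the same way, for instance to spot nodes that report no load (shown as "-", like t3wn08 above). This is a minimal sketch: parse_qhost is a hypothetical helper, the sample rows are copied from the qhost listing on this page, and it does not handle the extra job rows printed by qhost -j.

```python
# A few rows copied from the plain qhost listing on this page.
SAMPLE_QHOST = """\
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
t3wn01                  lx24-amd64      8  8.00   15.7G   10.2G    1.9G  208.0K
t3wn06                  lx24-amd64      8  8.25   15.7G    7.9G    1.9G    4.3M
t3wn08                  lx24-amd64      8     -   15.7G       -    1.9G       -
"""


def parse_qhost(text):
    """Map each host name to its CPU count and load (None if not reported)."""
    hosts = {}
    for line in text.splitlines():
        fields = line.split()
        # Host rows have exactly eight columns; skip the header,
        # the separator line and the 'global' pseudo host.
        if len(fields) != 8 or fields[0] in ("HOSTNAME", "global"):
            continue
        name, _arch, ncpu, load = fields[:4]
        if not ncpu.isdigit():
            continue
        hosts[name] = {
            "ncpu": int(ncpu),
            "load": None if load == "-" else float(load),
        }
    return hosts


if __name__ == "__main__":
    hosts = parse_qhost(SAMPLE_QHOST)
    silent = [name for name, info in hosts.items() if info["load"] is None]
    print("hosts without load information:", silent)
```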

Why Won't My Job Run Correctly?

Does your job show the "Eqw" or "qw" state when you run qstat, and just sit there refusing to run? Get more information on what is wrong with it using:
qstat -j JOB_ID
This command prints the reason (the scheduling info) why your job is still sitting in the queue. For example:
[chen_z@t3ui01 sge]$ qstat -j 1264
==============================================================
job_number:                 1264
exec_file:                  job_scripts/1264
... ...
... ...
script_file:                test.job
scheduling info:            queue instance "all.q@t3wn08.psi.ch" dropped because it is temporarily not available
                            queue instance "all.q@t3wn07" dropped because it is disabled
                            queue instance "all.q@t3wn05" dropped because it is full
                            queue instance "all.q@t3wn01" dropped because it is full
                            queue instance "all.q@t3wn03" dropped because it is full
                            queue instance "all.q@t3wn04" dropped because it is full
                            (-l h_rt=460000) cannot run in queue "all.q@t3wn06" because it offers only qf:h_rt=4:00:30:00
                            (-l h_rt=460000) cannot run in queue "all.q@t3wn02" because it offers only qf:h_rt=4:00:30:00

   
In this example, the job cannot run because the maximum run time it requests (-l h_rt=460000) is larger than the run time limit of the queue (h_rt=4:00:30:00).
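
To compare the two h_rt values from such a message, both need to be in the same unit. The sketch below assumes that the plain number in the request is in seconds and that the colon-separated limit reads days:hours:minutes:seconds; both are assumptions about the output format, and the helper name limit_to_seconds is invented for this example. The message line is copied from the qstat -j output above.

```python
import re

# One line copied from the "scheduling info" output above.
LINE = ('(-l h_rt=460000) cannot run in queue "all.q@t3wn06" '
        'because it offers only qf:h_rt=4:00:30:00')


def limit_to_seconds(limit):
    """Convert a limit such as '4:00:30:00' or '460000' to seconds.

    Assumption: fields are read right to left as seconds, minutes,
    hours and days; a single field is taken as plain seconds.
    """
    units = [1, 60, 3600, 86400]
    parts = [int(part) for part in limit.split(":")]
    if len(parts) > len(units):
        raise ValueError("too many fields in limit: " + limit)
    return sum(value * unit for value, unit in zip(reversed(parts), units))


def parse_h_rt_message(line):
    """Return (requested_seconds, offered_seconds) from an h_rt message line."""
    match = re.search(r"\(-l h_rt=(\d+)\).*h_rt=([\d:]+)$", line)
    if match is None:
        return None
    return int(match.group(1)), limit_to_seconds(match.group(2))


if __name__ == "__main__":
    requested, offered = parse_h_rt_message(LINE)
    print("requested:", requested, "s")
    print("offered:  ", offered, "s")
    print("request exceeds limit:", requested > offered)
```

Under these assumptions the queue offers 347400 seconds (4 days and 30 minutes), which is less than the 460000 seconds requested, matching the explanation above.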

QMON, the Grid Engine System's Graphical User Interface (work in progress, under construction)

Note: This GUI is rather complicated and confusing. For normal work, we advise you to use the few command line commands described above.

qmon is an X Window System (Motif) utility. You can use QMON to accomplish most SGE tasks, including submitting jobs, controlling jobs, and gathering important information. The QMON Main Control window is often the starting point for user and administrator functions. Each icon on the Main Control window is a GUI button that you click to start a task. To see a button's name, which describes the button's function, rest the pointer over the button. The following figure shows the QMON Main Control window along with descriptions of each icon.

[Figure: main-control.png, QMON Main Control window]

[Figure: Picture_1.png, QMON Job Control window]

-- ZhilingChen - 27 Oct 2008

Topic attachments
  • Picture_1.png (r1, 66.4 K, 2008-10-27 22:04, ZhilingChen): QMON Job Control Window
  • main-control.png (r1, 50.3 K, 2008-10-27 22:01, ZhilingChen): QMON Main Control Window
Topic revision: r3 - 2009-01-10 - DerekFeichtinger
 