Interacting with Sun Grid Engine (SGE)

Scope

Metworx manages jobs submitted to the compute grid with the Sun Grid Engine (SGE). This page demonstrates commands for interacting with SGE, which you can run from a shell prompt or pass from R using the function system().

In this article you will see how to:

Submit a job to run using qsub
Monitor running jobs using qstat
Delete running jobs using qdel
Decrease the priority of pending jobs using qalter
Recognize when a job or queue is having an issue

For background on the Metworx compute grid and its structure, please refer to the Grid Computing Intro.

SGE and Slurm job schedulers

This page discusses using SGE as the job scheduler for interacting with the grid. Blueprints 24-04 and later have the option to use Slurm (instead of SGE) as the job scheduler. See "SGE-to-Slurm Quick Start Guide" for information on using Slurm.

qsub

qsub is used to submit a job to a queue. Passing the file path to an executable script to qsub will create a job in the queue for that script.

$ qsub my_script.sh

qsub has a number of important arguments, for example -pe orte NUMBER-OF-CORES for reserving multiple cores to run your job in parallel. Call man qsub for more details.
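As a minimal sketch of combining several qsub arguments, the function below submits a script with a job name and a slot reservation. The function name submit_parallel and the QSUB_CMD override are hypothetical, introduced here only so the command can be previewed without a grid; -N, -cwd, and -pe are standard qsub options.

```shell
#!/bin/bash
# QSUB_CMD is a hypothetical override for dry runs; it defaults to qsub.
QSUB_CMD="${QSUB_CMD:-qsub}"

# Usage: submit_parallel SCRIPT NUMBER-OF-CORES JOB-NAME
submit_parallel() {
  local script="$1" slots="$2" name="$3"
  # -N sets the job name, -cwd runs the job from the current directory,
  # and -pe orte reserves the requested number of slots.
  $QSUB_CMD -N "$name" -cwd -pe orte "$slots" "$script"
}
```

For example, setting QSUB_CMD=echo and calling submit_parallel my_script.sh 4 Run001 prints the arguments that would be passed to qsub instead of submitting anything.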

"Unable to run job" Warning Message

By default, a Metworx workflow starts with no compute nodes active and autoscales by launching compute nodes as jobs are submitted to the queue. If no compute nodes are up when you call qsub, you will see the following message:

Unable to run job: warning: <your-user-name's> job is not allowed to run in any queue

Your job <number> ("<model-name>") has been submitted

Exiting.

This simply means that it can’t run immediately, because it has to wait for a worker node. You can run qstat -f (described below) to verify that the job is in the queue.
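If you want to capture the job ID from the message above for later use with qstat or qdel, one possible approach is to parse the confirmation line. This is a sketch that assumes the "Your job <number> ... has been submitted" wording shown above; extract_job_id is a hypothetical helper name.

```shell
# Reads qsub output on stdin and prints the job number from the
# "Your job <number> (...) has been submitted" line, if present.
extract_job_id() {
  sed -n 's/^Your job \([0-9][0-9]*\) .*has been submitted.*$/\1/p'
}
```

A call such as qsub submit.sh 2>&1 | extract_job_id should then print only the number. The 2>&1 is included in case the warning and the confirmation are written to different streams.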

qsub Script Template

For submitting NONMEM jobs to the grid, Metrum Research Group recommends using bbr or PsN. However, for many other jobs, a shell script template is most effective.

In this example, we show you how to run an R script on the grid. First, create a script on your workflow disk (let's call it submit.sh) and copy the following into it:

#!/bin/bash
#$ -cwd
#$ -V
#$ -o qsub-job-$JOB_ID.out
#$ -e qsub-job-$JOB_ID.err
#$ -pe orte NUMBER-OF-CORES
Rscript YOUR-SCRIPT.R

Since this is only a template, you will need to update the following:

Change YOUR-SCRIPT.R in the template to the path to the R script you want to run relative to the location of this script.
Change NUMBER-OF-CORES in the template to the number of cores (or "slots" in SGE terminology) that you want to reserve for your job. Note, this only reserves the space on the grid. If you want your job to actually use multiple cores, you must write the code to do that. In the case of an R script, this likely means using the future package, mclapply, or something similar.

Next, run chmod +x submit.sh to make sure your script is executable.

Lastly, navigate to the directory containing your new script and call qsub submit.sh.

Please note that this pattern and the template script submit.sh are not specific to R. You can use them to launch any script on the grid by replacing the line Rscript YOUR-SCRIPT.R with whatever command you would use to run your script on the command line.
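If you create submit scripts often, the two placeholders in the template can be filled in programmatically. The sketch below writes the same template to standard output; make_submit_script is a hypothetical helper name, not a Metworx or SGE command.

```shell
# Usage: make_submit_script YOUR-SCRIPT.R NUMBER-OF-CORES > submit.sh
make_submit_script() {
  local r_script="$1" slots="$2"
  # The quoted 'EOF' keeps $JOB_ID literal so that SGE, not this shell,
  # substitutes the job ID when the job is submitted.
  cat <<'EOF'
#!/bin/bash
#$ -cwd
#$ -V
#$ -o qsub-job-$JOB_ID.out
#$ -e qsub-job-$JOB_ID.err
EOF
  printf '#$ -pe orte %s\n' "$slots"
  printf 'Rscript %s\n' "$r_script"
}
```

After redirecting the output to submit.sh, the remaining steps are unchanged: run chmod +x submit.sh and then qsub submit.sh from the directory containing the script.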

qstat

The qstat command shows the status of jobs that are queued or running on the grid. The -f flag produces more detailed output.

$ qstat -f

queuename                     qtype resv/used/tot. load_avg arch   states
--------------------------------------------------------------------------
all.q@ip-10-128-20-32.ec2      BIP       0/1/8      0.04    linux-x64     
    562 0.55500 Run001     user0001     r     11/19/2014 11:59:15     1

This call shows that user0001 has one job (Run001) running with job ID 562 and is currently using one of the eight slots (0/1/8) available on this compute node. The number of available slots depends on the number of vCPUs that you've configured for your worker nodes.

The r shows that the job is currently running. A qw in that spot indicates the job is queued and waiting.

Adding -f gives you more information about the compute nodes available and the jobs running on them. See Monitoring Parallelization for an example of using qstat -f to monitor NONMEM jobs on the grid.
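For a quick summary of a long qstat -f listing, the job lines can be tallied by state. This is a sketch that assumes the column layout shown in the example output above (job ID first, state fifth); count_job_states is a hypothetical helper name.

```shell
# Reads qstat -f output on stdin and prints one "state count" pair per line.
count_job_states() {
  # Job lines start with a numeric job ID and have eight fields;
  # the state (r, qw, Eqw, ...) is the fifth field.
  awk '$1 ~ /^[0-9]+$/ && NF >= 8 { n[$5]++ }
       END { for (s in n) print s, n[s] }' | sort
}
```

You could run it as qstat -f | count_job_states to see how many jobs are running versus waiting.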

qdel

The qdel command deletes jobs from the grid. This can be called on either queued or currently running jobs.

You can pass your username to delete all of your jobs.

qdel -u <your username>

You can also pass a job ID to delete a specific job. Use qstat to find your job ID:

$ qstat -f
queuename                    qtype resv/used/tot. load_avg arch   states
--------------------------------------------------------------------------
all.q@ip-10-128-20-32.ec2      BIP   0/1/8          0.04    linux-x64     
    562 0.55500 Run001     user0001     r     11/19/2014 11:59:15     1

The job ID for Run001 is 562. Simply run qdel 562 to delete only that job.

You may find it necessary to delete a list of jobs, such as bootstrap jobs that were submitted in error, while other jobs are running. One way is to loop over the job IDs you want to delete in R. In this example, the job IDs go from 600-1600.

In R, you can "shell out" to run commands that you would run on the terminal via the system() command:

for(i in 600:1600){
  system(paste('qdel', i))
}

The block of code above deletes job IDs 600-1600 from within RStudio. If the job IDs are not consecutive, you can loop over a vector of the specific job IDs instead.
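The same loop can be written directly in the shell. The sketch below calls qdel once per job ID; delete_job_range and the QDEL_CMD override are hypothetical names added so the loop can be previewed before anything is deleted.

```shell
# QDEL_CMD is a hypothetical override for dry runs; it defaults to qdel.
QDEL_CMD="${QDEL_CMD:-qdel}"

# Usage: delete_job_range FIRST-ID LAST-ID
delete_job_range() {
  local first="$1" last="$2" id
  for id in $(seq "$first" "$last"); do
    # One qdel call per job ID, mirroring the R loop above.
    $QDEL_CMD "$id"
  done
}
```

Setting QDEL_CMD=echo before calling delete_job_range 600 1600 prints the job IDs that would be deleted, which is a cheap way to check the range first.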

qalter

The qalter command changes the attributes of a pending or running job. For example, suppose you submit 500 bootstrap runs that fill your workflow, and then need to submit additional runs for a different project or model before the bootstrap runs have completed. Since the SGE system uses a first-in-first-out approach, your additional runs won't start until your bootstrap runs are complete.

In most cases, you would prefer the one-off model runs to take priority over the bootstraps. In this case, you can use qalter to decrease the priority of the pending bootstrap runs so the one-off model fills the next available slot. An example of some qstat -f output is provided below.

$ qstat -f

queuename       qtype resv/used/tot. load_avg arch   states
-------------------------------------------------------------------
all.q@ip-10-128-     BIP   0/8/8      0.06    linux-x64     
    562 0.55500 Run001     user0001     r     11/19/2014 11:59:15     1        
    563 0.55500 Run002     user0001     r     11/19/2014 11:59:20     1        
    564 0.55500 Run003     user0001     r     11/19/2014 11:59:23     1        
    565 0.55500 Run004     user0001     r     11/19/2014 12:00:15     1        
    566 0.55500 Run005     user0001     r     11/19/2014 12:00:18     1        
    567 0.55500 Run006     user0001     r     11/19/2014 12:00:26     1             
    568 0.55500 Run007     user0001     r     11/19/2014 12:01:15     1        
    569 0.55500 Run008     user0001     r     11/19/2014 12:02:15     1        
####################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS 
####################################################   
    571 0.55500  Run010     user0001     qw   11/19/2014 12:01:15     1
    572 0.55500  Run011     user0001     qw   11/19/2014 12:02:15     1
    573 0.55500  Run012     user0001     qw   11/19/2014 12:03:15     1
    574 0.55500  Run013     user0001     qw   11/19/2014 12:01:15     1
    580 0.55500  Run035     user0001     qw   11/19/2014 12:03:15     1

Job ID 580 and the remaining four bootstrap jobs are pending (not assigned to a machine). We want to decrease the priority of job IDs 571-574 so that job ID 580 will run before the remaining pending bootstrap jobs. The priority is shown as the decimal number in the qstat output above. Currently, all jobs have the same priority. Decrease the priority of job IDs 571-574 using the qalter command. Again, in this example, we loop in R and call out to qalter via the system() function:

# Lower the priority of pending job IDs 571-574
for (i in 571:574) {
  system(paste('qalter -p -10', i))
}

The value following -p in the above code block sets the priority. As a regular user, you can supply a number between -1023 and 0, where 0 is the default. Passing -10 therefore lowers the priority of those jobs relative to jobs left at the default. If you look at the qstat -f output again, it looks as follows.
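The priority change can also be made from the shell rather than from R. The sketch below checks that the value falls in the range described above before calling qalter -p for each job ID; lower_priority and the QALTER_CMD override are hypothetical names used for illustration.

```shell
# QALTER_CMD is a hypothetical override for dry runs; it defaults to qalter.
QALTER_CMD="${QALTER_CMD:-qalter}"

# Usage: lower_priority PRIORITY JOB-ID [JOB-ID ...]
lower_priority() {
  local priority="$1" id
  shift
  # Reject values outside the range given in the text (-1023 to 0).
  if [ "$priority" -lt -1023 ] || [ "$priority" -gt 0 ]; then
    echo "priority must be between -1023 and 0" >&2
    return 1
  fi
  for id in "$@"; do
    $QALTER_CMD -p "$priority" "$id"
  done
}
```

For the example above, the call would be lower_priority -10 571 572 573 574.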

$ qstat -f

queuename       qtype resv/used/tot. load_avg arch   states
-----------------------------------------------------------------------
all.q@ip-10-128-     BIP   0/8/8      0.06    linux-x64     
    562 0.55500 Run001     user0001     r     11/19/2014 11:59:15     1        
    563 0.55500 Run002     user0001     r     11/19/2014 11:59:20     1        
    564 0.55500 Run003     user0001     r     11/19/2014 11:59:23     1        
    565 0.55500 Run004     user0001     r     11/19/2014 12:00:15     1        
    566 0.55500 Run005     user0001     r     11/19/2014 12:00:18     1        
    567 0.55500 Run006     user0001     r     11/19/2014 12:00:26     1             
    568 0.55500 Run007     user0001     r     11/19/2014 12:01:15     1        
    569 0.55500 Run008     user0001     r     11/19/2014 12:02:15     1        
####################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS 
####################################################
    580 0.55500  Run035     user0001     qw    11/19/2014 12:03:15     1  
    571 0.55100  Run010     user0001     qw    11/19/2014 12:01:15     1        
    572 0.55100  Run011     user0001     qw    11/19/2014 12:02:15     1        
    573 0.55100  Run012     user0001     qw    11/19/2014 12:03:15     1     
    574 0.55100  Run013     user0001     qw    11/19/2014 12:01:15     1

The above output indicates that job ID 580 is the next job to be executed when a slot becomes available.

Troubleshooting

Sometimes you may not know why a NONMEM job is pending in the queue. One example is a parallel run that requests more cores than are available. The qstat output below demonstrates this situation. (Note: this only happens when using -pe smp instead of -pe orte.)

queuename        qtype resv/used/tot. load_avg arch   states
-----------------------------------------------------------------
all.q@ip-10-128-    BIP   0/0/8      0.06    linux-x64           
####################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS 
####################################################
    590 0.60383  Run100     user0001     qw   11/19/2014 12:03:15     12  
    591 0.60383  Run101     user0001     Eqw  11/19/2014 12:03:15     1

The 12 at the end of the line for job ID 590 indicates that this job needs 12 compute cores. Since only eight are available, it will remain pending indefinitely. You should stop this job using qdel and resubmit it with the correct number of cores.

The Eqw state of job ID 591 means SGE failed when it tried to schedule the job. At this point, the job will not run either, and you should also stop it using qdel. In this case, there was likely an error with the job being run. Check your code or control stream for any obvious errors and then resubmit the job.
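Both situations described above can be spotted by scanning the job lines of the qstat -f output. The following is a sketch, assuming the column layout shown in the example (state in the fifth field, requested slots in the last); list_stuck_jobs is a hypothetical helper, and the slot limit is a value you supply yourself.

```shell
# Usage: qstat -f | list_stuck_jobs SLOT-LIMIT
# Prints jobs in an error state and waiting jobs that request more
# slots than the limit passed as the first argument.
list_stuck_jobs() {
  local slot_limit="$1"
  awk -v limit="$slot_limit" '
    $1 ~ /^[0-9]+$/ && NF >= 8 {
      if ($5 ~ /E/) {
        print $1, "error state (" $5 ")"
      } else if ($5 == "qw" && $NF + 0 > limit + 0) {
        print $1, "requests " $NF " slots (limit " limit ")"
      }
    }'
}
```

For a job reported in an error state, qstat -j <job ID> may give more detail about why scheduling failed before you delete and resubmit the job.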

On rare occasions, the qstat output will indicate that a given node is in alarm state (a), error state (E), unreachable (U), suspended (S), or disabled (d). Some examples of this are shown below.

queuename               qtype resv/used/tot. load_avg arch   states
---------------------------------------------------------------------------
all.q@ip-10-128-20-32.ec2    BIP   0/0/3      0.06    lx24-amd64        E

queuename               qtype resv/used/tot. load_avg arch   states
--------------------------------------------------------------------------
all.q@ip-10-128-22-180.ec2   BIP   0/0/3      0.06    lx24-amd64        U

In the above output, all.q@ip-10-128-20-32.ec2 is in error state and all.q@ip-10-128-22-180.ec2 is unreachable. If nodes remain marked E, U, d, or S after you have followed this troubleshooting guidance, contact the Metworx help desk. You may need to delete the existing workflow and start a new one.
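To pick out flagged nodes from a longer listing, you can filter for queue lines that have something in the states column. This sketch assumes the layout shown above, where a healthy node leaves the sixth column empty; list_flagged_queues is a hypothetical helper name.

```shell
# Reads qstat -f output on stdin and prints "queue-instance state"
# for every queue line whose states column is not empty.
list_flagged_queues() {
  # Queue lines carry "@" in the first field; a sixth field, when
  # present, holds the state letters (for example E or U).
  awk '$1 ~ /@/ && NF >= 6 { print $1, $6 }'
}
```

Running qstat -f | list_flagged_queues prints nothing when no node reports a state.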

Additional Learning

Additional information on all SGE commands is found in the "man" pages for each command. You can access these from the system command prompt by typing man <command>. For example, to learn more about qstat, you could type man qstat:

$ man qstat

QSTAT(1)                                Grid Engine User Commands                                QSTAT(1)

NAME
       qstat - show the status of Grid Engine jobs and queues

SYNTAX
       qstat   [-ext]   [-f]   [-F  [resource_name,...]]   [-g  c|d|t[+]]  [-help]  [-j  [job_list]]  [-l
       resource=val,...]   [-ne]   [-pe   pe_name,...]    [-ncb]   [-pri]   [-q    wc_queue_list]    [-qs
       a|c|d|o|s|u|A|C|D|E|S]  [-r]  [-s  {r|p|s|z|hu|ho|hs|hd|hj|ha|h|a}[+]]  [-t]  [-U  user,...]   [-u
       user,...]  [-urg] [-xml]

DESCRIPTION
       qstat shows the current status of the available Grid Engine queues and the  jobs  associated  with
       the  queues.  Selection options allow you to get information about specific jobs, queues or users.
       If multiple selections are done, a queue is only displayed if all selection criteria for  a  queue
       instance are met.  Without any option qstat will display only a list of jobs, with no queue status
       information.
   ...