SGE Queue Management


Scenario

Users rarely need to troubleshoot their SGE queue, but occasionally it is necessary. For example, a user may need to clear specific jobs when two nodes are "broken" and stuck in the "auo" state.

Identifying nodes in "auo" state

$ qstat -f | grep auo

For more information on Qstat usage, refer to the following documentation: Qstat Documentation
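
A plain grep also matches any line that merely contains the text "auo" (for example, in a queue or host name). The sketch below is one possible way to narrow the match to the state column. It assumes the usual `qstat -f` layout, in which a queue-instance line has six whitespace-separated fields with the state last; `list_auo_queues` is a hypothetical helper name, not part of SGE.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the
# queue instances (queue@host) whose state column contains "auo".
# Assumption: queue-instance lines have six fields, the sixth being the state.
list_auo_queues() {
    awk 'NF == 6 && $1 ~ /@/ && $6 ~ /auo/ { print $1 }'
}

# Usage (on a host with SGE installed):
#   qstat -f | list_auo_queues
```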

You can list the jobs running on the resulting nodes using the following command:

$ qstat -qs auo -s r

Note: These jobs may still be running and writing output as expected. Before clearing any jobs, check whether the modeling subdirectories are still being written to. If the files in them are still changing, the jobs are still active and should not be cleared.
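
One way to check for recent writes is to list files modified in the last few minutes. This is only a sketch: `recently_written` is a hypothetical helper, the directory you pass it depends on where your jobs write, and it assumes a `find` that supports `-mmin` (GNU and BSD versions do).

```shell
# Hypothetical helper: list files under a directory that were modified
# within the last N minutes (default 10).
# Usage: recently_written <directory> [minutes]
recently_written() {
    dir=$1
    minutes=${2:-10}
    # -mmin -N matches files modified less than N minutes ago
    find "$dir" -type f -mmin "-$minutes" -print
}
```

If the command prints nothing for a job's output directory, the job has probably stopped writing. A long-running step that writes infrequently would look the same, so choose the time window accordingly.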


Clearing Jobs

Note: The following command deletes running jobs. Confirm that you really want to clear them before running it.

$ qstat -qs auo -s r | awk '$1 ~ /^[0-9]+$/ {print $1}' | xargs qdel
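
Since qdel acts immediately, it can help to preview the job IDs before deleting anything. The sketch below assumes the standard `qstat` job listing, where the first column is a numeric job ID and the first two lines are a header and a separator; `job_ids` is a hypothetical helper name.

```shell
# Hypothetical helper: read `qstat` job-list output on stdin and print each
# numeric job ID once, skipping the header and separator lines.
job_ids() {
    awk '$1 ~ /^[0-9]+$/ && !seen[$1]++ { print $1 }'
}

# Usage (on a host with SGE installed):
#   qstat -qs auo -s r | job_ids               # preview only, deletes nothing
#   qstat -qs auo -s r | job_ids | xargs qdel  # delete after reviewing the list
```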

You can also clear jobs from a specific IP address using the following command:

$ qstat -qs auo -s r | grep <IP> | awk '{print $1}' | xargs qdel

For example, to clear jobs on a node with IP address 36.47.52.248, the command would read as:

$ qstat -qs auo -s r | grep 36.47.52.248 | awk '{print $1}' | xargs qdel