NONMEM Warning - Job Not Allowed to Run


Description of the Problem

This guidance covers a common warning seen when running NONMEM on a workflow. When you submit a NONMEM model/job, you may see one or more of the following unexpected messages:

  • Example of a console warning:
    Unable to run job: warning: <your-user-name's> job is not allowed to run in any queue
    
    Your job <number> ("<model-name>") has been submitted
    
    Exiting.
  • Example of errors you may find in a .cat output:
    cannot locate <file-name> in control stream for run
    
    cannot locate <another-file-name> in control stream for run
    
    cannot locate <other-related-file-name> in control stream for run
    
    Run <number> has exit code 1
  • Running qstat() may also indicate there are no worker nodes.
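
If you capture the console output from a submission, the benign case can be told apart from a real failure by checking whether the "has been submitted" line is present alongside the warning. The following is a minimal sketch, assuming the message wording matches the console example above; the function name `parse_submission` and the sample user, job number, and model name are hypothetical, and the exact text on your system may differ.

```python
import re
from typing import Optional

# Patterns based on the example console output in this article.
# The exact wording on your system may differ (assumption).
WARNING_RE = re.compile(r"job is not allowed to run in any queue")
SUBMITTED_RE = re.compile(r'Your job (\d+) \("([^"]+)"\) has been submitted')


def parse_submission(console_text: str) -> Optional[dict]:
    """Return job details if the job was submitted, otherwise None."""
    match = SUBMITTED_RE.search(console_text)
    if match is None:
        return None
    return {
        "job_id": int(match.group(1)),
        "model": match.group(2),
        # True when the queue warning accompanied the submission, which
        # suggests the job is waiting for a worker node to be provisioned.
        "waiting_for_node": WARNING_RE.search(console_text) is not None,
    }


# Hypothetical sample text shaped like the console example above.
example = (
    "Unable to run job: warning: alice's job is not allowed to run in any queue\n"
    'Your job 42 ("run101") has been submitted\n'
    "Exiting.\n"
)
result = parse_submission(example)
print(result)
```

A result with `waiting_for_node` set to true corresponds to the situation described here: the job was accepted, but no worker node was available yet.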

Solution

The good news is that these messages indicate the workflow has recognized the problem and is already addressing it. Unless told not to, workflows generally shut down unused/latent compute nodes after a period of inactivity. This is generally a good thing, as it helps you avoid incurring usage costs for resources you are not using. Here, the workflow has identified that it needs additional compute power to perform a task and is provisioning it to run the job you submitted.

When you submit the job, the workflow identifies:

  1. That it has no compute/worker nodes available; and
  2. That it needs additional compute power based on the model/job you are submitting.

This accounts for each of the messages described earlier:

  • The console warning means the job could not be placed in a queue because no worker nodes were available, and that the workflow is provisioning a new compute/worker node to address this.
  • Any additional/related error messages in your .cat output indicate that other pieces related to this job could not be found, because the job is delayed pending provisioning of additional compute node(s).
  • Running qstat() indicates no worker nodes because there were none when you submitted the job; the workflow is provisioning additional compute power to support it.
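
Because the job is only delayed while a node is provisioned, a script that drives submissions can wait and re-check rather than resubmit. The following is a minimal polling sketch under stated assumptions: `wait_for_workers` and its `has_workers` argument are hypothetical names, the check itself is something you would supply (for example, a wrapper around your own qstat check), and the timeout and interval defaults are illustrative rather than recommended values.

```python
import time
from typing import Callable


def wait_for_workers(
    has_workers: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until has_workers() returns True or timeout_s has elapsed.

    has_workers is a hypothetical callable you provide; it is not part of
    NONMEM or the scheduler. Returns True if workers appeared in time.
    """
    waited = 0.0
    while True:
        if has_workers():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Simulated check: no workers on the first two polls, then one appears.
polls = iter([False, False, True])
slept = []  # records each wait instead of actually sleeping
ready = wait_for_workers(
    lambda: next(polls), timeout_s=60, interval_s=15, sleep=slept.append
)
print(ready, slept)
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; in normal use the default `time.sleep` applies.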

No action is required; this behavior indicates that things are working as designed. However, if you find it to be a common occurrence or a nuisance, you can configure your workflow to accommodate your needs.

Note: You can make this configuration when launching a workflow, or by selecting Update from the workflow dashboard to update an existing workflow. The fields you need are available in both the "New Workflow" and "Update Workflow" configuration screens.

  1. Whether you are launching a new workflow or updating an existing one, find the following fields in the workflow configuration screen and make selections that fit your needs:

    a. Initial size - specifies the initial workflow cluster size. When you launch the workflow, it starts with the number of nodes you enter here.
    b. Maintain initial size - controls whether the workflow can shut down inactive/latent compute/worker nodes. If this box is unchecked, the workflow shuts them down; if it is checked, the workflow maintains the number of nodes you specified in (a) above (Figure 1).