NONMEM on Metworx

Summary

Metworx is designed to run computationally intensive jobs, such as large NONMEM models. This Knowledge Base contains a variety of NONMEM-related information. This article addresses one of the most common questions we get from Metworx users who model with NONMEM: "How do I use Metworx to make my NONMEM models run faster?"

The short answer is to parallelize your models and submit them to run on the compute nodes of the Metworx grid. Below we explore some approaches and highlight some of the nuances involved in getting the most out of Metworx.

At a high level, the most important things to remember for optimal use of NONMEM on Metworx are:

Select a small (typically 2 vCPU) head node.
Select compute nodes whose vCPU count matches the number of cores you intend to parallelize your NONMEM model across. For example, if you intend to pass threads=16 to bbr or put NODES=16 in your .pnm file, select 16 vCPU compute nodes.
Submit your NONMEM job through SGE (or Slurm) so that it runs on the compute nodes.
Make sure you have parallelized your NONMEM job, either via threads=<N> for bbr or a .pnm file for PsN.

Parallelizing NONMEM on Metworx

"Parallelizing" a job means submitting it in a way that splits the work across multiple CPU cores. If you are new to this topic, the Parallel Computing Intro article provides background information worth reading before proceeding with this article.

The ideal number of cores for a NONMEM run depends on the data and the model. Importantly, more cores are not always better: past a certain point, the overhead of coordinating work across cores outweighs the speedup. Generally speaking, for a "large" NONMEM model (one that takes an hour or more to run without parallelization), starting with 16 to 32 cores is likely sufficient. For subsequent runs of the same (or similar) models, you can adjust the count by about 4 cores at a time and compare run times.

Additionally, there is an important distinction between how many cores you ask NONMEM to use (specified via threads=<N> in bbr or in the .pnm file) and how many cores you ask Metworx to make available to you (configured when you launch your workflow). Both are covered in more detail below.

Configuring your workflow

Before reading further, it is highly recommended that you read the Grid Computing Intro article. It will give you a conceptual overview of some of the topics discussed below.

We recommend executing large NONMEM jobs on the compute nodes. When working this way, the ideal Metworx configuration is:

2 vCPU head node
16 vCPU (or more) compute nodes. The size of your compute nodes should match the number of cores that you will tell NONMEM to use for a given model (more on that below).

One exception is a PsN bootstrap: the head node should be 8 vCPU, because the PsN bootstrap routine uses the head node to do the resampling before it submits the bootstrap models for execution.

Executing NONMEM on Compute Nodes

How you tell NONMEM to execute on the compute nodes of the grid depends on how you submit your model. Common options are:

Submitting through R with bbr
- Use the .mode argument to submit_model(). This defaults to submitting to the SGE grid, which is what you want.
Using PsN execute command
- Use the -run_on_sge and -sge_prepend_flags options. Described in NONMEM via PsN.
Submitting through Pirana GUI
- Select "Job-scheduler: SGE". Described in NONMEM via Pirana.
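For the bbr route, a minimal sketch might look like the following. It assumes bbr is installed in the workflow; the model path "model/pk/100" and the helper name submit_to_grid() are hypothetical. The helper passes .mode explicitly and rejects any other value, such as bbr's "local" mode, so the job cannot accidentally run on the head node.

```r
# Minimal sketch: submit a NONMEM model to the grid scheduler with bbr.
# Assumptions: bbr is installed; "model/pk/100" is a hypothetical model path;
# submit_to_grid() is an illustrative helper, not a bbr function.
submit_to_grid <- function(model_path, mode = "sge") {
  # Only allow scheduler modes, so the job is never run locally on the head node.
  stopifnot(length(mode) == 1, mode %in% c("sge", "slurm"))
  mod <- bbr::read_model(model_path)
  # .mode tells bbr which job scheduler to submit through.
  bbr::submit_model(mod, .mode = mode)
}

# Usage from an R session on the head node:
# submit_to_grid("model/pk/100")
```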

Submitting NONMEM jobs with Slurm

This page primarily discusses using SGE as the job scheduler for submitting jobs to the grid. Blueprints 24-04 and later have the option to use Slurm (instead of SGE) as the job scheduler. bbr supports submitting NONMEM jobs with Slurm. This support was added in bbr 1.13.0 and bbi v3.4.0 (both released in January 2025).

Users can specify Slurm directly via the .mode argument, as in submit_model(..., .mode = "slurm"). Because .mode defaults to getOption("bbr.bbi_exe_mode"), Slurm users may find it easiest to add the following line to their project's .Rprofile file rather than specifying .mode in each submit_model() call:

options("bbr.bbi_exe_mode" = "slurm")

See the "SGE-to-Slurm Quick Start Guide" for more information on using Slurm.
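If the same project is used on both SGE and Slurm workflows, the options() line above could be made conditional. The following .Rprofile sketch assumes that the Slurm sbatch command is on the PATH only on Slurm-based workflows; that check is an assumption, not something bbr or Metworx provides, so adjust it to your environment.

```r
# Sketch of a project .Rprofile that picks the bbr scheduler automatically.
# Assumption: `sbatch` is found on the PATH only on Slurm-based workflows.
if (nzchar(Sys.which("sbatch"))) {
  options("bbr.bbi_exe_mode" = "slurm")
} else {
  options("bbr.bbi_exe_mode" = "sge")
}

# submit_model() uses this value whenever .mode is not passed explicitly.
message("bbr will submit with: ", getOption("bbr.bbi_exe_mode"))
```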

Executing NONMEM in parallel

Once NONMEM is set to execute on the compute nodes, you can tell it to parallelize your model across multiple cores. As above, how you do this depends on how you execute NONMEM.

Submitting through R with bbr
- Pass list(threads = <N>, parallel = TRUE) to the .bbi_args argument of submit_model(), where <N> is the number of cores you want to use. This is described in more detail in the "Running NONMEM in Parallel" bbr vignette.
Using PsN execute command
- Use the -parafile option and a .pnm file that specifies the number of cores to use. Described in NONMEM via PsN.
Submitting through Pirana GUI
- Select "Auto-MPI" and "<N> nodes" (where <N> is the number of cores you want to use). Described in NONMEM via Pirana.
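For bbr, the .bbi_args list can be wrapped in a small helper so that the thread count is set in one place. This is a sketch only: parallel_bbi_args(), submit_parallel(), and the model path are hypothetical names, and n_cores should equal the vCPU count of the compute nodes you selected when launching the workflow.

```r
# Sketch: build the .bbi_args list for a parallel NONMEM run and submit it.
# parallel_bbi_args() and submit_parallel() are illustrative helpers, not bbr functions.
parallel_bbi_args <- function(n_cores) {
  # Require a single positive whole number of cores.
  stopifnot(length(n_cores) == 1, n_cores >= 1, n_cores == round(n_cores))
  list(threads = as.integer(n_cores), parallel = TRUE)
}

submit_parallel <- function(model_path, n_cores = 16) {
  mod <- bbr::read_model(model_path)
  # .mode is not passed, so bbr falls back to getOption("bbr.bbi_exe_mode").
  bbr::submit_model(mod, .bbi_args = parallel_bbi_args(n_cores))
}

# Usage, on 16 vCPU compute nodes:
# submit_parallel("model/pk/100", n_cores = 16)
```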

An important note on "node" terminology: NONMEM uses the word "nodes" to mean the number of CPU cores that you want the model to parallelize across. This can confuse users, because Metworx (like most of the High Performance Computing community) uses the word "nodes" to mean the actual servers in your HPC cluster. This is discussed in Grid Computing Intro.

Choosing the best number of cores

Finding the ideal number of cores to parallelize a NONMEM model across can be complex. Often the best approach is trial and error guided by previous experience; for a fairly complex model, start with something between 16 and 32 cores.

If your model is large enough that testing and optimizing seems worthwhile, you can also determine this empirically. The bbr::test_threads() function is designed to make this easier; an example is in the Running NONMEM in Parallel bbr vignette. If you are using PsN, you can run the same tests manually, as described in NONMEM via PsN.
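A sketch of the bbr::test_threads() approach follows, under the assumption that the argument names (.threads, .cap_iterations) and the companion bbr::check_run_times() function match those shown in the bbr vignette; confirm them against your installed bbr version. The model path and the run_thread_test() helper are hypothetical. The candidate counts follow the 16 to 32 core starting range, in steps of 4.

```r
# Sketch: compare NONMEM run times across several thread counts with bbr.
# Assumptions: argument names follow the bbr "Running NONMEM in Parallel" vignette;
# run_thread_test() and "model/pk/100" are hypothetical.

# Candidate core counts: 16 to 32 in steps of 4.
candidate_threads <- seq(16, 32, by = 4)

run_thread_test <- function(model_path, threads = candidate_threads) {
  mod <- bbr::read_model(model_path)
  # Submit one copy of the model per thread count, with estimation
  # iterations capped so that the test runs finish quickly.
  test_mods <- bbr::test_threads(mod, .threads = threads, .cap_iterations = 100)
  # Wait for the test runs to finish, then return their run times.
  bbr::check_run_times(test_mods, .wait = TRUE)
}

# Usage:
# run_thread_test("model/pk/100")
```

Because the largest candidate here is 32 threads, this assumes compute nodes with at least 32 vCPU; trim the candidate list to match the nodes you launched.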