NONMEM on Metworx
Summary
Metworx is designed as an optimal platform to run computationally intensive jobs such as executing large NONMEM models. There is a variety of NONMEM-related information in this Knowledge Base. This article addresses one of the most common questions we get from Metworx users who model with NONMEM: "How do I use Metworx to make my NONMEM models run faster?"
The short answer to this question is to parallelize your models and submit them to run on the compute nodes of the Metworx grid. We explore some approaches to do this below and highlight some of the nuance involved in optimizing the power of Metworx.
At a high level, the most important things to remember for optimal use of NONMEM on Metworx are:
- Select a small (typically 2 vCPU) head node.
- Select compute nodes that have the same number of vCPUs as you intend to parallelize your NONMEM model across. That is, if you intend to pass `threads=16` to bbr or put `NODES=16` in your `.pnm` file, then select 16 vCPU for compute nodes.
- Make sure you submit your NONMEM job to run on SGE (or Slurm).
- Make sure you have parallelized your NONMEM job, either via `threads=<N>` for bbr or a `.pnm` file for PsN.
Parallelizing NONMEM on Metworx
"Parallelizing" a job means submitting the job in a way that the work is split over multiple CPU cores. The Parallel Computing Intro article provides more background information for beginners before proceeding with this article.
The ideal number of cores for a NONMEM run depends on the data and the model. Importantly, it is not true that using more cores is always better. Generally speaking, for a "large" NONMEM model (one that takes an hour or more to run without parallelization) starting with somewhere between 16 and 32 cores is likely sufficient. For subsequent runs of the same (or similar) models, you may vary it by 4 cores or so and evaluate the speed.
Additionally, there is an important distinction between how many cores you ask NONMEM to use (specified in the `.pnm` file) and how many cores you ask Metworx to make available to you (configured when you launch your workflow). More detail follows.
Configuring your workflow
Before reading further, it is highly recommended that you read the Grid Computing Intro article. It will give you a conceptual overview of some of the topics discussed below.
For large NONMEM jobs, we recommend executing them on the compute nodes. When working in this way, your ideal Metworx configuration will be:
- 2 vCPU head node
- 16 vCPU (or more) compute nodes. The size of your compute nodes should match the number of cores that you will tell NONMEM to use for a given model (more on that below).
One caveat: when running a PsN bootstrap, the head node should be 8 vCPU, because the PsN bootstrap routine uses the head node to do the resampling before submitting the bootstrap models for execution.
Executing NONMEM on Compute Nodes
How you tell NONMEM to execute on the compute nodes of the grid depends on how you submit your model. Some common options are listed below:
- Submitting through R with bbr
  - Use the `.mode` argument to `submit_model()`. This defaults to submitting to the SGE grid, which is what you want. See the sketch after this list.
- Using the PsN `execute` command
  - Use the `-run_on_sge` and `-sge_prepend_flags` options. Described in NONMEM via PsN.
- Submitting through the Pirana GUI
  - Select "Job-scheduler: SGE". Described in NONMEM via Pirana.
Submitting NONMEM jobs with Slurm
This page primarily discusses using SGE as the job scheduler for submitting jobs to the grid. Blueprints 24-04 and later have the option to use Slurm (instead of SGE) as the job scheduler. `bbr` supports submitting NONMEM jobs with Slurm. This support was added in `bbr` 1.13.0 and `bbi` v3.4.0 (both released in January 2025).
Users can specify Slurm directly via the `.mode` argument, like `submit_model(..., .mode = "slurm")`. Note that the `.mode` argument defaults to checking `options("bbr.bbi_exe_mode")`, so Slurm users may find it easiest to add the following line to their project's `.Rprofile` file, to avoid needing to specify `.mode` in each `submit_model()` call.
options("bbr.bbi_exe_mode" = "slurm")
See "SGE-to-Slurm Quick Start Guide" for information on using Slurm.
Executing NONMEM in parallel
Once you have told NONMEM to execute on the compute nodes, you can tell it to parallelize your model across multiple cores. As above, how you do this depends on your method of NONMEM execution.
- Submitting through R with bbr
  - Pass `list(threads = <N>, parallel = TRUE)` to the `.bbi_args` argument of `submit_model()` (where `<N>` is the number of cores you want to use). This is described in more detail in the "Running NONMEM in Parallel" bbr vignette. See the sketch after this list.
- Using the PsN `execute` command
  - Use the `-parafile` option and a `.pnm` file that specifies the number of cores to use. Described in NONMEM via PsN.
- Submitting through the Pirana GUI
  - Select "Auto-MPI" and "<N> nodes" (where `<N>` is the number of cores you want to use). Described in NONMEM via Pirana.
An important note on "node" terminology: NONMEM uses the word "nodes" to mean the number of CPU cores that you want the model to parallelize across. This can be confusing for users because Metworx (and most of the High Performance Computing community) uses the word "nodes" to mean actual servers in your HPC cluster. This is discussed in Grid Computing Intro.
Choosing the best number of cores
Determining the ideal number of cores to parallelize a NONMEM model across can be complex. Often trial-and-error based on previous experience is the best approach; for a fairly complex model, start with something between 16 and 32 cores.
There are also ways to test this empirically if your model is large enough that testing and optimizing seems worthwhile. The `bbr::test_threads()` function is designed to make this easier. An example is in the "Running NONMEM in Parallel" bbr vignette. If you are using PsN, you can do this same testing manually, as described in NONMEM via PsN.
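As a rough sketch of that workflow, something like the following may be useful. The model path and thread counts are placeholders; check the vignette or `?test_threads` for the exact arguments in your bbr version.

```r
library(bbr)

# Placeholder model path; assumes the model takes long enough to run that
# comparing thread counts is meaningful.
mod <- read_model(file.path("model", "100"))

# Create and submit copies of the model at several candidate thread counts.
test_mods <- test_threads(mod, .threads = c(8, 16, 32))

# After the test runs finish, compare estimation times by thread count.
check_run_times(test_mods)
```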