Running Monolix in Parallel


Introduction

Monolix can use parallel processing to speed up modeling runs. This is enabled by default on Metworx. This article describes several ways to take advantage of this parallel processing on Metworx.

Using the GUI on the master node

If you are using the Monolix GUI in the Metworx virtual desktop, your models will run on your master node. By default, Monolix will use all available cores on that node.
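If you want to confirm how many cores that is, the nproc utility (part of GNU coreutils, available in any Linux terminal) prints the number of available vCPUs:

$ nproc    # prints the number of available vCPUs, e.g. 8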

You can verify this, as well as monitor your modeling process, by opening a terminal and using the htop Linux utility. Simply click the "Konsole" icon on the bottom left to open a new terminal window. Then type htop and press Enter.

Notice that, when your model is running in Monolix, you will see all of the cores lit up green to indicate that they are doing work, as well as a number of Monolix subprocesses in the process list.

[Screenshot: htop on the master node while a Monolix GUI run is in progress]

Running models on the SGE grid

You can also use the terminal (either in the virtual desktop, in RStudio, or via SSH) to submit Monolix models to run on the compute nodes of the SGE grid. This is primarily beneficial because it allows you to use Metworx's auto-scaling capabilities. That is: you can submit some number of models, and Metworx will automatically scale up enough compute nodes to run them, then turn those nodes off when the models are finished.

Create submission script

First, you will create a shell script that submits your model to run on the SGE grid. Open a blank text file, copy the following into it, and save it as monolix_sge.sh. Most likely, you will want to save this to your Metworx disk, in the same directory as your Monolix project file.

#!/bin/bash
#$ -cwd
#$ -V
#$ -o submit-monolix-$JOB_ID.out
#$ -e submit-monolix-$JOB_ID.err
#$ -pe orte 4
/opt/monolix/MonolixSuite2020R1/lib/monolix --no-gui -p "$1"

There is more detail on what this script is doing in the Appendix at the bottom of this document.

Setting the number of cores

The -pe orte 4 line in the above script specifies that this model should be parallelized across 4 vCPUs. There are two important points here:

  • You must change this number to use more or fewer cores for this model.
  • Do not set this number to anything larger than the size of your individual compute nodes (see the qhost example below). If you would like to run a single model across multiple compute nodes, refer to the MPI section below.
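If you are unsure how many vCPUs each of your compute nodes provides, the standard SGE qhost utility lists every execution host along with its core count; the host names shown will differ on your workflow:

$ qhost    # the NCPU column shows the vCPUs on each compute node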

Call the submission script

The shell script above takes one argument: the path to the Monolix project file to run. This should either be an absolute path (safest) or a path relative to the directory from which you call qsub.

To run this on the SGE grid, simply pass it to qsub. An example call to run my_model.mlxtran:

$ qsub monolix_sge.sh /path/to/my_model.mlxtran
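Because each qsub call is independent, you can submit a batch of models in one pass and let Metworx scale up compute nodes to work through them. A minimal sketch, assuming your .mlxtran project files sit together in the current directory:

for proj in *.mlxtran; do
    qsub monolix_sge.sh "$proj"
done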

Monitor the model submission

The simplest way to monitor jobs on the SGE grid is with the qstat -f command. The screenshot below shows submitting the model and then verifying that it is running on the grid with qstat -f.

[Screenshot: qsub submission followed by qstat -f output showing the job running on the grid]
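For a live view from the terminal, you can wrap qstat in the standard watch utility, or query a single job by the ID that qsub printed (42 below is only a placeholder):

$ watch -n 5 qstat -f    # refresh the queue view every 5 seconds
$ qstat -j 42            # detailed status for a single job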

Using MPI on the SGE grid

Metworx also has MPI installed. This allows SGE to spread a single job across multiple compute nodes. To do this, you will follow the same steps as the above SGE section, except you will add mpirun to the beginning of the Monolix call in the script.

Create submission script

Create a shell script: open a blank text file, copy the following into it, and save it as monolix_mpi.sh. Likely, you will want to save this to your Metworx disk, in the same directory as your Monolix project file.

#!/bin/bash
#$ -cwd
#$ -V
#$ -o submit-monolix-$JOB_ID.out
#$ -e submit-monolix-$JOB_ID.err
#$ -pe orte 8
mpirun /opt/monolix/MonolixSuite2020R1/lib/monolix --no-gui -p "$1"

Setting the number of cores

As above, the -pe orte 8 line specifies that this model should be parallelized across 8 vCPUs. There are two important points here:

  • You must change this number to use more or fewer cores for this model.
  • When using MPI, this number can be essentially as large as you would like. If you ask for more vCPUs than are available on a single compute node, SGE will launch more compute nodes and MPI will distribute the model across them.

Call and monitor the submission script

As before, you will submit the model with qsub and pass the path to your project file. An example call to run my_model.mlxtran:

$ qsub monolix_mpi.sh /path/to/my_model.mlxtran

The screenshot below shows submitting the model and then verifying that it is running on the grid with qstat -f.

[Screenshot: qsub submission of the MPI job and qstat -f output showing it running on the grid]
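You can also ask SGE directly how the slots were distributed: the standard qstat -g t option prints one line per task, showing which queue instance (and therefore which compute node) each slot landed on. The exact output layout varies by SGE version:

$ qstat -g t    # one line per slot, including the host name of each queue instance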

Verify that MPI is working

Note: this step is not necessary for running your model; it simply demonstrates how you can empirically verify that your MPI command is executing as intended.

You can further verify that the model is actually using cores on multiple nodes by SSH'ing to the compute nodes and running htop. In the screenshots below, you will see the submission call from the previous screenshot in the bottom terminal, while the top terminal shows htop on one of the compute nodes. Notice that the IP address in the top terminal matches one of the IP addresses in the qstat -f output.

Checking the first worker node

[Screenshot: htop on the first worker node]

Checking the second worker node

[Screenshot: htop on the second worker node]

SSH'ing to compute nodes

This can be done from the terminal on your master node at any time. You can see the host names of any worker nodes in the qstat -f output; copy one into your ssh call. In the example above, it would be ssh ip-10-128-31-203. You can then type exit at any time to close the connection and return to the master node terminal.
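Putting that together, a typical inspection session from the master node looks like the following (the host name is only an example; use one from your own qstat -f output):

$ qstat -f                # note a worker's host name, e.g. ip-10-128-31-203
$ ssh ip-10-128-31-203    # connect to that worker node
$ htop                    # watch the Monolix processes; press q to quit
$ exit                    # return to the master node terminal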

Appendix: More detail on the submission script

This section provides extra detail on the SGE submission shell script, and can generally be skipped by most users. The above scripts do the following:

  • Calls the full path to the Monolix command line executable

    • In this example we use /opt/monolix/MonolixSuite2020R1/lib/monolix though this can be changed to a different installation if desired.
    • Optionally, appends mpirun at the front, to invoke MPI.
    • Monolix needs to be installed at this path on all the worker nodes as well as the master node. (On Metworx it is, so this works.)
    • Also sets the --no-gui flag, because we are calling from the command line and do not wish to launch the GUI.
  • Passes the path to the Monolix project file through to the -p argument.
  • Sets some SGE options (everything preceded by #$)

    • -pe orte 8 to reserve the right number of slots on the grid (in this case 8). Change this value to use more or fewer slots for your Monolix job.
    • -cwd runs the job in the directory from which you call qsub (typically the directory containing the shell script and project file).
    • -V passes through environment variables to the worker nodes.
    • -o and -e specify log files for stdout and stderr respectively.