Set Up pkgr, MPN, and renv


Scope

In this section, we demonstrate how to set up your project to take advantage of MetrumRG's package management suite. If you're interested in using this across your organization, we strongly recommend creating a "project template" so that users don't have to do this setup each time. Please contact Metworx support if you're interested in getting help with this.

Quick Setup

If you already have your R project set up and saved on a Metworx disk, and you have a pkgr.yml file, follow these simple steps to install your desired packages in an isolated library within your project.

  1. Open your project in RStudio.
  2. In the R console, run renv::init(bare = TRUE). This activates renv and then restarts your R session. Once R has restarted, run .libPaths() in the R console. You should see </full/path/to/your/project>/renv/library/<R-version>/<system-specs>/, which verifies that you are using the isolated library you just set up.
  3. Put your pkgr.yml file in the top-level directory for your project.
  4. In the terminal (not the R console), run pkgr install. You will see your packages begin installing. We recommend using a Workflow with at least 8 vCPUs to take advantage of pkgr's parallelism. If you have fewer vCPUs and you are installing a non-trivial number of packages, this could take a while.
  5. Once pkgr has finished, restart your R session to make sure it finds the newly installed packages.

If you have trouble with any of these steps, refer to the relevant section under "Full Setup Details" below.
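
Before running pkgr install in step 4, it can help to confirm that the earlier steps left the project in the expected state. The following is a minimal sketch, not part of pkgr or renv: preflight is a hypothetical helper name, and it assumes only what the steps above describe, namely that renv::init(bare = TRUE) created an renv/ directory and that pkgr.yml sits in the top-level project directory.

```shell
# Hypothetical helper (not part of pkgr or renv): checks that a project
# directory looks ready for `pkgr install`.
preflight() {
  project_dir="${1:-.}"
  if [ ! -d "$project_dir/renv" ]; then
    echo "renv/ not found: run renv::init(bare = TRUE) in the R console first" >&2
    return 1
  fi
  if [ ! -f "$project_dir/pkgr.yml" ]; then
    echo "pkgr.yml not found in $project_dir" >&2
    return 1
  fi
  echo "ok"
}
```

Run preflight from the top-level project directory, or pass the project path as an argument, and continue to pkgr install only if it prints ok.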

Full Setup Details

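This section describes each step of setting up your project in detail. The high-level steps, each covered in its own subsection below, are: setting up your R project, adding a pkgr.yml file, setting up renv, and installing with pkgr.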

Setting up Your R Project

Throughout this tutorial, we will refer to your "top-level project directory". We recommend putting both a <project-name>.Rproj file and an .Rprofile file in that directory. This will make it easier to open your project in RStudio and be sure all the right configuration is loaded with it.

You should also be sure to include an .Rprofile in any sub-directory from which you will run scripts, or that contains .Rmd files that will be knit. The .Rprofile files in these sub-directories should contain only this line, pointing back to the .Rprofile at the top-level project directory:

source("../.Rprofile", chdir = TRUE)

This ensures your root .Rprofile will always be sourced and your isolated package environment activated.
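
As a sketch of how those sub-directory files could be created from the terminal, the function below writes the one-line .Rprofile into a directory. The name write_sub_rprofile is hypothetical, and because the line uses ../.Rprofile, this assumes the sub-directory sits exactly one level below the top-level project directory.

```shell
# Hypothetical helper: write the one-line .Rprofile into a sub-directory.
write_sub_rprofile() {
  sub_dir="$1"
  mkdir -p "$sub_dir"
  # Single quotes keep the R code literal (no shell expansion).
  printf '%s\n' 'source("../.Rprofile", chdir = TRUE)' > "$sub_dir/.Rprofile"
}
```

For example, running write_sub_rprofile script from the top-level project directory would create script/.Rprofile (the directory name script is only an illustration).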

Adding a pkgr.yml File

The pkgr.yml file tells pkgr what to install and where to install it, as well as controlling various configuration options. A complete pkgr.yml might look like this:

Version: 1

Packages:
  - tidyverse
  - data.table
  - bbr
  - yspec
  - pmtables
  - pmplots
  - here
  - sessioninfo
  #- likely many other packages here...

Repos:
  - MPN: https://mpn.metworx.com/snapshots/stable/2022-02-11
      
Lockfile:
  Type: renv

Rpath: ${R_EXE_4_1}

Each part of this is explained below. You can also find more details on usage and more advanced configuration on the "pkgr Details" KB Page or in the pkgr User Manual.

NOTE If you already have a pkgr.yml file you can skip this section, though you may want to skim it to decide if there are parts of your pkgr.yml that you want to update (likely Repos, and possibly also Packages and Rpath).

Version

This is the version of the pkgr.yml configuration file, and is required in every pkgr.yml. At this point, it should always say Version: 1.

Packages

This is the section where you list all of the packages that you want to install.

This "declarative" design is one of pkgr's most notable features. Instead of iteratively installing packages as you go, pkgr lets you "declare" all of the packages that you want installed on the project, and where you want to get them, all in one place.

Packages:
  - tidyverse
  - data.table
  - bbr
  - yspec
  - pmtables
  - pmplots
  - here
  - sessioninfo
  #- likely many other packages here...

Repos

This section defines where to download the specified packages from. This is where MPN comes in.

MPN is a "snapshot" repo. This means that it contains static snapshots of packages as of a given date. Snapshots are taken roughly monthly, and the date of each snapshot appears at the end of its URL. By pointing to one of these snapshots, you ensure that pkgr will always download the same version of every package.

Repos:
  - MPN: https://mpn.metworx.com/snapshots/stable/2022-02-11

In contrast, pointing to something like CRAN, which updates as newer versions of packages are released, will not freeze your package versions. If reproducibility of your analysis is important to you, we recommend pointing only to MPN.
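
Because the snapshot date is the last path component of the URL, it can be recovered with plain shell parameter expansion, for instance to record which snapshot a project uses. This is a minimal sketch; snapshot_date is a hypothetical helper name.

```shell
# Hypothetical helper: print the date at the end of a snapshot URL.
snapshot_date() {
  url="${1%/}"                 # drop a trailing slash, if any
  printf '%s\n' "${url##*/}"   # keep everything after the last slash
}

snapshot_date "https://mpn.metworx.com/snapshots/stable/2022-02-11"   # prints 2022-02-11
```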

Multiple repos

You can point pkgr to multiple repos, though this should be done with care, keeping in mind the reproducibility concerns mentioned in the previous section.

Repos:
  - MPN: https://mpn.metworx.com/snapshots/stable/2022-02-11
  - CRAN: https://cran.rstudio.com

It's important to note that pkgr will check each repo, in the order listed, for each package. The example above will first look for all packages on the specified MPN repo, and then fall back to CRAN for any packages that it doesn't find in the MPN snapshot.

To support reproducibility, we recommend using a "snapshot" repo whenever possible, even when not using MPN. For example, Posit provides CRAN snapshots. You can find the URL for a specific date here.

Repo customizations

It is possible to tell pkgr to look for specific packages only in a specific repo, using the Customizations section.

Repos:
  - MPN1: https://mpn.metworx.com/snapshots/stable/2021-11-19 # original snapshot
  - MPN2: https://mpn.metworx.com/snapshots/stable/2022-02-11 # newer snapshot
  - CRAN: https://cran.rstudio.com
  
Customizations:
  Packages:
    - dplyr:
        Repo: MPN2

In this example, pkgr would do the following:

  • first look for all packages (excluding dplyr) on the 2021-11-19 MPN snapshot
  • then look on the 2022-02-11 snapshot for dplyr (and any others it didn't find on the 2021-11-19 snapshot)
  • then look on CRAN for any others that it didn't find on either MPN snapshot (again, excluding dplyr, which we have specified should only come from the 2022-02-11 snapshot)
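
The lookup order described above can be illustrated with a toy model. This is a sketch of the documented behavior, not pkgr's actual implementation: resolve is a hypothetical function, and the package lists given for each repo are made up for illustration.

```shell
# Toy model of the lookup order; NOT pkgr's implementation.
# Usage: resolve PACKAGE PINNED_REPO "NAME:pkg1,pkg2" ...
# PINNED_REPO is the repo named under Customizations, or "" if not pinned.
resolve() {
  pkg="$1"
  pinned="$2"
  shift 2
  for repo in "$@"; do              # repos are checked in the order listed
    name="${repo%%:*}"
    contents=",${repo#*:},"
    # A pinned package may only come from its pinned repo.
    if [ -n "$pinned" ] && [ "$name" != "$pinned" ]; then
      continue
    fi
    case "$contents" in
      *",$pkg,"*) echo "$name"; return 0 ;;
    esac
  done
  return 1                          # not found in any eligible repo
}

# Hypothetical repo contents, for illustration only.
mpn1="MPN1:dplyr,here"
mpn2="MPN2:dplyr,bbr"
cran="CRAN:dplyr,here,bbr,newpkg"

resolve dplyr  MPN2 "$mpn1" "$mpn2" "$cran"   # prints MPN2 (pinned)
resolve here   ""   "$mpn1" "$mpn2" "$cran"   # prints MPN1 (first repo that has it)
resolve bbr    ""   "$mpn1" "$mpn2" "$cran"   # prints MPN2
resolve newpkg ""   "$mpn1" "$mpn2" "$cran"   # prints CRAN
```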

Lockfile

You are required to specify either Lockfile or Library to tell pkgr where to install packages. Since we are using renv to isolate our package environment, you only have to include the following, and renv will tell pkgr where to install packages.

Lockfile:
  Type: renv

This will create a sub-directory within your project for the installed packages. This is essential because it ensures that each project has control over which packages, and which versions of those packages, it is using.

Note that the library path will change (forcing you to re-install packages) if you switch to a different version of R or a different operating system. This is a good thing: it avoids the more confusing errors that could occur if you switched and did not re-install your packages. Most likely, you will notice this when packages you were previously using suddenly fail to load with Error in library(<package_name>) : there is no package called ‘<package_name>’.

If you see this, the first thing to check is whether you are using a different version of R. Restart your R session; the top of the startup message in the console tells you which R version you are currently using. Check that this matches the Rpath in your pkgr.yml (explained in the next section).
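
To make that comparison quickly, the Rpath value can be pulled out of pkgr.yml from the terminal. This is a minimal sketch, assuming Rpath is written on a single top-level line as in the complete pkgr.yml example earlier; configured_rpath is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of the top-level Rpath field.
configured_rpath() {
  # Print whatever follows "Rpath:" at the start of a line in the config file.
  sed -n 's/^Rpath:[[:space:]]*//p' "${1:-pkgr.yml}"
}
```

Running configured_rpath in the top-level project directory prints the configured value (for example ${R_EXE_4_1}), which you can compare with the version shown in the R startup message.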

Rpath

This controls which version of R (technically which R executable on your system) pkgr will use to install your packages. If you don't set this, pkgr will use the default version of R on the system. However, we recommend always setting this because it greatly improves reproducibility. It should be set to the same version of R that you are planning to use for your analysis.

Rpath: ${R_EXE_4_1}

pkgr expects the path to an R executable. On Metworx, however, there are environment variables that point to each R installation. You can use ${R_EXE_<major>_<minor>} as shown above, where <major> is the major release number and <minor> is the minor release number, and pkgr will find the desired version of R on the system. For example, Rpath: ${R_EXE_4_1}, shown above, would use R 4.1.x.
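
The naming pattern can be made concrete with a small sketch that derives the variable name from an R version string. Here r_exe_var is a hypothetical helper; only the R_EXE_<major>_<minor> pattern comes from Metworx.

```shell
# Hypothetical helper: map an R version string to the Metworx variable name.
r_exe_var() {
  version="$1"              # e.g. 4.1.3 or 4.1
  major="${version%%.*}"    # text before the first dot
  rest="${version#*.}"      # text after the first dot
  minor="${rest%%.*}"       # text before the next dot, if any
  printf 'R_EXE_%s_%s\n' "$major" "$minor"
}

r_exe_var 4.1.3   # prints R_EXE_4_1
```

On a Metworx workflow, printenv "$(r_exe_var 4.1.3)" should then show the path to that R executable, assuming that version of R is installed.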

Updating R Versions

If later you want to run the same analysis with a different version of R, you can change this one field and re-run pkgr install and you will install the same package versions, but with the new R version. Again, consider that this could have reproducibility implications for your analysis.

Setting up renv

To begin, you will need a system-wide installation of renv. If you are on Metworx, this will already be installed. You can double-check by opening an R console and typing packageVersion("renv").

Start an R session in your project folder (click your .Rproj file if you have one), and then run renv::init(bare = TRUE) in the R console to initiate renv in the project. This will do several things, including modifying your .Rprofile and creating an renv/ directory.

Installing With pkgr

In the top-level directory of your project, run pkgr plan in your terminal (not your R console) to preview the packages that will be installed. You should also see the installation directory as renv/library/<R-version>/<system-specs>/ (you can run pkgr plan | grep 'Library path' to filter to only that relevant line).

If everything looks right, run pkgr install in the same terminal. Your packages will begin installing. We recommend using a workflow with at least 8 vCPUs to take advantage of pkgr's parallelism. If you have fewer vCPUs and you are installing a non-trivial number of packages, this could take a while. Feel free to use more cores if you know you are installing a particularly large number of packages (several hundred).

Once pkgr has finished, restart your R session. In the R console, enter .libPaths(). You should see </full/path/to/your/project>/renv/library/<R-version>/<system-specs>/ (the same as you saw above, but this time an absolute path). This verifies that you are using the isolated package installations that we have just set up.
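
The preview-then-install sequence can be wrapped in a small guard that refuses to install unless the planned library lives under renv/library/. This is a sketch under assumptions: plan_then_install and the PKGR override variable are hypothetical, and it relies on pkgr plan printing a 'Library path' line to standard output, as the grep filter above implies.

```shell
# Command to run; override PKGR to point at a different pkgr binary.
PKGR="${PKGR:-pkgr}"

# Hypothetical wrapper: install only if the plan targets the renv library.
plan_then_install() {
  # Keep only the line that reports where packages would be installed.
  library_line="$("$PKGR" plan | grep 'Library path')" || {
    echo "no 'Library path' line in the plan output" >&2
    return 1
  }
  case "$library_line" in
    *renv/library/*) "$PKGR" install ;;
    *)
      echo "unexpected install location: $library_line" >&2
      return 1
      ;;
  esac
}
```

Call plan_then_install from the top-level project directory in your terminal. If it stops with an unexpected install location, check that renv has been initialized and that pkgr.yml contains the Lockfile section.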

Updating Your Package Installations

If you are using a repo that updates with new releases (like CRAN), or if you have added a newer MPN snapshot to your pkgr.yml file since you last installed, you will need to tell pkgr to update the installed packages.

In your terminal, run pkgr plan --update to preview the updates that will be made, and then pkgr install --update if they look right. This will check all of the specified repos and re-install any packages that have newer versions than what you have installed.

After doing an update like this, it is generally a good idea to re-run any portions of your analysis that are feasible to re-run, to make sure that you still get the expected results with any updated packages.

Metworx versions 21.08 and earlier ship with pkgr 2.x.x by default, which behaves as described above. Starting with pkgr 3.x.x, however, updating is the default behavior: pkgr plan and pkgr install will look for updated versions of your packages unless you pass --no-update. Remember, though, that if you are only pointing to static repos like MPN, the package versions in those repos never change and therefore will not trigger pkgr to update anything.
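
Because the default differs between major versions, it can help to spell out which command matches your intent. The sketch below simply encodes the behavior described above; pkgr_command is a hypothetical helper, and you supply the pkgr major version yourself rather than having it detected.

```shell
# Hypothetical helper: print the pkgr command that matches your intent.
# Usage: pkgr_command SUBCOMMAND PKGR_MAJOR_VERSION WANT_UPDATE
pkgr_command() {
  subcommand="$1"    # plan or install
  major="$2"         # 2 or 3
  want_update="$3"   # yes or no
  flag=""
  if [ "$major" -eq 2 ]; then
    # pkgr 2.x.x: updating is opt-in
    if [ "$want_update" = "yes" ]; then flag=" --update"; fi
  elif [ "$major" -ge 3 ]; then
    # pkgr 3.x.x: updating is the default
    if [ "$want_update" = "no" ]; then flag=" --no-update"; fi
  else
    echo "behavior for pkgr major version $major is not covered here" >&2
    return 1
  fi
  printf 'pkgr %s%s\n' "$subcommand" "$flag"
}

pkgr_command install 2 yes   # prints: pkgr install --update
pkgr_command install 3 no    # prints: pkgr install --no-update
```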

Notes on Version Control

Generally speaking, we do not recommend checking your installed packages into version control. If you are using git, renv takes care of this by dropping renv/ into a top-level .gitignore file. If you are using SVN, while there are ways to ignore a directory, it is easiest to just conscientiously not check it in.

While it might be tempting to keep the package installations "with" the code for the sake of reproducibility and portability, doing so has several downsides. Notably:

  • It can backfire if someone attempts to re-run your code on a different OS or a different version of R without re-installing all of the packages.
  • These kinds of directories and files are, both conceptually and practically, not the kind of thing you want to version control.

However, the important point here is that this way of using pkgr together with renv and MPN is specifically designed to make it easy for someone else to install the exact same versions of the packages on their own system. To do so, they would simply:

  • Make sure they have pkgr installed (if not, installation instructions are here)
  • Navigate to the project directory and run pkgr install in the terminal

NOTE If you are using Metworx, you can skip the first step, because pkgr will already be installed.