pkgr For R Package Development


pkgr, used in conjunction with renv and optionally MPN, can be very useful for R package development. Most Metrum-developed packages use this paradigm. You can see an example in the bbr repo, which contains both development environment setup instructions and a pkgr.yml file.

Motivation

When developing an R package, it is important to pay attention to the versions of your dependencies (i.e. the packages you import, typically by listing them in your DESCRIPTION file under Imports, Depends, or Suggests). Using pkgr and renv allows you to control which versions of your dependencies you are pulling in, and keeps them isolated from any other installations of those same packages elsewhere on your system.

Setup

This section takes you step-by-step through setting up pkgr and renv in your package. If you are a new developer working on a package that has pkgr and renv already set up in the repo, you can skip steps 5 and 9 because there should already be a pkgr.yml file in the repo and your *ignore files should already be set up correctly.

  1. Clone the repository of the package.
  2. Make sure pkgr is installed. If you are using Metworx, it will already be installed.
  3. Make sure you have a global installation of renv. You can check by opening an R console and running packageVersion("renv"); you need at least version 0.8.3-4. If you are on Metworx, it will already be installed. If you do not have it, install it from CRAN with install.packages("renv") from an R session that is outside of an activated renv project.
  4. Start an R session in your project folder (click your .Rproj file if you have one) and then run renv::init(bare = TRUE) in the R console to initialize renv in the project. This will do several things, including modifying your .Rprofile and creating an renv/ directory.
  5. Add a pkgr.yml file to the top-level directory of your project. See "Setting up your pkgr.yml file" below for details. Add ^pkgr.yml$ to your .Rbuildignore file. Note: do not add pkgr.yml to your .gitignore. You do want this file checked into version control.
  6. In your terminal, in the top-level directory of your project, run pkgr plan to preview the packages that will be installed. You should also see an installation directory containing renv/library/ (you can run pkgr plan | grep 'Library path' to filter to only that relevant line).
  7. If everything looks right, run pkgr install in the same terminal. Your packages will begin installing. If you are on Metworx, we recommend using a Workflow with at least 8 vCPUs to take advantage of pkgr's parallelism. If you have fewer vCPUs and you are installing a non-trivial number of packages, this could take a while.
  8. Once pkgr has finished, restart your R session. In the R console, enter .libPaths(). You should see the same path you saw above, but this time as an absolute path. This verifies that you are using the isolated package installations that we have just set up.
  9. As a final step, we recommend adding .Rprofile, renv/, and renv.lock to your top-level .gitignore file. This will make it easier for other developers to use this same process on this repo in the future. renv/ and renv.lock should also be in your .Rbuildignore, but renv::init() should have already added them. If you'd like, you can check to be sure.
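
The ignore-file edits in steps 5 and 9 can also be scripted. The following is a minimal shell sketch, not part of pkgr or renv: add_line_once is a hypothetical helper name, and the script assumes it is run from the top-level directory of your project.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# add_line_once is a hypothetical helper, not something pkgr or renv provides.
add_line_once() {
  file=$1
  line=$2
  touch "$file"
  # -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file"; then
    # If the file does not end with a newline, add one before appending.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}

# Step 5: keep pkgr.yml out of the built package (it stays in git).
add_line_once .Rbuildignore '^pkgr.yml$'

# Step 9: keep the renv scaffolding out of version control.
add_line_once .gitignore '.Rprofile'
add_line_once .gitignore 'renv/'
add_line_once .gitignore 'renv.lock'
```

Because the helper checks for an existing identical line first, re-running the script does not create duplicate entries.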

Setting up your pkgr.yml file

A complete example pkgr.yml for package development might look like this:

Version: 1

Descriptions:
- DESCRIPTION

Packages:
- devtools
- usethis
- pkgdown

Repos:
  - CRAN: https://cran.rstudio.com
  - MPN: https://mpn.metworx.com/snapshots/stable/2022-02-11 # used for mrgval

Lockfile:
  Type: renv

Each part of this is explained below.

Lockfile

We will start at the bottom, because this is the essential section for using pkgr and renv together. You are required to specify either Lockfile or Library to tell pkgr where to install packages. Since we are using renv to isolate our package environment, you only have to include the following, and renv will tell pkgr where to install packages.

Lockfile:
  Type: renv

Note that the library path used will change (and force you to re-install packages) if you switch to a different version of R or a different operating system. This is a good thing, and will avoid other weird errors that could occur if you switched and did not re-install your packages.

Version

This is the version of the pkgr.yml configuration file, and is required in every pkgr.yml. At this point, it should always say Version: 1.

Descriptions

This is the key element for package development. Setting the following will tell pkgr to pull out all dependencies listed in the DESCRIPTION file for your package.

Descriptions:
- DESCRIPTION

This means that you don't have to explicitly list all of the packages you want in the Packages: section. Simply update your DESCRIPTION file as you normally would, and pkgr will pick up any changes.
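
To eyeball which packages a DESCRIPTION file declares before running pkgr plan, something like the following shell sketch can help. It is only a rough approximation for a quick look, not how pkgr itself parses the file. list_description_deps is a hypothetical helper name, and it covers only the Imports, Depends, and Suggests fields.

```shell
#!/bin/sh
# Print the package names declared under Imports, Depends, and Suggests.
# Hypothetical helper for a quick look; this is NOT pkgr's own parser.
list_description_deps() {
  awk '
    # Any line that does not start with whitespace opens a new field;
    # indented lines continue the previous field.
    /^[^ \t]/ {
      keep = ($0 ~ /^(Imports|Depends|Suggests):/)
      sub(/^[^:]*:/, "")
    }
    keep {
      gsub(/\([^)]*\)/, "")   # drop version constraints such as (>= 1.0.0)
      n = split($0, parts, ",")
      for (i = 1; i <= n; i++) {
        gsub(/[ \t]/, "", parts[i])
        # Skip empty entries and the R version requirement itself.
        if (parts[i] != "" && parts[i] != "R") print parts[i]
      }
    }
  ' "${1:-DESCRIPTION}" | sort -u
}

# Usage: run from the package root, where DESCRIPTION lives.
if [ -f DESCRIPTION ]; then
  list_description_deps DESCRIPTION
fi
```

The output of pkgr plan remains the authoritative view of what will be installed, since it also resolves the dependencies of these packages.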

Packages

If you have listed your DESCRIPTION file (described in the previous section) then this section only needs to contain packages that you would like to use for development purposes that are not in your DESCRIPTION file (i.e. are not formal dependencies of your package).

Packages:
- devtools
- usethis
- pkgdown

devtools is a prime example of this kind of package, but there are others you may consider (for example, pkgdown for building a documentation site, or usethis for setting up package scaffolding).

Repos

What you put in this section is at the developer's discretion. There are two options:

  • Pull packages from CRAN (or another repository that is always updated with the most recent versions of packages)
  • Pull packages from MPN (or another repository that is a stable unchanging snapshot of packages)

Pros and cons

The advantage of pulling from CRAN is that you can run pkgr install --update frequently and you will always have the most recent versions of your dependencies. This will allow you to catch any breaking changes as soon as possible. The downside is that it forces the developer to deal with those breaking changes as soon as they appear. Metrum typically uses this strategy.
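
As a rough sketch, the update routine for this strategy can be wrapped in a small shell function. refresh_deps is a hypothetical name; the only pkgr commands it uses are pkgr plan and pkgr install --update, and it assumes it is run from the directory containing pkgr.yml.

```shell
#!/bin/sh
# Preview the plan, then install and update dependencies in place.
# refresh_deps is a hypothetical wrapper, not a pkgr feature.
refresh_deps() {
  if ! command -v pkgr >/dev/null 2>&1; then
    echo "pkgr not found on PATH" >&2
    return 127
  fi
  # Only run the install if the plan step succeeds.
  pkgr plan && pkgr install --update
}
```

After running refresh_deps, restart your R session so that it picks up the updated packages.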

The advantage of using something like MPN is that you will always know which versions of your dependencies you are developing with. This provides more stability while developing, and allows you to deliberately update to a newer snapshot when you are ready to test the package with updated dependencies. The downside is that there may be breaking changes in your dependencies that you will not catch until you have updated your snapshot, and users may experience the results of these before you notice them (if the users are pulling other packages from CRAN).

Other considerations

You will see the example above has both CRAN and MPN, with CRAN listed first:

Repos:
  - CRAN: https://cran.rstudio.com
  - MPN: https://mpn.metworx.com/snapshots/stable/2022-02-11 # used for mrgval

This will make pkgr first look on CRAN for all packages, and then look on MPN for any packages it didn't find on CRAN. This is useful if you have dependencies that are not on CRAN.

Another option is to use the Customizations: section to pin the version of specific packages by telling pkgr to download them from a stable snapshot repo, even if the rest of your dependencies are getting pulled from CRAN.

Customizations:
  Packages:
    - dplyr:
        Repo: MPN

There may be situations where you want to do this because of a known breaking change that you have not yet dealt with. However, you should take this path with care. In general, having a strict (forward-looking) version constraint on your dependencies is not ideal in the long term.