pkgr is a rethinking of the way packages are managed in R. Namely, it embraces
the declarative philosophy of defining the ideal state of the entire system and working
towards achieving that objective. Furthermore, pkgr is built with a focus on reproducibility
and auditability, both vital for the pharmaceutical sciences and for enterprises.
For usage documentation, see the pkgr User Manual. For details about the motivation for
pkgr, its design, and specific usage examples, read on.
install.packages and friends such as remotes::install_github have a subtle weakness:
they are not good at controlling desired global state. There are some knobs that
can be turned, but overall their APIs are not what the user actually needs. Rather, they
are the mechanism by which the user can strive towards their needs, in a forcibly iterative fashion.
With pkgr, you can, in a parallel-processed manner, do things like:
- Install a number of packages from various repositories, when specific packages must be pulled from specific repositories
- Install Suggested packages only for a subset of all packages you'd like to install
- Customize the installation behavior of a single package in a documentable and reproducible way
  - Set custom Makevars for a package that persist across system installations
  - Install source versions of some packages but binaries for others
- Understand how your R environment will be changed before performing an installation or action.
Today, packages are highly interwoven. Best practices have pushed towards small, well-scoped packages that each do a few things well. For example, rather than just having plyr, we now use dplyr and purrr to cover the same set of responsibilities (dealing with data frames, and dealing with other list/vector objects in an iterative way). As a result, it is becoming increasingly difficult to manage the set of packages in a transparent and robust way.
pkgr in action
pkgr is a command line utility with several top level commands. The two primary commands are:
    pkgr plan     # show what would happen if install is run
    pkgr install  # install the packages specified in pkgr.yml
The actions are controlled by a configuration file that specifies the desired global state, namely, by defining the top level packages a user cares about, as well as specific configuration customizations.
For example, a pkgr configuration file might look like:
    Version: 1
    # top level packages
    Packages:
      - rmarkdown
      - bitops
      - caTools
      - knitr
      - tidyverse
      - shiny
      - logrrr

    # any repositories, order matters
    Repos:
      - MPN: "https://mpn.metworx.com/snapshots/stable/2020-12-21"

    # path to install packages to
    Library: "<path/to/install/library>"

    # package specific customizations
    Customizations:
      Packages:
        - shiny:
            Suggests: true
When you run
pkgr install with this as your pkgr.yml file, pkgr will download and
install the packages listed in the Packages array,
and any dependencies that those packages require.
If you want to see everything that pkgr is going to install before actually installing, simply run
pkgr plan and take a look.
How about a more complex example? One such situation is the need to install from multiple repositories.
Here is a configuration that also pulls from Bioconductor, which is split across several CRAN-like repositories:
    Version: 1
    # top level packages
    Packages:
      - magrittr
      - rlang
      - ggplot2
      - dplyr
      - tidyr
      - plotly
      - VennDiagram
      - aws.s3
      - data.table
      - forcats
      - preprocessCore
      - loomR
      - ggthemes
      - reshape

    # any repositories, order matters
    Repos:
      - MPN: "https://mpn.metworx.com/snapshots/stable/2020-12-21"
      - BioCsoft: "https://bioconductor.org/packages/3.12/bioc"
      - BioCann: "https://bioconductor.org/packages/3.12/data/annotation"
      - BioCexp: "https://bioconductor.org/packages/3.12/data/experiment"
      - BioCworkflows: "https://bioconductor.org/packages/3.12/workflows"

    # path to install packages to
    Library: pkgs
    Cache: pkgcache
    Logging:
      all: pkgr-log.log
      install: install-only-log.log
      overwrite: true
The default behavior of pkgr is to find the first repository that contains the given package and use that. You
can use Customizations to control that behavior at the package level.
For example, in the following configuration dplyr is available in both repositories and would therefore default to MPN. Setting Repo in the package customization forces dplyr to be installed from CRAN instead.
    Version: 1
    # top level packages
    Packages:
      - dplyr
      - ggplot2

    Repos:
      - MPN: "https://mpn.metworx.com/snapshots/stable/2020-12-21"
      - CRAN: "https://cran.rstudio.com"

    Library: "test-library"

    Customizations:
      Packages:
        - dplyr:
            Repo: CRAN
You can confirm this behavior by inspecting the debug output of the plan:
    pkgr plan --loglevel=debug
    INFO Installation would launch 16 workers
    INFO R Version 3.6.3
    INFO OS Platform x86_64-apple-darwin15.6.0
    INFO Package Library will be created path=test-library
    INFO Default package installation type: binary
    INFO 1072:1073 (binary:source) packages available in for MPN from https://mpn.metworx.com/snapshots/stable/2020-12-21
    INFO 16593:16772 (binary:source) packages available in for CRAN from https://cran.rstudio.com
    INFO Package installation cache directory: /Users/devinp/Library/Caches/pkgr
    INFO Database cache directory: /Users/devinp/Library/Caches/pkgr/r_packagedb_caches
    DEBU package repository set pkg=dplyr relationship="user package" repo=CRAN type=binary version=1.0.2
    DEBU package repository set pkg=ggplot2 relationship="user package" repo=MPN type=binary version=3.3.2
    DEBU package repository set pkg=labeling relationship=dependency repo=MPN type=binary version=0.4.2
    DEBU package repository set pkg=rematch2 relationship=dependency repo=MPN type=binary version=2.1.2
    DEBU package repository set pkg=isoband relationship=dependency repo=MPN type=binary version=0.2.3
    DEBU package repository set pkg=lifecycle relationship=dependency repo=MPN type=binary version=0.2.0
    DEBU package repository set pkg=mgcv relationship=dependency repo=MPN type=binary version=1.8-33
    .... TRUNCATED FOR WEBSITE
    INFO package installation status installed=0 not_from_pkgr=0 outdated=0 total_packages_required=53
    INFO package installation sources CRAN=1 MPN=52 tarballs=0
    INFO package installation plan to_install=53 to_update=0
    INFO Library path to install packages: test-library
    INFO resolution time 223.09698ms
Notice that dplyr resolves to the CRAN repo, while every other package comes from MPN.
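The resolution rule described above (the first repository, in configured order, that carries the package wins, unless a package customization names a repository) can be sketched in a few lines of Python. This is only an illustration under assumed data: the function name `resolve_repo` and the hard-coded repository contents are hypothetical and are not part of pkgr, which is a command line tool driven by its configuration file.

```python
from typing import Dict, List, Optional, Set


def resolve_repo(
    pkg: str,
    repos: List[str],
    available: Dict[str, Set[str]],
    customizations: Optional[Dict[str, str]] = None,
) -> str:
    """Return the name of the repository a package would be taken from.

    repos          -- repository names in configuration order (order matters)
    available      -- hypothetical index: repository name -> package names it carries
    customizations -- optional per-package override: package name -> repository name
    """
    override = (customizations or {}).get(pkg)
    if override is not None:
        # An explicit per-package Repo setting takes precedence over ordering.
        if pkg not in available.get(override, set()):
            raise LookupError(f"{pkg} is not available in {override}")
        return override
    # Default rule: the first repository that contains the package is used.
    for repo in repos:
        if pkg in available.get(repo, set()):
            return repo
    raise LookupError(f"{pkg} not found in any configured repository")


# Toy data mirroring the example configuration; the contents are made up.
repos = ["MPN", "CRAN"]
available = {"MPN": {"dplyr", "ggplot2"}, "CRAN": {"dplyr", "ggplot2"}}
customizations = {"dplyr": "CRAN"}

print(resolve_repo("dplyr", repos, available, customizations))    # CRAN
print(resolve_repo("ggplot2", repos, available, customizations))  # MPN
```

Without the customization, the same call for dplyr falls through to the ordering rule and returns MPN, which matches the default described in the text.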
Once a package has been installed, pkgr will not touch that package unless you explicitly request it using the
--update flag. Therefore, if you change your configuration after a package has already been installed
(for example, by changing its repository), pkgr will not replace the installed package, even if the new plan
resolves to a different version, unless --update is passed:
pkgr install --update
Be careful about using this behavior to manually build up a combination of package versions. It is much better to be explicit about your intent, namely by adjusting the Customizations to reflect the environment you want to maintain, so that others can reproduce it.
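To make the effect of the --update flag concrete, here is a minimal Python sketch of how a plan could be split into packages to install and packages to update. It is an assumption-laden illustration, not pkgr's implementation: `plan_actions` and its arguments are hypothetical names, the version numbers are made up, and treating any version difference as an update candidate is a simplification.

```python
from typing import Dict, List, Tuple


def plan_actions(
    resolved: Dict[str, str],
    installed: Dict[str, str],
    update: bool = False,
) -> Tuple[List[str], List[str]]:
    """Split packages into (to_install, to_update).

    resolved  -- package name -> version the current configuration resolves to
    installed -- package name -> version already present in the library
    update    -- stands in for the idea behind the --update flag
    """
    # Packages missing from the library are always installed.
    to_install = sorted(p for p in resolved if p not in installed)
    to_update: List[str] = []
    if update:
        # Only with update=True are already-installed packages reconsidered.
        to_update = sorted(
            p for p, version in resolved.items()
            if p in installed and installed[p] != version
        )
    return to_install, to_update


# Hypothetical library state and resolution result.
installed = {"dplyr": "1.0.0"}
resolved = {"dplyr": "1.0.2", "ggplot2": "3.3.2"}

print(plan_actions(resolved, installed))               # (['ggplot2'], [])
print(plan_actions(resolved, installed, update=True))  # (['ggplot2'], ['dplyr'])
```

The first call shows why a changed configuration alone leaves an installed package in place: without the update switch, the mismatch is never acted on.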
There are many other controls for pkgr, which can be seen in the pkgr User Manual.
installing stand-alone packages
pkgr can also install single packages that are not attached to a repository. This is convenient
when you have a package for internal use, by you or your company, that is not hosted anywhere:
you can list its tarball so that it is installed along with everything else.
pkgr will also automatically reconcile the dependencies needed
to install it.
    Tarballs:
      - path/to/pkg.tar.gz
pkgr and packrat and renv and pak
How pkgr compares with pak is covered in a separate write-up.
pkgr is not a replacement for packrat/renv; it is complementary to them.
packrat/renv are tools to capture the state
of your R environment and isolate it from outside modification.
Where packrat often falls short, however, is in the restoration of said environment.
Running packrat::restore() restores packages in an iterative fashion, which is a
time-consuming process that doesn't always play nicely with packages hosted outside
of CRAN (such as packages hosted on GitHub). Additionally, since renv uses
install.packages under the hood, each call to
install.packages is still treated as an isolated procedure rather than as part of
a holistic effort. This means that the installation process does not stop and inform
the user when a package fails to install properly. In this situation, renv continues to install
whatever packages it can, without regard for how this might affect the package ecosystem when those
individual installation failures are later resolved.
pkgr solves these issues by:
- Installing packages quickly in a parallelized graph (determined by the dependency tree)
- Allowing users to control things like what repo a given package is retrieved from and what Makevars it is built with
- Showing users a holistic view of their R environment (pkgr inspect --deps --tree) and how that environment would be changed on another install (pkgr plan)
- Providing timely error messages and halting the installation process immediately when something goes wrong (such as a package not being available, a repository being unreachable, etc.)
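The first and last points above can be illustrated together with a short Python sketch: packages are grouped into layers from the dependency graph so that everything in a layer could be installed concurrently, and the whole run stops at the first failure. This is a sketch under stated assumptions, not how pkgr is implemented: `install_in_layers`, the `install` callback, and the toy dependency map are all hypothetical, and the layers are processed sequentially here for simplicity. It relies on `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def install_in_layers(
    deps: Dict[str, Set[str]],
    install: Callable[[str], bool],
) -> List[List[str]]:
    """Install packages layer by layer, halting at the first failure.

    deps    -- package name -> set of packages it depends on
    install -- hypothetical callback that returns True on success
    Returns the layers that were installed, in order.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    done: List[List[str]] = []
    while sorter.is_active():
        # Every package in this layer has all of its dependencies installed,
        # so a real installer could hand the whole layer to parallel workers.
        layer = sorted(sorter.get_ready())
        for pkg in layer:
            if not install(pkg):
                # Stop immediately instead of continuing with what is left.
                raise RuntimeError(f"installation of {pkg} failed; halting")
        sorter.done(*layer)
        done.append(layer)
    return done


# Toy dependency map for illustration only.
deps = {
    "dplyr": {"rlang", "magrittr"},
    "ggplot2": {"rlang"},
    "rlang": set(),
    "magrittr": set(),
}

layers = install_in_layers(deps, lambda pkg: True)
print(layers)  # [['magrittr', 'rlang'], ['dplyr', 'ggplot2']]
```

Raising on the first failed package, rather than skipping it, is the design choice that distinguishes this approach from the iterative restore behavior described earlier.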