Rightsizing Workflows


In newer versions of Metworx, many CPU and RAM combinations are available for master (aka head) and worker (aka compute) nodes.


As a simple rule of thumb, the following guidance applies:

If you do not need to do any computationally expensive work on the master node (for example, you are only assembling data, postprocessing results, and submitting jobs to the grid), 2 vCPUs should be sufficient.

Note: One consideration is the installation of packages at project kickoff. Because pkgr can use multiple cores, consider a 4- or 8-core workflow when you know you will need to install a large number of packages (for example, for a new project or for QC).

If you are doing heavy processing, a 4 vCPU workflow may be selected so that the computational thread is kept as clear as possible of other system activities. In general, if you do not have large simulations or many datasets that are hundreds of megabytes or gigabytes in size, 16 GB of RAM should be sufficient.
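The rule of thumb above can be written down as a small decision helper. This is only an illustrative sketch in Python: the function name, the `Suggestion` type, and the three yes/no inputs are hypothetical and are not part of Metworx. It restates the guidance in this article and does not query or configure a workflow.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Suggestion:
    vcpu_options: Tuple[int, ...]  # master-node vCPU counts worth considering
    ram_gb: int                    # starting point for RAM, in GB
    review_ram: bool               # True when 16 GB may not be enough


def suggest_master_node(heavy_processing: bool,
                        large_package_install: bool,
                        large_data: bool) -> Suggestion:
    """Restate the article's rule of thumb (hypothetical helper)."""
    if large_package_install:
        # pkgr can use multiple cores, so 4 or 8 cores are suggested.
        vcpus = (4, 8)
    elif heavy_processing:
        # Keeps the computational thread clear of other system activity.
        vcpus = (4,)
    else:
        # Data assembly, postprocessing, and job submission only.
        vcpus = (2,)
    # The article gives 16 GB as sufficient without large simulations or
    # datasets; it gives no figure for the large-data case, so only flag it.
    return Suggestion(vcpu_options=vcpus, ram_gb=16, review_ram=large_data)


if __name__ == "__main__":
    print(suggest_master_node(heavy_processing=False,
                              large_package_install=False,
                              large_data=False))
```

When `review_ram` is true, the sketch deliberately gives no number, because the guidance above does not state one. The monitoring options described next are the better basis for that decision.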

There are some options available for you to monitor your workflows and see if they are configured appropriately:

Starting with the 21.08 Metworx series, Grafana dashboards are available, allowing you to see key metrics related to the master node and the grid. To learn more about the Grafana dashboards, feel free to read the corresponding KB article linked here.

Grafana Dashboard

Additionally, you can check the RStudio Admin dashboard, available at https://<YOURWORKFLOW-ID>.metworx.com/rstudio/admin. This dashboard shows historical RAM and CPU usage, which can help you identify whether you need additional RAM or CPU.

If you are on a 21.08 series workflow (or newer), it's recommended that you use the Grafana dashboards mentioned above. However, if you are on a 20.12 series workflow or older, the RStudio Admin dashboard is still a very useful tool.
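Once you have read usage figures off either dashboard, a quick summary can make the sizing decision easier. The Python sketch below assumes you have manually noted a handful of utilization percentages. The function name and the 90% threshold are hypothetical choices for illustration, not Metworx recommendations, and nothing here reads from Grafana or RStudio directly.

```python
from statistics import mean


def summarize_usage(samples_pct, high_water_pct=90.0):
    """Summarize utilization samples (0-100) noted from a dashboard.

    high_water_pct is an assumed threshold; adjust it to your own tolerance.
    """
    if not samples_pct:
        raise ValueError("need at least one sample")
    peak = max(samples_pct)
    return {
        "peak_pct": peak,
        "mean_pct": mean(samples_pct),
        # A peak at or above the threshold hints that a larger size may help.
        "consider_upsizing": peak >= high_water_pct,
    }


if __name__ == "__main__":
    # Example RAM utilization readings (made-up values for illustration).
    print(summarize_usage([40, 55, 95]))
```

A low peak and a low mean over a typical working period suggest the workflow could be smaller, while peaks near capacity suggest moving to a size with more RAM or CPU.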


Video tutorial