Working in Metworx


Intro to /data

On your workflow, there is an attached disk called /data.

/data is a local disk attached only to the master node.

/data is also mounted on the compute nodes.

/data is where your home directory lives. This means anything in your home directory, or anywhere else under /data, is available everywhere on your workflow (both master and compute nodes).

It is recommended that work is always done somewhere under /data
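As a quick way to check the recommendation above from a script, here is a minimal Python sketch that reports whether a path sits under /data. The helper name `is_under_data` is hypothetical and not part of Metworx; the only thing taken from this guide is the /data mount point.

```python
from pathlib import Path

# Mount point named in this guide.
DATA_ROOT = Path("/data")


def is_under_data(path, root=DATA_ROOT):
    """Return True if `path` resolves to `root` or to a location below it."""
    resolved = Path(path).resolve()
    root = Path(root).resolve()
    return resolved == root or root in resolved.parents


if __name__ == "__main__":
    # Report on the current working directory, similar in spirit to running pwd.
    cwd = Path.cwd()
    status = "is" if is_under_data(cwd) else "is NOT"
    print(f"{cwd} {status} under {DATA_ROOT}")
```

Running this from a project directory gives a quick sanity check that your work will be visible on both the master and compute nodes.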

RStudio Overview

If you launch RStudio within your workflow, the Home directory you see upon launching is within /data (/data/home/<yourname>). Upon launching your workflow, if you type pwd in the terminal, you can confirm the directory structure.

[Image: RStudio home directory]

Using RStudio as a primary entry point for project work

We recommend using RStudio Server not just for R, but as the entry point for anything that is not solely available via the graphical software on the Guacamole desktop. It is particularly good for:

  • Editing text files (such as files in other languages, e.g., Python, or control streams for NONMEM)

This will be your highest-performance avenue for those interactions.

RStudio Terminal

The terminal available in RStudio Server is a full Linux terminal. You can also open multiple terminals, so if one terminal is busy, you can create another and switch between them as needed.

RStudio Sessions

You can also make multiple sessions, which is most relevant in two scenarios:

  • You are working with multiple projects at the same time. You can switch between sessions that use different projects.
  • You want multiple sessions within the same project. This lets you have multiple RStudio windows open at the same time, so if you have multiple monitors available, you can spread your RStudio work within your Metworx workflow across them.

Question: after wrapping up an RStudio session and shutting down the associated workflow for the day, if the next day I start up a new workflow, I notice that when I load RStudio, whatever state the session was in on the previous day persists to the next day. Does the RStudio session require a workflow to run? If not, then why is the RStudio session state saved?

The RStudio session does require a workflow to run.

The RStudio session state is stored on the user's disk. Your home directory is on that disk (under /data), and when a workflow shuts down, everything under /data is automatically backed up. The disk is snapshotted and backed up, so the next time you attach that pre-existing disk to a new workflow, it is reattached and remounted with the saved session state still on it.

  • If you do not like the default behavior and would rather start with a new session each time you spin up a new workflow (we recommend a new session each time so you do not accidentally carry state over into your analysis activities), you can:
  • Go to Global Options.
  • In the "Basic" section of the General R options, change the configuration to your preference.

[Image: RStudio no-restore settings]

  • In the "Advanced" section of the General R options, make sure "Show server home page" is set to "Always".

[Image: RStudio landing page configuration]

When you do this, you will be taken to the RStudio Server home page when you launch RStudio in a new workflow, and you will be able to see your previous sessions (although those sessions will be cleared out due to the changes made in the "Basic" section above). You can then choose either to load those old sessions or to start a new session to begin working again.

If you click "Quit", that will delete those old sessions from your home page, removing any unwanted clutter. Your project will remain when you do this; it just deletes any remnants of previous sessions.

[Image: RStudio landing page]

Remember, if you choose to set up RStudio so sessions are not saved, the files on your disk will still persist across workflows, just not the specific state associated with your terminal sessions.

Uploading and Downloading Data

In cases where you need to do single file uploads or downloads, it is quite easy to do so in the RStudio interface.

To upload a file from your laptop to your workflow, click "Upload" and select the file you wish to upload, along with the proper location.

To download a file from your workflow to your laptop, select the file or folder within the file viewer, go to "More", and click "Export". You can then download the file to your machine.

[Image: RStudio data download]

We recommend caution when using this functionality, though, because there is no audit trail or backup. The best use case is probably quick access to a file that you want to add to a report. For data/files that need some sort of backup or traceability, it is best to store them in a shared location, like S3.

  • Through RStudio, there is a limit of a couple of gigabytes per transfer.
  • S3 is the durable way to move data back and forth between workflows and laptops because it has backups, file trails, etc.
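To decide ahead of time whether a single file is small enough for the RStudio upload/export route, a size check along these lines may help. This is only a sketch: the guide says "a couple of gigabytes" without giving an exact figure, so the 2 GiB value below is an assumed placeholder, and the function name `fits_rstudio_transfer` is hypothetical.

```python
import os

# ASSUMPTION: the guide gives no exact limit; 2 GiB is a placeholder value.
ASSUMED_RSTUDIO_LIMIT_BYTES = 2 * 1024**3


def fits_rstudio_transfer(path, limit_bytes=ASSUMED_RSTUDIO_LIMIT_BYTES):
    """Return True if the single file at `path` is no larger than `limit_bytes`."""
    # os.path.getsize reports the size of one file; it does not sum a folder's contents.
    return os.path.getsize(path) <= limit_bytes
```

For example, `fits_rstudio_transfer("results.csv")` (a hypothetical file name) returns False for a file above the assumed limit, which would suggest moving it through S3 instead. Note that size is only one consideration: files that need backups or traceability belong in S3 regardless of how small they are.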