Dev containers with R and Quarto

development
rstats
quarto
Dev containers are probably the quickest way to get other folks into your data science code.
Published

April 11, 2023

My favourite new trick is setting up a reproducible R/Quarto development environment in two clicks. It looks like this, and creating it yourself is as simple as following a link:

Basic 2-core Codespaces like the one here are free to use for 60 hours a month for everyone with a GitHub account (as of March 2023)—enough to do 3 hours of work every workday.

You won’t be billed you unless you add a billing method and raise the spending limit, and unused Codespaces stop running after half an hour if you forget to stop them yourself. So give it a whirl!

A sample dev container running as a Codespace in the browser.

A sample dev container running as a Codespace in the browser.

We publish a lot of open data analyses at 360info. We want to make visuals quickly and easily accessible to journalists, but we also want the underlying data, and any analysis we’ve done to it, reproducible too. I call it a ‘cake and ingredients’ model—we want to deliver both.

So we want people to be able to dive into our analysis quickly, rather than floundering installing R packages or GDAL or something.

Development containers, or dev containers, don’t just control dependencies like R package versions—they make your analysis shareable, too! They’re real easy to set up and share, even if you haven’t messed around with Docker before.

Wait, what’s a dev container again?

When you boot a project up in a dev container, your project folder on your laptop (the host) is mounted inside a Docker container—a sandboxed Linux environment with tools that you specify. The container might be as simple as just a fresh copy of Linux, or it might have R and a bunch of packages you specify installed.

Dev containers build on the strengths of Docker containers by making them the place you do your development work. When you use a dev container, VSCode brings your usual extensions, themes and keyboard shortcuts along for the ride—even magical extensions like Live Share.

The dev container’s specification—the thing that tells it which tools, languages and packages to include—stays with the project, so everyone using the project gets the same environment (no matter how they have VSCode itself set up). They can recreate it on their laptop if they have Docker installed
 or they can run it in the cloud with GitHub Codespaces.

Okay, so how do I set one up?

If you want to run dev containers on your own laptop, the one thing you’ll need to install (apart from VSCode itself) is Docker. Luckily, you don’t need Docker experience to use dev containers—VSCode handles the tricky bits.

The magic happens in the .devcontainer/devcontainer.json file. Open a folder in VSCode that has a .devcontainer folder with devcontainer.json in it, and VSCode will prompt you to reopen the project in a dev container.

If you’re starting from scratch and need to work out which tools you want, VSCode has a wizard available from the command palette that can write this file for you. You choose the language you primarily want to work in, add extra features, and off you go!

VSCode has a wizard that will take you through the steps of setting up a devcontainer.json, but you can steal one from another project if you’d like.

Or you can write it yourself. Here’s the devcontainer.json I usually start with on R/Quarto projects:

.devcontainer/devcontainer.json
{
    "name": "R (rocker/r-ver base)",
    "image": "ghcr.io/rocker-org/devcontainer/r-ver:4.2",
    "features": {
        "ghcr.io/rocker-org/devcontainer-features/quarto-cli:1": {
            "version": "prerelease"
        },
        "ghcr.io/rocker-org/devcontainer-features/apt-packages:1": {
            "packages": "libudunits2-dev,libxtst6,libxt6,libmagick++-dev"
        },
        "ghcr.io/rocker-org/devcontainer-features/r-packages:1": {
            "packages": "github::rstudio/renv,tidyverse,here,httpgd"
        },
    },
    "customizations": {
        "vscode": {
            "extensions": ["mechatroner.rainbow-csv"]
        },
        "codespaces": {
            "openFiles": ["README.md"]
        }
    }
}

Let’s break this spec down:

  • We’re going to use an R image published by the community. It uses the latest R 4.2.x releases.

  • We can add extra features to the image:

    1. The pre-release version of Quarto at the time gets installed (you can also specify a fixed version), and the Quarto VSCode extension is added.

    2. Four packages are installed from APT, the Ubuntu package manager.

    3. A few R packages are installed. Most are from the Posit Package Manager (PPM), which acts as a CRAN mirror, but renv is installed from GitHub here.

  • Finally, we add a few editor-specific customizations:

    1. In addition to the R and Quarto VSCode extensions, which come with the image and features, I want to make sure that the Rainbow CSV extension is installed. It’s a nice, lightweight colouriser for CSV files.

    2. When someone is launching this project as a Codespace (more on that below!), we can tell the container to open a set of files by default. This is a great way to welcome users by automatically opening the README or an intended entrypoint to the project.

Once you’re ready to launch your dev container, press Shift-Ctrl-P to open the command palette, and select Dev Containers: Reopen in Container. Hold on!

Once you’re working inside the container, you can write code, save files, commit and push as usual. For the most part, VSCode brings your usual git credentials into the container.

I use 1Password’s SSH agent, and I haven’t quite figured out how to make that work with dev containers yet. It feels like a solvable problem, though.

You can close the container when you’re done, and it’ll open right back up next time—or if you delete the container, it can be recreated from the container spec.

You can install tools and packages from the terminal once you’re inside the container if you like, but keep in mind that the changes won’t persist if you delete the container (or if someone else is trying to use it).

It’s generally best to specify tools you want in the container in devcontainer.json.

But how do I share it with others?

If you commit devcontainer.json to version control, other people who clone your repo and open it in VSCode will be able to launch it in a dev container, making it easy for them to match your environment. But that’s not the best part.

Use the Code button on a GitHub repository to launch a codespace, or to get a link that you can share with others to launch one.

GitHub has Codespaces, a way to launch a repository in a cloud environment running VSCode in your browser. (If you’ve ever used Posit Cloud, this is very much the same deal!)

If you open a Codespace on a repository that includes a devcontainer spec, the Codespace will use that spec. Codespaces support most of the things that local VSCode does (including some of the really cool things, like sharing an open web server with other people).

You can also share a link to open a specific repo in Codespaces—a great way to get students and workshop participants started when you don’t want to bother with installation pre-requisites. As long as they have a GitHub account, they can get started, and there’re plenty of free hours each month for most use cases.

I’ve updated this section, having a discovered an easier and clearer way to share links that start Codespaces!

If you use the “Share deep link” button on a repo, you can create a link that others can use to launch a Codespace from a repo (provided they have access to it). Or you can type it straight out using the repo owner and name:

https://codespaces.new/jimjam-slam/quarto-devcontainer-demo

You can also add ?quickstart=1 to the end if you’d prefer not to give users confusing additional options like the region or the amount of computing power. You can also use the “Share deep link” button to get HTML or Markdown code for a nice badge to put on your README or blog post!

Open in GitHub Codespaces

Open in GitHub Codespaces

Now anyone with a GitHub account can click it to start a Codespace with your repo!

Other people who launch your project as a Codespace use their own billing, not yours!

Is it really reproducible?

Nothing’s completely reproducible, and dev containers certainly aren’t foolproof. A few things to keep in mind:

  • If you use Rocker’s apt-packages feature to install other tools, these track whatever the latest versions are on APT, as far as I’m aware. If you need fixed versions of things, either:

  • The version of R is determined by the version of the image you select. If you select latest or the latest R minor version (currently 4.2), it will track the latest patch version (eg. 4.2.3) until a new minor release comes out. The Rocker dev containers don’t have labels for specific patch versions.

  • The versions of R packages that come from the PPM are also determined by the image you select. If you select latest or the latest R minor version (currently 4.2), the container will use the latest available versions (even if those versions have changed since you last built the container). Once a new minor version releases, previous ones will switch to using a fixed date checkpoint on the PPM.

    • If you need to make sure that R package versions do not change from the start of your analysis, you can set a specific Posit Package Manager checkpoint even for the latest R image using:

      "containerEnv": {
      "PKG_CRAN_MIRROR": "https://packagemanager.posit.co/cran/2023-01-31"
      },

For reproducibility reasons, I generally I find it easier to start a project working in a dev container, rather than converting an existing project. You can use renv to restore from an renv.lock file, or even share an renv cache across several dev conainter projects, but I’ve found all that a little difficult in practice.

On the other hand, if you start in a dev container from the beginning and set a fixed PPM checkpoint, you know that your R dependencies won’t change.

You can also use Miles McBain’s helpful {capsule} package to quickly write out a lockfile. Other folks can use that to install R packages even if they’re not sold on using a dev container.

Why now?

Dev containers have been around for a year or two, but I’ve leapt on them now because the R community are now producing cross-platform images that include Quarto support.

That means it doesn’t matter whether you’re in a Codespace, on a Windows laptop or using an Apple Silicon Macbook: the code should run just the same regardless.

I also don’t exclusively code in R. Quarto’s VSCode support is first-class, whether you’re coding in R, Python, Julia or Observable JS—and I can add any number of other languages with the confidence that VSCode will handle them the same way it does locally.

Dev containers have a whole bunch of exciting applications. Here’re just a few:

  • Workshops, livestreams and reproducible live talk demos
  • Demos for analysis packages
  • Reproducible open source journalism

Give ’em a try!