10  Software

10.1 Software Overview

The Marine Sensitivity Toolkit (MST) is a stack of software components for reproducibly, interactively and hierarchically generating environmental vulnerability maps and scores. Tabular data are collated across studies evaluating the sensitivity of species to oil & gas and offshore wind energy development. The best available species and benthic habitat distributions are being mosaicked across the US EEZ. Scores are summarized by subgroup within Species, Benthic Habitats and Primary Productivity, averaged into an overall score, and visualized as a flower plot at each level of BOEM relevance: region, ecoregion, protraction diagram, block and aliquot.

The software components for achieving this interactively are:

  1. server — a Docker Compose configuration orchestrating the reverse proxy, Shiny server, APIs, custom TiTiler tile factory + Varnish cache, and file server;
  2. database — a columnar, embedded analytic database (DuckDB) as the single source of truth for species distributions, per-cell scores and derived per-zone metrics;
  3. workflows — Quarto notebooks to ingest source datasets (AquaMaps, IUCN, NMFS/FWS, BirdLife, WoRMS, BOEM zones), merge taxonomic authorities, and compute scores that are written back to DuckDB;
  4. APIs — a custom R plumber API for data requests and a custom TiTiler factory that renders on-the-fly raster PNG tiles from arbitrary DuckDB queries, cached by Varnish;
  5. libraries — re-usable documented functions as the msens R package, including the tile-URL helpers (cell_tile_url(), cell_stats(), add_cell_tiles()) that the Shiny apps use to talk to the factory;
  6. applications — interactive Shiny apps (scores, species, …) that combine the raster tile factory with static PMTiles vector layers and live-pushed choropleth paint rules;
  7. documentation — this Quarto book with figures, tables, glossary and references in interactive HTML or static DOCX / PDF output.

This toolbox is intended primarily to serve BOEM's internal needs, but by being open-source and fully reproducible we hope to enlist buy-in, and even contributions, from external partners, whether other government agencies, academia, NGOs or industry.

We subscribe to the philosophy of sharing all code for the sake of reproducibility, transparency and efficiency (Maitner et al. 2024; Lowndes et al. 2017), in line with the FAIR principles of Findability, Accessibility, Interoperability, and Reusability (Wilkinson et al. 2016).

Figure 10.1: System architecture: DuckDB is the authoritative store for cell-level species/score data. A custom TiTiler factory renders those cell values on-the-fly into PNG tiles (cached by Varnish); Program Area / Ecoregion polygons are served as static PMTiles through Caddy; the Shiny apps combine both layer types and push per-Program-Area scores live from DuckDB into the polygon paint spec.

10.1.1 Interactive Applications

We have developed a series of interactive applications to explore the data and results of the MST project. The applications are built with the shiny package in R (Chang et al. 2024), which lets us combine a user interface with complex reactivity in a web application accessible through any browser. Each application is designed to be user-friendly and intuitive, with interactive maps, charts and tables that let users explore the data dynamically.

10.1.2 Overcoming Challenges with Large Spatial Data

The MST project incorporates many large spatial datasets that are problematic to render in a typical interactive application. For instance, the most common interactive mapping R package, leaflet, warns against displaying rasters larger than 4 MB (see “Large Raster Warning” in Raster Images • leaflet). Vectors (points, lines and polygons) with many vertices get simplified on the fly, so contiguity between polygons is lost, and rendering degrades with the user’s connection speed.

To work around these limitations we use a “cloud native” stack (see also the Cloud-Optimized Geospatial Formats Guide) that transfers only the pixels / features visible in the current viewport:

  • Raster cell values are rendered on-the-fly from DuckDB by a custom TiTiler factory, cached by Varnish, and delivered as small 256×256 PNG tiles.
  • Vector polygons (Program Areas, Ecoregions, Planning Areas, …) are served as static PMTiles archives by the Caddy file server — a single archive per layer, range-read by the browser.
  • Choropleth values (per-Program-Area scores) are pushed live from DuckDB into the PMTiles paint spec via a mapbox-gl match expression, so geometry stays static but color updates as the user changes metric / subregion.

Let’s take a closer look at each.

10.1.2.1 Raster: DuckDB cell values rendered via a custom TiTiler factory

Each metric (e.g., extrisk_mammal) resolves, in DuckDB’s cell_metric table, to a list of (cell_id, value) rows, one row per 0.05°-resolution grid cell in the US EEZ. To show these in the browser we would previously have had to materialize the full raster (∼4 GiB for the whole EEZ) and ship it over the wire.

Instead we keep the values in DuckDB and extract only the tiles the user’s browser is asking for. The Shiny app builds a small SELECT, base64-encodes it and embeds it in the tile URL template. On tile request, our custom TiTiler factory:

  1. validates the SQL (SELECT-only, via sqlglot) and executes it once per unique query, LRU-caching the resulting cell_id → value map in-process;
  2. reads the tile’s window of integer cell ids from a pre-baked single-band cell-id COG (r_cellid.tif) via nearest-neighbor resampling, so integer ids are never interpolated;
  3. looks the value up for every pixel, applies rescale + colormap, and returns a PNG.
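The client/server round trip above can be sketched in Python. The URL template, the query-parameter names (sql_b64, colormap), the naive SELECT-only check (a stand-in for the sqlglot validation) and the stubbed query result are all illustrative assumptions, not the factory’s actual interface:

```python
import base64
from functools import lru_cache

# Client side (done by the Shiny app): base64-encode the SELECT and embed it
# in the tile URL template. Host and parameter names here are placeholders.
def cell_tile_url_template(sql: str, colormap: str = "viridis") -> str:
    sql_b64 = base64.urlsafe_b64encode(sql.encode()).decode()
    return (f"https://example.org/tiles/{{z}}/{{x}}/{{y}}.png"
            f"?sql_b64={sql_b64}&colormap={colormap}")

# Server side (TiTiler factory): decode, validate, and LRU-cache the lookup map.
def validate_select_only(sql: str) -> str:
    # Stand-in for the sqlglot-based check: reject anything but a single SELECT.
    if not sql.lstrip().lower().startswith("select") or ";" in sql.rstrip(";"):
        raise ValueError("only a single SELECT statement is allowed")
    return sql

@lru_cache(maxsize=32)
def cell_value_map(sql_b64: str) -> dict:
    """Executed once per unique query; subsequent tiles hit the cache."""
    sql = validate_select_only(base64.urlsafe_b64decode(sql_b64).decode())
    # In the real factory this executes against DuckDB; stub rows stand in here.
    rows = [(1, 0.2), (2, 0.7)]  # placeholder for con.execute(sql).fetchall()
    return dict(rows)
```

With the cell_id → value map in hand, each pixel’s integer id from the cell-id COG becomes a simple dictionary lookup before rescale and colormap are applied.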

Varnish sits in front of the factory and caches each tile URL for 7 days, so once a tile has been rendered for a given SQL + colormap + rescale combination, every subsequent viewer is served from cache in sub-millisecond time (Figure 10.2). A mtime query parameter tied to the source DuckDB’s modification time automatically busts the cache when the DB is rebuilt.
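The cache-busting parameter can be sketched as follows; the parameter name mtime comes from the text, while the helper name and URL shape are assumptions:

```python
import os

def with_mtime(tile_url_template: str, db_path: str) -> str:
    """Append the source DuckDB file's modification time so Varnish treats a
    rebuilt database as a brand-new set of tile URLs (old entries age out)."""
    mtime = int(os.path.getmtime(db_path))
    sep = "&" if "?" in tile_url_template else "?"
    return f"{tile_url_template}{sep}mtime={mtime}"
```

Because the cache key is the full URL, no explicit purge is needed: a new mtime value simply misses the cache and triggers a fresh render.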

Figure 10.2: Sequence diagram: a cell-values raster layer is rendered on-the-fly by the custom msens TiTiler factory from the DuckDB source of truth. Varnish caches identical tile URLs for 7 days.

The same factory also supports a single-color “mask” render; the “Cells outside Program Areas” overlay in mapgl is built this way: an SQL query returns all cells that have metric data but fall outside every Program Area zone, and the result is rendered with color=#222222 (no colormap, no rescale) as a semi-transparent dark overlay.
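A sketch of how such a mask layer might be requested; the SQL, the table names (cell_metric, zone_cell) and the parameter names are assumptions for illustration:

```python
import base64
from urllib.parse import urlencode

# Cells that have metric data but fall outside every Program Area zone
# (table names are hypothetical).
MASK_SQL = """
SELECT DISTINCT cell_id FROM cell_metric
WHERE cell_id NOT IN (SELECT cell_id FROM zone_cell)
"""

def mask_tile_url_template(base: str = "https://example.org/tiles") -> str:
    params = urlencode({
        "sql_b64": base64.urlsafe_b64encode(MASK_SQL.encode()).decode(),
        "color":   "#222222",  # single-color render: no colormap, no rescale
        "opacity": 0.5,
    })
    return f"{base}/{{z}}/{{x}}/{{y}}.png?{params}"
```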

10.1.2.2 Vector: static PMTiles served by Caddy

For polygon layers (Program Areas, Ecoregions, Planning Areas, protractions, blocks, aliquots) we build PMTiles archives offline via tippecanoe and publish them as static files on the server. A PMTiles archive is a single file with an internal directory structure that lets any HTTP client issue small range requests for individual tiles; no tile server process is required — Caddy’s static file handler is all we need (Figure 10.3).
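The byte-range access pattern that makes a static archive servable without a tile server can be illustrated with a local file; the offsets are arbitrary, not the actual PMTiles directory layout:

```python
def range_read(path: str, offset: int, length: int) -> bytes:
    """Local equivalent of an HTTP request with
    'Range: bytes=offset-(offset+length-1)'. Caddy answers such range
    requests for static files out of the box, so the browser can pull
    individual tiles straight out of the archive."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The browser first range-reads the archive’s internal directory, then issues one small range request per visible tile.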

Figure 10.3: Sequence diagram: vector layers are served as static PMTiles archives through Caddy. Each tile is a byte-range read of the archive — no tile server process is in the path.

This is a deliberate simplification from the prior architecture which used PostgreSQL + PostGIS + pg_tileserv. For our workload — polygon layers that change only when the data is rebuilt — a compiled tile server and live database query engine add latency and operational complexity for no benefit over byte-range reads of a pre-baked archive. (The PostgreSQL service remains in docker-compose.yml for compatibility with a few legacy apps; see Chapter 11.)

10.1.2.3 Choropleth: live DuckDB values + static PMTiles geometry

The Program Area choropleth is the interesting hybrid case. The geometry (Program Area polygons keyed by programarea_key) is static — it comes from the same PMTiles file as the polygon outlines. But the fill color depends on the currently-selected metric + subregion, which means the per-Program-Area score has to come live from DuckDB.

The Shiny server handles this by querying zone_metric (pre-aggregated per-zone values), building a mapbox-gl match expression that maps programarea_key → fill_color, and pushing that paint rule to the map through mapgl’s Shiny proxy. No tiles are re-rendered; only the paint specification changes (Figure 10.4).
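Serialized to JSON, the paint rule pushed by the Shiny proxy has roughly this shape (sketched here in Python rather than R/mapgl); the key name programarea_key comes from the text, while the colors and fallback are placeholders:

```python
def match_fill_color(scores: dict[str, str], fallback: str = "#cccccc") -> list:
    """Build a mapbox-gl 'match' expression mapping programarea_key -> fill color."""
    expr = ["match", ["get", "programarea_key"]]
    for key, color in scores.items():
        expr += [key, color]      # one (label, output) pair per Program Area
    expr.append(fallback)         # 'match' requires a fallback output
    return expr
```

Pushing a new expression of this shape is cheap: the vector tiles are untouched, and the GPU simply repaints the existing geometry.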

Figure 10.4: Sequence diagram: the Program Area choropleth. Polygon geometry is static (PMTiles from Caddy file server); per-area fill values come live from DuckDB’s zone_metric table and are pushed into mapbox-gl’s paint spec via a match expression. Switching metrics updates only the paint rule, not the tiles.

10.1.2.4 Data pipeline — how the source of truth is built

The three serving paths above all sit downstream of the same data pipeline: ingest workflows populate DuckDB from ∼7 source datasets (AquaMaps, IUCN, NMFS/FWS, BirdLife, WoRMS/eBird taxa, BOEM zones, …); calc_scores.qmd computes per-cell and per-zone metrics back into DuckDB; and a one-shot make_cellid_cog.py bakes the cell-id COG consumed by the TiTiler factory. PMTiles archives are built offline from the same zone geopackages used to populate the zone table (Figure 10.5).
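The per-zone aggregation that the scoring workflow writes into zone_metric can be shown in miniature. In the real pipeline this is a DuckDB query inside calc_scores.qmd; the SQL shape in the comment and the mean-per-zone formula are assumptions:

```python
from statistics import mean

# Assumed shape of the workflow's aggregation (not the actual SQL):
#   INSERT INTO zone_metric
#   SELECT z.zone_key, c.metric, AVG(c.value)
#   FROM cell_metric c JOIN zone_cell z USING (cell_id)
#   GROUP BY z.zone_key, c.metric;
def zone_means(cell_values: dict[int, float],
               zone_cells: dict[str, list[int]]) -> dict[str, float]:
    """Average per-cell metric values up to each zone."""
    return {zone: mean(cell_values[c] for c in cells if c in cell_values)
            for zone, cells in zone_cells.items()}
```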

Figure 10.5: Data pipeline: ingest workflows populate DuckDB; scoring workflows compute cell_metric and zone_metric back into DuckDB; a one-shot COG regen and offline PMTiles build produce the derived artifacts the serving tier reads.

10.1.3 GitHub Repositories

repo                          description
.github                       organization README
api                           application programming interface (API) using the R plumber package
apps                          Shiny applications
docs                          documentation for BOEM’s offshore environmental sensitivity index products
manuscripts                   manuscripts reviewing sensitivities by industry and receptors (species, habitats, human uses)
MarineSensitivity.github.io   default website
msens                         R library of functions for mapping marine sensitivities, sponsored by BOEM
objectives                    repository for issues spanning multiple repositories and big-picture roadmapping
server                        server setup for R Shiny apps, RStudio IDE, R plumber API, PostGIS database, pg_tileserv
workflows                     scripts for testing data analytics and visualization, as well as production workflows

10.1.4 Software Components