12 Database
The source of truth for the Marine Sensitivity Toolkit is a single DuckDB file, sdm.duckdb, stored on the server at /share/data/big/latest/sdm.duckdb (a symlink to the current versioned directory, currently v6). DuckDB is a columnar, embedded analytic database — it’s fast enough to power live cell_metric queries from the TiTiler factory while also being a single-file artifact that can be copied, versioned and served read-only to multiple processes concurrently.
All species distributions (merged from AquaMaps, BirdLife, IUCN and others), per-cell scores (cell_metric) and per-zone aggregates (zone_metric) live here. The Shiny apps, the custom plumber API and the TiTiler tile factory all read the same file in read-only mode; workflows under workflows/ are the only writers.
Why DuckDB and not PostgreSQL? Earlier versions of the stack used PostgreSQL + PostGIS as the primary store. As the dataset stabilized around a fixed 0.05° cell grid and a small number of zone layers, we no longer needed live geometry operations or concurrent writes — just fast analytic reads. DuckDB gives us columnar scans over tens of millions of cell_metric rows in well under a second, without a separate database server process. PostgreSQL is retained in docker-compose.yml only for a handful of legacy apps (see Chapter 11).
12.1 Table and column naming conventions
- Table names are plural and use lower case.
- Unique identifier columns are suffixed with:
*_idfor unique integer keys (stable across rebuilds);*_keyfor unique string keys (human-readable);*_seqfor auto-incrementing sequence integer keys (internal join keys).
- Column names are singular and use
snake_case. - Foreign keys use the singular form of the referenced table, followed by the primary-key suffix (
_id,_key, or_seq). - Primary keys are named
idwhen a table has a single natural key.
12.2 Core tables
The stack revolves around ∼10 tables; the ones touched by the tile factory and the Shiny apps are:
| Table | Rows (v6) | Purpose | Read by |
|---|---|---|---|
cell |
∼660 k | 0.05°-resolution grid cells covering the US EEZ footprint. | workflows, apps |
cell_metric |
∼10 M | (cell_id, metric_seq, value) — the per-cell score for each metric. |
TiTiler factory, apps |
metric |
∼20 | Metric metadata (metric_key, metric_seq, description). |
factory, apps |
zone |
∼100 | Zone registry (zone_seq, tbl, fld, value) — keyed by the source GeoPackage table, field and value, so Program Areas, Ecoregions, Subregions, Planning Areas all share one schema. |
factory (mask SQL), apps |
zone_cell |
∼3 M | (zone_seq, cell_id) — membership mapping, used for subregion / Program Area filters. |
factory, apps |
zone_metric |
∼2 k | Pre-aggregated (zone_seq, metric_seq, score) — the per-Program-Area / per-subregion score values that drive the choropleth. |
apps |
taxon |
∼17 k | Unified taxonomy (one row per species) merged from WoRMS, eBird, IUCN. | apps |
taxon_model |
∼20 k | (taxon_id, ds_key, mdl_seq) — which species is represented by which distribution model(s). |
apps |
model_cell |
∼30 M | (mdl_seq, cell_id, value) — raw species-distribution-model output per cell (0–100 suitability or similar). |
apps (species explorer) |
dataset |
∼10 | Source datasets (AquaMaps, BirdLife, NMFS CH, FWS CH, IUCN, …). | apps |
12.2.1 How the core tables feed the serving tier
- Raster cell tiles (Figure 10.2) — the TiTiler factory receives a base64-encoded SELECT that joins
cell_metricwithmetric(and optionallyzone/zone_cellwhen the user picks a subregion), returning(cell_id, value)rows. The factory LRU-caches this result per-SQL and looks eachcell_idup against the pre-baked cell-id COG to colorize the tile. - “Cells outside Program Areas” mask — a specific SQL returning every
cell_metriccell_idthat is not in any row ofzone_cellwherezone.fld = 'programarea_key'. Rendered with the factory’scolor=single-color mask path. - Program Area choropleth (Figure 10.4) — Shiny queries
zone_metricjoined tozone(filtered tofld = 'programarea_key'), maps score → color, and pushes amatch_exprpaint rule tomapbox-gl. Geometry comes from static PMTiles; only the paint spec changes on metric switch.
The entity-relationship diagram (ERD) and detailed schema are built by the create_sdm-tables.qmd workflow:
And ingestion from a specific source is illustrated in:
See Chapter 13 for the full list of ingest / score workflows.