13 Workflows
Workflows are scripts — mostly Quarto notebooks — that ingest source datasets, merge taxonomic authorities, compute scores, and build the derived artifacts consumed by the serving tier. They are the only writers of sdm.duckdb; everything downstream (Shiny apps, the TiTiler factory, the plumber API) reads it in read-only mode.
Links:
- marinesensitivity.org/workflows — rendered HTML
- github.com/MarineSensitivity/workflows — source
See Figure 10.5 for a visual summary and Chapter 12 for the target schema.
13.1 Ingest workflows (populate sdm.duckdb)
Each source dataset has a dedicated ingest_*.qmd that writes rows to cell_metric, model_cell, taxon, dataset and related tables:
| Workflow | Source |
|---|---|
| ingest_aquamaps_to_sdm_duckdb.qmd | AquaMaps predictions (~16,800 models) |
| ingest_sdm-nc.qmd, ingest_sdm-gm.qmd | NOAA/GEBCO cetacean + sea-turtle distribution models |
| ingest_taxon.qmd | Unified taxonomy from WoRMS, eBird |
| ingest_nmfs-fws-listings.qmd, ingest_iucn*.qmd | Conservation status (ESA listings, IUCN Red List) |
| ingest_prot.qmd, ingest_blocks.qmd, ingest_mregions.qmd | Spatial zones (protractions, blocks, marine regions) |
| merge_models.qmd | Authority merging: unified taxon table, taxon_model junction, merged mdl_seq per species |
13.2 Scoring workflow (cell_metric + zone_metric)
- calc_scores.qmd: the primary scoring pipeline. Loads sdm.duckdb read-write, computes extinction-risk metrics by species category (e.g., extrisk_bird, extrisk_fish, extrisk_mammal), writes rows to cell_metric, and aggregates into zone_metric for both Program Area and subregion zones. Also exports helper caches (taxon.csv, layers_v6.csv, flower_default_subregions.csv) used by the Shiny apps at startup.
- update_scores.qmd: targeted remap when the score scale changes (e.g., the 2026 migration from the old 70/90 critical-habitat scale to the unified er_score EN=100 / TN=50 / LC=1 scale).
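As a rough illustration of the kind of remap update_scores.qmd performs, the sketch below applies the unified er_score scale quoted above to a few in-memory rows. It is not the workflow's actual code: the row layout, the `category` and `taxon_id` field names, and the `remap_scores` helper are assumptions made for the example, and only the three codes named in the text are mapped.

```python
# Unified er_score scale as quoted in the text. Other category codes are
# intentionally absent because the source does not list their scores.
ER_SCORE = {"EN": 100, "TN": 50, "LC": 1}


def remap_scores(rows, scale=ER_SCORE):
    """Return (remapped, unmapped) without mutating the input rows.

    Rows whose category code is in `scale` get a new er_score; rows with an
    unknown code are passed back untouched so they can be reviewed by hand.
    """
    remapped, unmapped = [], []
    for row in rows:
        code = row["category"]
        if code in scale:
            remapped.append({**row, "er_score": scale[code]})
        else:
            unmapped.append(row)
    return remapped, unmapped


# Hypothetical rows; the old er_score values are placeholders only.
rows = [
    {"taxon_id": 1, "category": "EN", "er_score": 90},
    {"taxon_id": 2, "category": "LC", "er_score": 70},
    {"taxon_id": 3, "category": "XX", "er_score": 70},
]
remapped, unmapped = remap_scores(rows)
```

Returning unknown codes separately, rather than defaulting them to a score, keeps a scale migration from silently assigning values the scale does not define.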
13.3 Derived artifacts for the serving tier
Two one-shot build steps produce the binary artifacts that sit next to sdm.duckdb on /share:
- server/titiler/scripts/make_cellid_cog.py: one-shot script that reads band 1 of the source multi-band raster (r_bio-oracle_planarea.tif, stored in 0–360° longitude convention for contiguous US EEZ coverage across the Pacific), casts NaN → 0 + float32 → uint32, wraps longitudes to standard −180..180°, and writes a single-band tiled GeoTIFF at /share/data/derived/r_cellid.tif. No overviews: nearest-neighbor reads at native resolution are fast enough for z ≤ 4 and interpolating overviews would corrupt integer cell ids.
- PMTiles builds: offline tippecanoe runs over the source GeoPackages (ply_programareas_2026, ply_ecoregions_2025, ply_planareas_2025, …) produce .pmtiles archives, published to /share/pmtiles/ and served by Caddy at file.marinesensitivity.org/pmtiles/<layer>.pmtiles.
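The two value-level steps in make_cellid_cog.py (NaN → 0 with an integer cast, and the 0–360° to −180..180° longitude wrap) can be sketched in plain Python. This is a toy under stated assumptions, not the script itself: the grid is a nested list rather than a raster, the function names are hypothetical, and `wrap_columns` assumes a global, evenly spaced grid with an even column count, which the real source raster may not satisfy.

```python
import math

UINT32_MAX = 2**32 - 1


def to_cell_ids(values):
    """Cast float cell values to ints in the uint32 range, mapping NaN to 0."""
    out = []
    for row in values:
        new_row = []
        for v in row:
            cell_id = 0 if math.isnan(v) else int(v)
            if not 0 <= cell_id <= UINT32_MAX:
                raise ValueError(f"cell id {cell_id} does not fit in uint32")
            new_row.append(cell_id)
        out.append(new_row)
    return out


def wrap_lon(lon):
    """Map a longitude in degrees onto the half-open range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


def wrap_columns(grid):
    """Move the eastern half (180..360) of a global grid in front of the
    western half, so columns run from -180 to 180 (assumes an even count)."""
    half = len(grid[0]) // 2
    return [row[half:] + row[:half] for row in grid]


# One-row toy grid with column centres at 45, 135, 225 and 315 degrees.
lons = [45.0, 135.0, 225.0, 315.0]
grid = [[1.0, float("nan"), 3.0, 4.0]]

cell_ids = wrap_columns(to_cell_ids(grid))
wrapped_lons = sorted(wrap_lon(lon) for lon in lons)
```

Using 0 as the fill value only works if 0 is never a valid cell id, which the NaN → 0 cast described above implies.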
Re-run either when the underlying data changes; the Shiny app’s mtime cache-busting parameter (tied to sdm.duckdb’s mtime) invalidates cached tiles automatically when the DB is rebuilt.
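The mtime-based cache busting can be illustrated with a short sketch: take the database file's modification time and append it to the tile URL as a query parameter, so the URL changes whenever the file is rebuilt. The `tile_url` helper, the `v` parameter name, and the `https://` scheme are assumptions for illustration; the Shiny app's actual parameter may differ.

```python
import os
from urllib.parse import urlencode


def tile_url(base, layer, db_path, param="v"):
    """Build a tile URL whose query string changes with db_path's mtime.

    Caches key on the full URL, so a rebuilt database (new mtime) yields a
    new URL and previously cached responses are no longer reused.
    """
    version = int(os.path.getmtime(db_path))
    return f"{base}/{layer}.pmtiles?{urlencode({param: version})}"


def example(db_path="/share/sdm.duckdb"):
    # Hypothetical call; the database path is assumed, not taken from the source.
    return tile_url(
        "https://file.marinesensitivity.org/pmtiles",
        "ply_programareas_2026",
        db_path,
    )
```

Because the version is derived from the file rather than stored in configuration, no manual bump is needed after a rebuild.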