6 Model Merging
Each species may have distribution data in multiple source datasets (see Chapter 3). The model merging pipeline combines these into a single merged model per species, applying spatial masking and minimum floor values to reflect regulatory protections.
6.1 Pipeline Overview
The merge pipeline processes each of the 9,819 valid species through the following steps:
6.1.1 Step 1: Gather Source Models
For each species (identified by taxa_id), the pipeline queries all available models across the 7 source datasets. A species may have:
- an AquaMaps SDM (continuous suitability)
- one or more regulatory range/habitat designations (NMFS, FWS)
- a BirdLife or IUCN range map
6.1.2 Step 2: MAX Across Datasets
For each grid cell, the merged value is the maximum suitability across all source datasets:
\[ v_{merged,c} = \max(v_{am,c},\; v_{ca,c},\; v_{ch,c},\; v_{rng,c},\; v_{bl,c},\; v_{iucn,c}) \tag{6.1}\]
Taking the maximum lets the highest value reported by any source determine the merged value, on the reasoning that the highest value reflects the most informative (highest confidence) prediction. For example, if AquaMaps predicts 30% suitability but NMFS has designated the cell as critical habitat for an endangered species (100%), the merged value is 100%.
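As an illustration of Equation 6.1, the following is a minimal sketch of a per-cell maximum across datasets. It is not the pipeline's actual code: the function name merge_max, the dictionary layout, and the dataset keys "am" and "ch" (borrowed from the equation's subscripts) are assumptions made for this example.

```python
def merge_max(source_values):
    """Return the per-cell maximum across all source datasets.

    source_values: dict mapping a dataset key to {cell_id: value},
    where value is a suitability percentage (0-100). This layout is
    hypothetical and chosen only for illustration.
    """
    merged = {}
    for cells in source_values.values():
        for cell_id, value in cells.items():
            # A cell missing from a dataset contributes nothing to the max.
            merged[cell_id] = max(merged.get(cell_id, 0), value)
    return merged


# AquaMaps predicts 30% in cell 1, but critical habitat assigns 100%.
sources = {
    "am": {1: 30, 2: 55},
    "ch": {1: 100},
}
merged = merge_max(sources)
```

In this sketch, cell 1 takes the critical habitat value and cell 2 keeps the AquaMaps value, since no other dataset covers it.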
6.1.3 Step 3: Spatial Masking
When an IUCN range map, NMFS Critical Habitat, or FWS Critical Habitat exists for a species, the merged model is constrained to the spatial extent of these mask datasets. The mask is formed as the union of all available is_mask = TRUE datasets for that species.
This prevents the AquaMaps SDM, whose environmental envelope predictions are often broad, from extending a species' presence far beyond its known range. Cells outside the mask are set to zero (species absent), so the merged model agrees with expert knowledge of where the species actually occurs.
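The masking step can be sketched as follows, assuming each mask dataset is represented as a set of cell identifiers. The function name apply_mask and this representation are hypothetical; the sketch only mirrors the rule described above (union of mask datasets, zero outside the union, no change when no mask dataset exists).

```python
def apply_mask(merged, mask_datasets):
    """Zero out merged values that fall outside the union of mask datasets.

    merged: {cell_id: value} from the MAX step.
    mask_datasets: list of sets of cell_ids, one set per is_mask = TRUE
    dataset available for the species (hypothetical representation).
    """
    if not mask_datasets:
        # No mask dataset exists for this species: leave the model as is.
        return dict(merged)
    mask = set().union(*mask_datasets)
    return {cell: (value if cell in mask else 0)
            for cell, value in merged.items()}


merged = {1: 100, 2: 55, 3: 40}
iucn_range = {1, 2}
critical_habitat = {1}
masked = apply_mask(merged, [iucn_range, critical_habitat])
```

Here cell 3 lies outside both mask datasets, so its value is set to zero while cells 1 and 2 are unchanged.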
6.1.4 Step 4: MMPA Spatial Floor
For species protected under the Marine Mammal Protection Act (all marine mammals), a spatial minimum floor is applied:
\[ v_{c} = \max(v_{merged,c},\; 20) \tag{6.2}\]
This ensures that every cell where a marine mammal is present has a minimum value of 20%, reflecting the legal protection that MMPA affords regardless of ESA status.
6.1.5 Step 5: MBTA Spatial Floor
Similarly, for species protected under the Migratory Bird Treaty Act (most seabirds), a spatial minimum floor of 10% is applied:
\[ v_{c} = \max(v_{merged,c},\; 10) \tag{6.3}\]
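Equations 6.2 and 6.3 can be sketched with a single helper. One point is an assumption of this example rather than something the text states explicitly: the floor is applied only to cells with a value above zero, following the wording "every cell where a marine mammal is present", so that masked-out cells stay at zero. The names apply_floor, MMPA_FLOOR, and MBTA_FLOOR are hypothetical.

```python
MMPA_FLOOR = 20  # percent, marine mammals (Equation 6.2)
MBTA_FLOOR = 10  # percent, migratory birds (Equation 6.3)


def apply_floor(merged, floor):
    """Raise each present cell to at least `floor`.

    Assumption: a cell counts as "present" when its value is > 0;
    cells at zero (for example, outside the mask) are left untouched.
    """
    return {cell: (max(value, floor) if value > 0 else value)
            for cell, value in merged.items()}


merged = {1: 5, 2: 0, 3: 60}
mammal = apply_floor(merged, MMPA_FLOOR)
seabird = apply_floor(merged, MBTA_FLOOR)
```

Values already above the floor pass through unchanged, so the floor only affects cells with low suitability.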
6.1.6 Step 6: Persist Results
The final merged model is stored in the DuckDB database as:
- model table: one row per species with metadata (dataset key ms_merge, taxa reference, model sequence)
- model_cell table: one row per species × cell combination with the merged suitability value
6.2 Valid Species Filter
After merging, species are flagged as valid (is_ok = TRUE) based on the criteria described in Chapter 4. The filtering rules differ slightly between birds and other taxa:
Birds (from BirdLife BOTW):
- has a botw_id (BirdLife identifier)
- IUCN Red List code is not “EX” (Extinct)
- if also in WoRMS: must be marine and not extinct
- has cells overlapping at least one BOEM Program Area
Other marine taxa (from WoRMS):
- has a taxa_id and a merged model (mdl_seq)
- IUCN Red List code is not “EX”
- WoRMS isMarine = TRUE and isExtinct != TRUE
- species category is not “reptile” (except sea turtles, which are reclassified as category “reptile”)
- has cells overlapping at least one BOEM Program Area
This filtering yields 9,819 valid species that contribute to the final sensitivity scores.
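The two rule sets can be read as predicates over a species record. The sketch below is one possible reading, not the pipeline's implementation: the record is a plain dictionary, and the keys in_worms, is_marine, is_extinct, redlist_code, category, is_sea_turtle, and n_program_area_cells are hypothetical names introduced for this example (only botw_id, taxa_id, and mdl_seq come from the text). The sea turtle exception is modeled with an explicit flag, which is also an assumption.

```python
def is_ok_bird(sp):
    """Validity rule for birds sourced from BirdLife BOTW."""
    if sp.get("botw_id") is None:
        return False
    if sp.get("redlist_code") == "EX":
        return False
    # The WoRMS checks apply only when the bird also appears in WoRMS.
    if sp.get("in_worms"):
        if sp.get("is_marine") is not True or sp.get("is_extinct") is True:
            return False
    return sp.get("n_program_area_cells", 0) > 0


def is_ok_other(sp):
    """Validity rule for other marine taxa sourced from WoRMS."""
    if sp.get("taxa_id") is None or sp.get("mdl_seq") is None:
        return False
    if sp.get("redlist_code") == "EX":
        return False
    if sp.get("is_marine") is not True or sp.get("is_extinct") is True:
        return False
    # Reptiles are excluded unless flagged as sea turtles (assumed flag).
    if sp.get("category") == "reptile" and not sp.get("is_sea_turtle"):
        return False
    return sp.get("n_program_area_cells", 0) > 0


seabird = {"botw_id": 123, "redlist_code": "LC", "in_worms": False,
           "n_program_area_cells": 42}
sea_snake = {"taxa_id": 9, "mdl_seq": 1, "redlist_code": "LC",
             "is_marine": True, "is_extinct": None,
             "category": "reptile", "is_sea_turtle": False,
             "n_program_area_cells": 10}
```

In this reading, a missing isExtinct value passes the check, because the rule is written as isExtinct != TRUE rather than isExtinct = FALSE.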
