4 Taxonomy
Integrating species distribution data from 7 source datasets requires resolving taxonomic identities across multiple naming systems. The MST uses a multi-authority matching pipeline to ensure each species is uniquely identified, enabling accurate merging of models from different data providers.
4.2 ID Resolution Cascade
Species identifiers are resolved through a cascading lookup process:
- ITIS TSN match: if the source dataset provides an ITIS Taxonomic Serial Number, use the ITIS-to-WoRMS crosswalk for direct matching
- WoRMS crosswalk: look up accepted WoRMS AphiaID via the GBIF backbone, which integrates WoRMS as the marine taxonomy source
- Scientific name match: for records without matching identifiers, attempt exact scientific name matching against WoRMS accepted names
- API lookup: for remaining unresolved names, query the WoRMS REST API (wm_records_name()) for fuzzy matching
At each stage, deprecated names (synonyms) are resolved to their accepted names, with the taxonomicStatus field determining which name is preferred.
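The cascade above can be sketched as a function that tries each stage in order and stops at the first match. This is a minimal Python illustration, not the msens implementation: the `SourceRecord` fields, the dictionary-based crosswalks, the `gbif_key` lookup key, and all identifier values are hypothetical, and `difflib` stands in for the WoRMS API fuzzy match so the example runs offline.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SourceRecord:
    """One species record from a source dataset (field names are hypothetical)."""
    scientific_name: str
    itis_tsn: Optional[int] = None
    gbif_key: Optional[int] = None


def resolve_aphia_id(
    record: SourceRecord,
    itis_to_worms: Dict[int, int],
    gbif_to_worms: Dict[int, int],
    accepted_names: Dict[str, int],
    fuzzy_lookup: Callable[[str], Optional[int]],
) -> Tuple[Optional[int], str]:
    """Return (aphia_id, stage) from the first stage that yields a match."""
    # Stage 1: direct ITIS TSN match through the ITIS-to-WoRMS crosswalk.
    if record.itis_tsn is not None and record.itis_tsn in itis_to_worms:
        return itis_to_worms[record.itis_tsn], "itis_tsn"
    # Stage 2: crosswalk through the GBIF backbone (keyed by an assumed GBIF key).
    if record.gbif_key is not None and record.gbif_key in gbif_to_worms:
        return gbif_to_worms[record.gbif_key], "gbif_crosswalk"
    # Stage 3: exact scientific name match, after collapsing stray whitespace.
    name = " ".join(record.scientific_name.split())
    if name in accepted_names:
        return accepted_names[name], "exact_name"
    # Stage 4: fuzzy lookup for whatever is still unresolved.
    aphia_id = fuzzy_lookup(name)
    if aphia_id is not None:
        return aphia_id, "api_fuzzy"
    return None, "unresolved"


# Made-up identifiers for illustration only; these are not real TSNs or AphiaIDs.
ITIS_TO_WORMS = {1: 101}
GBIF_TO_WORMS = {2: 102}
ACCEPTED_NAMES = {"Caretta caretta": 103, "Chelonia mydas": 104}


def local_fuzzy(name: str) -> Optional[int]:
    """Offline stand-in for the WoRMS API fuzzy match, using difflib."""
    hits = difflib.get_close_matches(name, list(ACCEPTED_NAMES), n=1, cutoff=0.85)
    return ACCEPTED_NAMES[hits[0]] if hits else None


def resolve(record: SourceRecord) -> Tuple[Optional[int], str]:
    """Run the cascade against the illustrative lookup tables above."""
    return resolve_aphia_id(
        record, ITIS_TO_WORMS, GBIF_TO_WORMS, ACCEPTED_NAMES, local_fuzzy
    )


if __name__ == "__main__":
    print(resolve(SourceRecord("Anything", itis_tsn=1)))
    print(resolve(SourceRecord("Caretta carreta")))  # misspelled on purpose
```

Returning the stage name alongside the identifier makes it possible to audit how many records fell through to the fuzzy stage, which is usually the least reliable one.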
4.3 Species Categories
Valid species are classified into 7 categories based on their taxonomic position:
| Category | Description | Examples |
|---|---|---|
| bird | Seabirds and shorebirds | albatross, petrel, tern, pelican |
| coral | Reef-building and deep-sea corals | stony coral, soft coral, black coral |
| fish | Bony and cartilaginous fishes | grouper, shark, ray, tuna |
| invertebrate | Non-coral marine invertebrates | crab, lobster, sea urchin, squid |
| mammal | Marine mammals | whale, dolphin, seal, manatee |
| reptile | Sea turtles | loggerhead, green, leatherback |
| other | Uncategorized marine organisms | worms, tunicates, and bryozoans |
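One way to derive a category from taxonomic position is a lookup on higher ranks with a fallback to other. The sketch below is illustrative Python only: the source does not list the actual rules, so the rank names, the `CLASS_TO_CATEGORY` and `INVERTEBRATE_PHYLA` tables, and the `categorize()` helper are all assumptions. Mapping all of Anthozoa to coral, for instance, is a simplification, since that class also contains sea anemones.

```python
from typing import Mapping

# Assumed class-level rules; the real pipeline's rules are not given in the text.
CLASS_TO_CATEGORY = {
    "Aves": "bird",
    "Mammalia": "mammal",
    "Reptilia": "reptile",
    "Anthozoa": "coral",          # simplification: also includes sea anemones
    "Teleostei": "fish",
    "Elasmobranchii": "fish",
}

# Assumed phylum-level fallback for non-coral invertebrates.
INVERTEBRATE_PHYLA = {"Arthropoda", "Mollusca", "Echinodermata"}


def categorize(taxon: Mapping[str, str]) -> str:
    """Assign one of the seven categories from a taxon's class and phylum."""
    cls = taxon.get("class", "")
    if cls in CLASS_TO_CATEGORY:
        return CLASS_TO_CATEGORY[cls]
    if taxon.get("phylum", "") in INVERTEBRATE_PHYLA:
        return "invertebrate"
    # Anything not covered by a rule lands in the catch-all category.
    return "other"


if __name__ == "__main__":
    print(categorize({"phylum": "Chordata", "class": "Aves"}))
    print(categorize({"phylum": "Arthropoda", "class": "Malacostraca"}))
    print(categorize({"phylum": "Bryozoa", "class": "Gymnolaemata"}))
```

Checking class before phylum lets specific rules (such as coral) win over the broader invertebrate fallback.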
4.4 Data Quality
4.4.1 Duplicate Resolution
Multiple source datasets may provide models for the same species. The taxonomic matching step identifies duplicates by resolving all source names to canonical WoRMS or BirdLife identifiers. When a species appears in multiple datasets, the model merging pipeline (see Chapter 6) takes the maximum (MAX) value across all sources rather than treating the records as separate species.
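A minimal Python sketch of this merge is shown below, assuming each source row carries a canonical taxon identifier, a cell identifier, and a value, and that the maximum is taken per cell. The row layout, the per-cell interpretation, and the `merge_max()` helper are assumptions for illustration; Chapter 6 describes the actual merging pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (dataset, canonical_taxon_id, cell_id, value); layout is hypothetical.
Row = Tuple[str, int, int, float]


def merge_max(rows: Iterable[Row]) -> Dict[int, Dict[int, float]]:
    """Collapse rows sharing a canonical taxon ID, keeping the max value per cell."""
    merged: Dict[int, Dict[int, float]] = defaultdict(dict)
    for _dataset, taxon_id, cell_id, value in rows:
        cells = merged[taxon_id]
        # First value seen for a cell is kept; later values only replace it if larger.
        cells[cell_id] = max(value, cells.get(cell_id, value))
    return {taxon_id: dict(cells) for taxon_id, cells in merged.items()}


if __name__ == "__main__":
    rows = [
        ("ds_a", 103, 1, 0.2),
        ("ds_b", 103, 1, 0.7),  # same taxon and cell from a second dataset
        ("ds_b", 103, 2, 0.4),
        ("ds_a", 104, 1, 0.9),
    ]
    print(merge_max(rows))
```

Because rows are grouped by the canonical identifier rather than the source name, two datasets that use different synonyms for one species still merge into a single model.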
4.4.2 Synonym Handling
Taxonomic names change over time as species are reclassified. The pipeline handles this by:
- resolving all names through acceptedNameUsageID in each authority
- tracking the original source name alongside the accepted name
- flagging deprecated WoRMS IDs and updating them to current accepted IDs
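The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's code: the `accepted_of` mapping stands in for the acceptedNameUsageID field, the helper names and output keys are hypothetical, and the identifiers and names are placeholders rather than real WoRMS records.

```python
from typing import Dict


def resolve_accepted(taxon_id: int, accepted_of: Dict[int, int]) -> int:
    """Follow acceptedNameUsageID-style links until an accepted ID is reached."""
    seen = set()
    current = taxon_id
    # An accepted record either points to itself or has no entry in the mapping.
    while current in accepted_of and accepted_of[current] != current:
        if current in seen:
            raise ValueError(f"cycle in accepted-name links at ID {current}")
        seen.add(current)
        current = accepted_of[current]
    return current


def annotate(
    source_name: str,
    source_id: int,
    accepted_of: Dict[int, int],
    names: Dict[int, str],
) -> Dict[str, object]:
    """Keep the original source name next to the accepted ID and name."""
    accepted_id = resolve_accepted(source_id, accepted_of)
    return {
        "source_name": source_name,
        "source_id": source_id,
        "accepted_id": accepted_id,
        "accepted_name": names[accepted_id],
        "was_deprecated": accepted_id != source_id,  # flag for updated IDs
    }


if __name__ == "__main__":
    # Placeholder data: 201 was replaced by 202, which was later replaced by 203.
    accepted_of = {201: 202, 202: 203, 203: 203}
    names = {201: "Genus oldname", 202: "Genus midname", 203: "Genus newname"}
    print(annotate("Genus oldname", 201, accepted_of, names))
```

Following links repeatedly, rather than once, covers the case where a name has been revised more than one time; the cycle check guards against inconsistent authority data.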
4.4.3 Valid Species Filter (is_ok)
Not all taxa in the database are included in the final analysis. A species is flagged as valid (is_ok = TRUE) when it meets all of the following criteria:
- has a merged model (i.e., at least one source dataset provides cell values)
- is classified as marine (based on WoRMS isMarine flag, or BirdLife seabird classification)
- is not extinct (WoRMS isExtinct or IUCN Red List EX status)
- has cells overlapping at least one BOEM Program Area
This filter yields 9,819 valid species from approximately 17,333 total taxa across all datasets.
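The four criteria combine with a logical AND, which can be sketched in Python as below. The `Taxon` fields and the `is_ok()` helper are hypothetical names chosen for illustration, and treating a missing WoRMS flag as false is an assumption; the actual column names and null handling in the database may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taxon:
    """Inputs to the validity check (field names are hypothetical)."""
    has_merged_model: bool               # at least one source provides cell values
    worms_is_marine: Optional[bool]      # WoRMS isMarine flag; may be missing
    birdlife_seabird: bool               # BirdLife seabird classification
    worms_is_extinct: Optional[bool]     # WoRMS isExtinct flag; may be missing
    iucn_status: Optional[str]           # IUCN Red List category, e.g. "EX"
    n_program_area_cells: int            # cells overlapping any BOEM Program Area


def is_ok(t: Taxon) -> bool:
    """Return True only when all four criteria hold."""
    # Assumption: a missing (None) WoRMS flag counts as False.
    is_marine = bool(t.worms_is_marine) or t.birdlife_seabird
    is_extinct = bool(t.worms_is_extinct) or t.iucn_status == "EX"
    return (
        t.has_merged_model
        and is_marine
        and not is_extinct
        and t.n_program_area_cells > 0
    )


if __name__ == "__main__":
    seabird = Taxon(True, None, True, None, "LC", 12)
    extinct = Taxon(True, True, False, False, "EX", 5)
    print(is_ok(seabird), is_ok(extinct))
```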
4.5 Key Function
The taxonomic matching is implemented in msens::match_taxa(), which orchestrates the ID resolution cascade and returns a unified taxon table with cross-referenced identifiers from all authorities.