Create SDM Tables

storing species distribution model outputs in a database for analysis

1 Species Distribution Models

The distribution of models varies geographically by extent, spatial analytical units, taxa, time period and output variable. To store the original data, we ingest data into the following database public schema:

erDiagram
%% https://mermaid.js.org/syntax/entityRelationshipDiagram.html

%% tables
sdm_datasets {
  str ds_key        PK   "dataset key"
  str name
  str link        
  str citation      
  str response_type
}
sdm_models {
  str ds_key        PK   "dataset key"
  int mdl_id        PK   "model ID"
  str description        "description of model"
  str sp_key        FK   "species key"
  str population         "subpopulation"
  str time_period        "start/duration (ISO 8601)"
  str var                "variable, e.g.: cv, n, se"
}
sdm_species {
  str ds_key         PK  "dataset key"
  str sp_key         PK  "species key"
  int aphia_id       FK  "singular taxa ID"
  arr aphia_ids      FK  "array of all taxa IDs"
  str sp_scientific
  str sp_common
  str description }
sdm_geometries {
  str  ds_key        PK  "dataset key"
  int  geom_id       PK  "geometry ID"
  geo  geom              "geometry"}
sdm_values {
  str ds_key         PK  "dataset key"
  int mdl_id         PK  "model ID"
  int geom_id        PK  "geometry ID"
  dbl val                "value"}
taxa {
  int aphia_id       PK  "unique ID at MarineSpecies.org"
  str scientific_name
  str common_name
  str rank
  str iucn_cat           "IUCN Red List extinction risk category"
}

%% relationships
sdm_datasets   ||--|{ sdm_models     : ds_key
sdm_datasets   ||--|{ sdm_species    : ds_key
sdm_datasets   ||--|{ sdm_geometries : ds_key
sdm_models     ||--|{ sdm_species    : "ds_key,sp_key"
sdm_geometries ||--|{ sdm_values     : "ds_key,geom_id"
sdm_models     ||--|{ sdm_values     : "ds_key,mdl_id"
taxa           ||--|{ sdm_species    : aphia_id
taxa           }o--|{ sdm_species    : aphia_ids

Specific columns:

  • sdm_datasets:
    • name_short
      short name in form of {source_broad} {regions} {taxa_groups} {response_type}
    • name_long
      original name
    • response_type one of: occurrence, range, suitability, probability or density
    • taxa_groups
      one or more of: fish, invertebrates, marine mammals, cetaceans, sea turtles, seabirds, etc.
  • sdm_models`
    • time_period
      time interval in format of {start}/{duration} per ISO 8601, e.g. “2019-02/P1M” is a duration of one month starting February, 2019. See https://en.wikipedia.org/wiki/ISO_8601#Time_intervals.

1.1 Data Dictionary

Data types: - str
string: variable-length character string, up to 1 GB; - ser
serial: 1 to 2,147,483,647; - bser
bigserial: 1 to 9,223,372,036,854,775,807; - int
integer: -2,147,483,648 to 2,147,483,647; - dbl
double precision: 15 decimal digits of precision;

2 Database Table and Column Naming Conventions

  • Table names are plural and use all lower case.
  • Unique identifiers are suffixed with:
    • *_id for unique integer keys;
    • *_key for unique string keys;
    • *_seq for auto-incrementing sequence integer keys.
  • Column names are singular and use snake_case.
  • Foreign keys are named with the singular form of the table they reference, followed by _id.
  • Primary keys are named id.