
Data sources and handling

Required data

These data sources are required for the core functionality of MarINvaders. They are either accessed through an API (OBIS, WoRMS) or included directly in MarINvaders (NatCon, Ecoregions).

  • OBIS

    The Ocean Biodiversity Information System is a global open-access data and information clearing-house on marine biodiversity for science, conservation and sustainable development.

  • WoRMS

    The World Register of Marine Species provides an authoritative classification of marine species including AphiaIDs as unique identifiers.

  • NatCon

    This global database contains information on over 330 marine invasive species, including non-native distributions by marine ecoregion, invasion pathways, and ecological impact and other threat scores.

  • Marine Ecoregions of the World—MEOW

    Marine ecoregions are ecoregions (ecological regions) of the oceans and seas identified and defined based on biogeographic characteristics [1].

Optional data

These two data sources are provided by the IUCN (International Union for Conservation of Nature). Due to the strict license of IUCN data, we cannot include them in the MarINvaders package. See the IUCN data section on how to get and use these data.

  • GISD

    The Global Invasive Species Database is a free, online searchable source of information about alien and invasive species that negatively impact biodiversity. This data source expands the information on the distribution of alien species.

  • IUCN Redlist

    The IUCN Red List data is used to assess which species are affected (Threat category 8.1 - "Invasive non-native/alien species/diseases") by marine invaders. Note, however, that these species might be threatened by other types of threats as well.

Data wrangling


After selecting a marine ecoregion, the OBIS API v3 is used to query all species for which the OBIS database holds occurrence data within that ecoregion. Each species is then looked up in the other databases to determine whether it is alien.

The WoRMS REST webservice is used to find the establishmentMeans, i.e. whether the species is flagged as alien or not, whereas all species included in the GISD and NatCon databases are alien by definition. Optionally, the IUCN Redlist is used to identify species affected by invasives in the ecoregion.
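The two lookups described above can be illustrated with a minimal sketch. It builds an OBIS checklist request for a region and checks WoRMS-style distribution records for an alien flag. The endpoint path, the `geometry` and `size` parameters, the `establishmentMeans` values, and both function names are assumptions made for illustration and may differ from what MarINvaders does internally. No network request is made.

```python
from urllib.parse import urlencode

# Assumed URL of the OBIS API v3 checklist endpoint; verify against the OBIS docs.
OBIS_CHECKLIST_URL = "https://api.obis.org/v3/checklist"


def build_checklist_url(wkt_geometry, size=100):
    """Build a URL asking OBIS for the species checklist inside a WKT polygon."""
    query = urlencode({"geometry": wkt_geometry, "size": size})
    return f"{OBIS_CHECKLIST_URL}?{query}"


def is_flagged_alien(worms_distributions):
    """Return True if any distribution record carries an alien flag.

    Each record is assumed to be a dict with an optional
    "establishmentMeans" key, as in WoRMS distribution responses.
    """
    for record in worms_distributions:
        means = (record.get("establishmentMeans") or "").lower()
        if means.startswith("alien"):
            return True
    return False


# Hypothetical region outline and hand-written sample records.
url = build_checklist_url("POLYGON((27 40, 42 40, 42 47, 27 47, 27 40))")
sample = [
    {"locality": "Black Sea", "establishmentMeans": "Alien"},
    {"locality": "Mediterranean Sea", "establishmentMeans": "Native"},
]
print(url)
print(is_flagged_alien(sample))
```

Keeping URL construction and response parsing separate from the actual HTTP call makes both parts easy to check without depending on the availability of the remote services.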

Species names and alien status

For data reconciliation across databases, we established the following hierarchy for the alien status of a species: (1) WoRMS, (2) GISD species name (if available), (3) NatCon species name. This was necessary because not all species are covered in WoRMS. To identify a species across databases, we searched each database for all synonyms of that species. The WoRMS species name was regarded as authoritative; subspecies and varieties were ignored, and we only cover species with accepted status. Abbreviations between genus and species names were removed before reconciling the synonyms; this includes the abbreviation cf., which indicates uncertainty about the correct identification of a species. Where invasive species were covered in GISD and/or NatCon but not by the flagged WoRMS accounts, the GISD and NatCon entries were matched against the species accounts in the whole of WoRMS, i.e. against WoRMS AphiaIDs that are not flagged as invasive.
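The name cleaning and the source hierarchy can be sketched as follows. This is an illustrative reading of the rules above, not the package's code: the function names are hypothetical, abbreviations are assumed to be any token ending in a period after the genus, and "ignoring" subspecies and varieties is interpreted as skipping names with more than two tokens.

```python
# Source priority for the alien status, in the order given in the text.
SOURCE_PRIORITY = ("WoRMS", "GISD", "NatCon")


def normalize_name(name):
    """Drop abbreviations such as "cf." that follow the genus name."""
    tokens = name.split()
    kept = [t for i, t in enumerate(tokens) if i == 0 or not t.endswith(".")]
    return " ".join(kept)


def is_species_level(name):
    """True for plain binomials; subspecies and varieties have extra tokens."""
    return len(normalize_name(name).split()) == 2


def alien_status_source(listed_as_alien):
    """Return the highest-priority source that lists the species as alien.

    `listed_as_alien` maps a source name to True/False; sources missing
    from the dict are treated as having no entry for the species.
    """
    for source in SOURCE_PRIORITY:
        if listed_as_alien.get(source):
            return source
    return None


print(normalize_name("Mytilus cf. galloprovincialis"))
print(is_species_level("Mytilus edulis platensis"))
print(alien_status_source({"WoRMS": False, "NatCon": True, "GISD": True}))
```

In the last call the species is not flagged in WoRMS, so the sketch falls back to GISD, the next source in the hierarchy.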

Geographic harmonization

The databases provide geographical distributions on different scales. The NatCon distributions are given at the marine ecoregion level. Most of the WoRMS distributions are either IHO (International Hydrographic Organization) Sea Areas, Exclusive Economic Zones (EEZ), or an intersection of these, and have a Marine Regions Geographic Identifier (MRGID), which is matched to a marine ecoregion by the use of shapefiles. GISD does not provide MRGIDs but instead gives only qualitative distributions, such as country names. Most of these could still be matched to existing shapefiles by comparing names, and subsequently be matched to marine ecoregions. All distributions that could not be matched automatically were looked up manually and matched to one or more marine ecoregions. The point records, i.e. latitude/longitude coordinates, of all species with an AphiaID were retrieved from OBIS and allocated to marine ecoregions.
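Allocating point records to ecoregions amounts to a point-in-polygon test. The sketch below uses a plain ray-casting test on planar coordinates with hypothetical rectangular regions; it ignores map projections and the antimeridian, and real ecoregion shapefiles would normally be handled with a GIS library. All names and coordinates are illustrative assumptions.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point inside the polygon (list of (lon, lat))?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def allocate_records(records, ecoregions):
    """Map each ecoregion name to the point records that fall inside it.

    Records that fall outside every ecoregion are dropped.
    """
    allocation = {name: [] for name in ecoregions}
    for lon, lat in records:
        for name, polygon in ecoregions.items():
            if point_in_polygon(lon, lat, polygon):
                allocation[name].append((lon, lat))
                break
    return allocation


# Hypothetical rectangular "ecoregions" and point records as (lon, lat).
ecoregions = {
    "Region A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Region B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
records = [(2.5, 3.0), (15.0, 5.0), (40.0, 40.0)]
allocation = allocate_records(records, ecoregions)
print(allocation)
```

The third record lies outside both regions and is therefore not allocated.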

The point records retrieved from OBIS do not distinguish between the native and alien ranges of a species. Mapping the qualitatively given geographic distributions of alien species in WoRMS, GISD, and NatCon to marine ecoregions enables us to determine the alien ranges of marine species. The procedure first marks all ecoregions where a species is distributed (using the point records from OBIS). Of these ecoregions, those that are described by WoRMS, GISD, or NatCon as alien are marked as such. A composite approach is followed here: as soon as one database denotes an area as an alien range, the whole respective ecoregion is marked as alien, in accordance with the precautionary principle. To do this, the qualitative range names from WoRMS, GISD, and NatCon are translated into polygons by comparing these names with area names from various geographical classifications, e.g. IHO sea areas. For these areas, polygons are available; this data is included in MarINvaders. These polygons are then compared with the ecoregion polygons, and an ecoregion is marked as alien when an alien range polygon overlaps with it.


Generally, as soon as one database denotes an area as an alien range, the whole respective ecoregion is marked as alien. For instance, GISD describes the Sea of Azov as the alien range of the invasive Mytilus galloprovincialis. This means that the whole Black Sea is then denoted as the alien range of this species, since the ecoregion that covers the Sea of Azov is “44. Black Sea”. If a source database denotes a whole country as the alien range of a species, more specific information from the other source databases is applied. If none of the other databases has accounts on the distribution of this species, however, the ecoregions containing records in the area indicated by the first source database are marked as alien. The general procedure for allocating species records and distinguishing between the native and alien ranges of a species is shown in Figure 1.

Figure 1: Allocation of species records to ecoregion classification and distinction between native and alien range.

  1. OBIS delivers species records (blue dots)
  2. OBIS records superimposed on marine ecoregions (white boxes)
  3. All ecoregions containing species records are marked (green). Species records inside these ecoregions are allocated to the native distribution, outlying species records (not covered by any marine ecoregion) are ignored, and ecoregions not containing species records remain neutral, i.e. unrelated to the species distribution.
  4. A qualitative alien range description (red circle) leads to denoting an ecoregion and the species records it contains as alien, while the other ecoregions are unaffected. An ecoregion not containing species records, yet (partly) covered by the qualitative alien range description, remains neutral.
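The four steps above can be sketched with plain sets once the point records and range descriptions have been mapped to ecoregion names. This is a simplified illustration, not the package's implementation; the function name and all inputs are hypothetical.

```python
def classify_ecoregions(occupied, alien_ranges_by_source):
    """Label each occupied ecoregion as "alien" or "native".

    occupied: set of ecoregions containing OBIS records for the species.
    alien_ranges_by_source: dict mapping a source name to the set of
        ecoregions overlapped by that source's alien range description.
    Ecoregions without OBIS records are left out, i.e. stay neutral.
    """
    # Composite approach: one source calling an ecoregion alien is enough.
    alien_anywhere = set().union(*alien_ranges_by_source.values())
    return {
        region: "alien" if region in alien_anywhere else "native"
        for region in sorted(occupied)
    }


# Hypothetical inputs for a single species.
occupied = {"Black Sea", "Adriatic Sea", "Western Mediterranean"}
alien_ranges = {
    "WoRMS": set(),
    "GISD": {"Black Sea"},
    "NatCon": {"Black Sea", "North Sea"},
}
status = classify_ecoregions(occupied, alien_ranges)
print(status)
```

"North Sea" is covered by an alien range description but has no OBIS records in this example, so it does not appear in the result and remains neutral.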

In case of conflicting geographic information within a single source database, and when no clarifying information from another source database is available, the affected ecoregion is excluded and marked as such. For instance, when only one of the source databases denotes a certain area where a species is distributed as native and alien simultaneously, without further geographic distinction, the respective ecoregions are flagged as conflicting, but only when species records from OBIS are available for the respective areas. When source databases denote areas as the alien or native range of a species, but the global distribution provided by OBIS for this species does not show any records there, the respective regions are disregarded, i.e. only those regions are considered for which distribution records from OBIS exist.
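The conflict rule can be sketched in the same set-based style. The sketch assumes that "clarifying information" means another source lists the ecoregion as exclusively native or exclusively alien; that interpretation, the function name, and the sample data are assumptions for illustration.

```python
def flag_conflicts(occupied, native_by_source, alien_by_source):
    """Return occupied ecoregions with an unresolved native/alien conflict.

    An ecoregion is conflicting when one source lists it as both native
    and alien and no other source lists it as exactly one of the two.
    Ecoregions without OBIS records (not in `occupied`) are never flagged.
    """
    conflicts = set()
    sources = set(native_by_source) | set(alien_by_source)
    for source in sources:
        native = native_by_source.get(source, set())
        alien = alien_by_source.get(source, set())
        for region in (native & alien) & occupied:
            clarified = any(
                (region in native_by_source.get(other, set()))
                != (region in alien_by_source.get(other, set()))
                for other in sources
                if other != source
            )
            if not clarified:
                conflicts.add(region)
    return conflicts


# Hypothetical inputs: WoRMS lists regions A, B and C as native and alien.
occupied = {"A", "B"}
native_by_source = {"WoRMS": {"A", "B", "C"}, "GISD": set()}
alien_by_source = {"WoRMS": {"A", "B", "C"}, "GISD": {"B"}}
conflicts = flag_conflicts(occupied, native_by_source, alien_by_source)
print(conflicts)
```

Here only "A" is flagged: "B" is clarified by the second source, and "C" has no OBIS records and is therefore disregarded.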

There is also a practical reason for this. Regional accounts of species in some source databases are often given at the country level, yet many countries border multiple marine ecoregions, some in very different oceanographic settings, e.g. the West and East Coast of the USA or the waters surrounding Australia. Considering all these bordering ecoregions as potential alien ranges would blur the picture too much, so they are not considered. An exception exists when no source database describes an area as alien or native, but the ecoregion contains records from OBIS and, additionally, the name of the species indicates a certain origin, e.g. in the case of the Northern Pacific seastar. In this case, the respective ecoregions are also flagged as conflicting.

  1. These marine ecoregions classify coastal and shelf areas into a nested system of realms, provinces, and a total of 232 ecoregions, which were used in the present study. The focus is on coastal and shelf waters (covering both benthic and pelagic zones), as most marine biodiversity and, concurrently, most threats to it occur there (see the UNEP report on marine and coastal ecosystems). Global shipping, the main contributor to species introductions, with small- to large-scale ports in coastal areas, makes this classification particularly applicable. Oceanographic, topographic, geomorphological, and biogeographic features separate the marine ecoregion