Biodiversity Data

Introduction

This section describes the process of submitting biodiversity data to a data portal. The term “biodiversity” is used to describe the abundance and variety of life on Earth. Biodiversity data typically describes observations of the occurrence and abundance of different organisms in space and time.

Terminology

In this context, the term “occurrence” indicates observations of the presence or absence of organisms of a particular species at a given place and time. The term “abundance” refers to the count of organisms observed during that occurrence.

Formatting Data

Axiom Data Science currently works with biodiversity data in multiple formats. The preferred format, however, is data in compliance with Darwin Core standards. Find out more about Darwin Core on the working group’s website.

The standard requires specific columns, headers, and labels, as well as the use of controlled vocabularies. The data must also be ‘tidy’: every column is a single variable, every row is a single observation of a single measurement, and every cell holds only one value.
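As a rough illustration of what ‘tidy’ means in practice, the sketch below reshapes a small "wide" survey sheet (one column per species) into one row per observation. The column names, the sample values, and the `wide_to_tidy` helper are hypothetical and are not Darwin Core terms; this uses only the Python standard library.

```python
import csv
import io

# Hypothetical "wide" survey sheet: one column per species, which is not tidy.
WIDE_CSV = """site,date,manatee,dolphin
A,2020-06-01,3,0
B,2020-06-01,1,2
"""


def wide_to_tidy(wide_text, id_columns):
    """Return one dict per observation (one site, date, and species)."""
    rows = []
    for record in csv.DictReader(io.StringIO(wide_text)):
        for column, value in record.items():
            if column in id_columns:
                continue  # identifying columns are copied, not unpivoted
            tidy_row = {key: record[key] for key in id_columns}
            tidy_row["species"] = column
            tidy_row["count"] = int(value)
            rows.append(tidy_row)
    return rows


tidy_rows = wide_to_tidy(WIDE_CSV, id_columns=("site", "date"))
for row in tidy_rows:
    print(row)
```

Each of the two input rows becomes two output rows, so every cell holds a single value and every row describes a single observation.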

The most reliable way to both format the data and establish a submission pathway to Axiom is to use the Ocean Biodiversity Information System, described in the sections below.

Submitting Data

Axiom’s preferred method of accessing data is through the community-recognized Ocean Biodiversity Information System (OBIS). OBIS is a global open-access data and information clearinghouse on marine biodiversity for science, conservation, and sustainable development.

These guidelines reflect that preference, but are applicable to any kind of biodiversity data, with notes on methods other than OBIS where appropriate.

Requirements to Submit

(AKA you will need)

  • A dataset with header information, as clean and ‘tidy’ as possible

  • Metadata for the dataset, that is, information about the scientific and technical details of the measurements
    • This metadata must comply with Darwin Core standards

  • An access point for that data, preferably an OBIS perma-link (more below)

  • Any pre-processing needs identified

  • Any quality control methods documented

  • A technical point of contact

  • A scientific point of contact

  • A README TXT file detailing at least the file list, data accreditation, and data licensing

How to do Darwin Core style metadata

There are many resources for describing data with the Darwin Core metadata standard. We will not repeat them here, but instead will provide an overview of what that work looks like broadly using an example of a spreadsheet that was used to monitor manatee sightings in the Atlantic Ocean.

The steps to make this (fictional) manatee data compliant with OBIS style Darwin Core would be as follows:

  1. Wrangle the data so that every row is a single observation. I will ultimately need 3 separate tables.

  2. Match the column headers of my manatee spreadsheet to the appropriate Darwin Core terms.

  3. Ensure I have the unique identifiers in place for my data. This includes Dataset ID, Occurrence ID and Event ID.

  4. Check that I have provided the information in the three-table format OBIS requires.

Note

OBIS Nodes and submitting data

Axiom is not an OBIS node, so please coordinate with the appropriate OBIS node for your data. If the data is based in the United States, there is a good chance the best OBIS Node for you is the NOAA Marine Life Program, or the United States Geological Survey.

  5. Publish the data to OBIS through the IPT and provide the final OBIS URL to Axiom.
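The header-matching and identifier steps can be sketched in a few lines of Python. This is a minimal illustration, not OBIS tooling: the spreadsheet headers, the sample rows, the `DATASET_ID` value, the identifier pattern, and the `to_darwin_core` helper are all hypothetical. The Darwin Core terms on the right-hand side of the mapping are real terms, but which ones a given dataset needs should be confirmed with your OBIS node. Only an event table and an occurrence table are built here; the third table is assumed to hold measurements and is omitted.

```python
# Hypothetical raw rows from the fictional manatee spreadsheet.
raw_rows = [
    {"Date": "2020-06-01", "Lat": "27.5", "Lon": "-80.3",
     "Species": "Trichechus manatus", "Count": "3"},
    {"Date": "2020-06-02", "Lat": "27.6", "Lon": "-80.2",
     "Species": "Trichechus manatus", "Count": "0"},
]

# Step 2: match spreadsheet headers to Darwin Core terms.
HEADER_TO_DWC = {
    "Date": "eventDate",
    "Lat": "decimalLatitude",
    "Lon": "decimalLongitude",
    "Species": "scientificName",
    "Count": "individualCount",
}

DATASET_ID = "manatee_survey_2020"  # hypothetical dataset identifier


def to_darwin_core(rows, dataset_id):
    """Split raw rows into event and occurrence tables with unique IDs."""
    events, occurrences = [], []
    for index, row in enumerate(rows, start=1):
        dwc = {HEADER_TO_DWC[key]: value for key, value in row.items()}
        # Step 3: build identifiers (this naming pattern is an assumption).
        event_id = f"{dataset_id}:event:{index}"
        occurrence_id = f"{event_id}:occ:1"
        events.append({
            "datasetID": dataset_id,
            "eventID": event_id,
            "eventDate": dwc["eventDate"],
            "decimalLatitude": dwc["decimalLatitude"],
            "decimalLongitude": dwc["decimalLongitude"],
        })
        count = int(dwc["individualCount"])
        occurrences.append({
            "eventID": event_id,
            "occurrenceID": occurrence_id,
            "scientificName": dwc["scientificName"],
            "individualCount": count,
            # A zero count is recorded as an absence rather than dropped.
            "occurrenceStatus": "present" if count > 0 else "absent",
        })
    return events, occurrences


events, occurrences = to_darwin_core(raw_rows, DATASET_ID)
for occurrence in occurrences:
    print(occurrence)
```

Keeping the `eventID` on every occurrence row is what links the two tables, so a survey with no sightings still appears as an event with an "absent" occurrence.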

Resources

Providing Access to the Data

The data should be submitted to OBIS, and the dataset URL shared with Axiom. For example: https://ipt-obis.gbif.us/resource?r=ambon_seabirds_2017

If connecting via OBIS is not possible, the other methods of data access are, in order of preference:

  • ERDDAP

  • REST API

  • THREDDS

  • Public files on an HTTP web server (Apache, nginx, etc.)

  • FTP server

If sharing files, formatting them according to OBIS conventions is preferred. The other methods have their own formatting requirements; if using ERDDAP or THREDDS, we expect the data to comply with their infrastructure needs.

READMEs provide valuable human-readable narrative details about the data set. Please include the following details:

  • The file list

  • Data accreditation

  • Data licensing details

  • Any funder attributions that should be listed alongside the data

  • Contact information for the data set’s lead scientists or data steward
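One way to make sure none of these details is forgotten is to generate the README from a small template. The sketch below is only a suggestion: the section titles, the `build_readme` helper, and every value passed to it are hypothetical placeholders, not a required layout.

```python
def build_readme(file_list, accreditation, license_name, funders, contacts):
    """Assemble README text covering the details listed above."""
    sections = [
        ("FILE LIST", file_list),
        ("DATA ACCREDITATION", [accreditation]),
        ("DATA LICENSING", [license_name]),
        ("FUNDER ATTRIBUTIONS", funders),
        ("CONTACTS", contacts),
    ]
    lines = []
    for title, entries in sections:
        lines.append(title)
        lines.extend(f"  - {entry}" for entry in entries)
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# All values below are placeholders for illustration only.
readme_text = build_readme(
    file_list=["event.csv", "occurrence.csv", "measurements.csv"],
    accreditation="Example Manatee Survey Team (placeholder)",
    license_name="Placeholder: state the license that applies to the data",
    funders=["Example Funding Agency (placeholder)"],
    contacts=[
        "Lead scientist: name@example.org",
        "Data steward: steward@example.org",
    ],
)
print(readme_text)
```

The resulting text can be saved as a README TXT file alongside the data files it describes.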