Data from Sensors

Introduction

This section describes the process of submitting instrument-based sensor data to Axiom Data Science from an environmental sensor at a fixed or moving location.

Sensor data is sometimes called streaming data, but at Axiom we differentiate between the two. Sensor data comes directly from an instrument (i.e., it was collected by a sensor), and it may or may not be transmitted automatically. In contrast, streaming data is broadcast by an instrument for all to see and find.

This distinction is nuanced, and there is overlap. We welcome further discussion!

Terminology

A Station represents a physical platform that collects observations via one or more sensor packages. These sensors provide a stream of data for one or more variables. Data collected over time for one or more variables forms a dataset.

Note

Example: the Chukchi Ice Detection Buoy. This station is a mooring with multiple Sea-Bird SBE 37-SI sensors that collect data for temperature, conductivity, and pressure variables. This mooring is deployed annually, with slight tweaks to the sensor setup each time, so there is one dataset per year.

The CF Conventions we refer to are the Climate and Forecast Metadata Conventions that are community maintained and aim to improve data sharing.

General Guidelines

Use community standards. Follow CF Conventions for naming variables and structuring data. If possible, use a structured, self-describing format like NetCDF. For real-time data, use an existing, community-supported data server like ERDDAP.

Be consistent. Use the same file format and variable names across datasets, and from station to station. We use scripts to ingest data whenever possible, so any inconsistencies will require manual intervention and lead to delays.
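To illustrate the consistency point, here is a minimal sketch of a check that every station CSV shares an identical header row, so ingestion scripts can run unmodified. The file contents and variable names below are made up for illustration.

```python
import csv
import io

def headers_match(csv_texts):
    """Return True if every CSV text shares an identical header row.

    A mismatch flags a file that would need manual intervention
    before automated ingestion.
    """
    headers = [next(csv.reader(io.StringIO(text))) for text in csv_texts]
    return all(h == headers[0] for h in headers)

# Two hypothetical station files with matching headers, and one outlier.
a = "time,air_temperature,air_temperature_qc\n2018-01-01T00:00:13Z,1.2,1\n"
b = "time,air_temperature,air_temperature_qc\n2018-01-02T00:00:13Z,0.8,1\n"
c = "Date,Temp\n01/02/2018,0.8\n"

consistent = headers_match([a, b])
inconsistent = headers_match([a, b, c])
```

Running a check like this before submission catches the station-to-station drift in file layout that otherwise causes ingestion delays.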

Data Submission Guidelines

To host your data, we need two things from you: a station definition and your datasets. The station definition can be provided once and changed only as needed. For a historical data submission, you may upload your dataset using the form linked at the bottom of this page. For continuous, real-time data submission, you should set up a data server so that we can pull the data from you on a regular basis.

Requirements to Submit

(AKA you will need:)

  • A dataset with header information, as clean and ‘tidy’ as possible

  • Metadata for the dataset, i.e., information about the scientific and technical details of the measurements

  • An access point for that data (API, THREDDS, manual transfer of files, etc) and any authentication Axiom will need to access it

  • Any pre-processing needs identified

  • Any quality control methods documented

  • A technical point of contact

  • A scientific point of contact

  • A list of the stations included in the dataset, in a CSV file with a header

  • A README TXT file detailing at least the file list, data accreditation, and data licensing

More details about these requirements are below, followed at the end with an all-in-one checklist.

Dataset Details

Please provide your dataset in one of the following file formats, in order of preference:

  • NetCDF

  • CSV or TSV

NetCDF Guidelines
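
As a rough sketch (not an official Axiom template), a minimal CF/ACDD-style structure for a single fixed station might look like the following, shown in CDL notation. The variable choices, flag attributes, and global attributes are illustrative assumptions:

```
netcdf station_example {
dimensions:
    time = UNLIMITED ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:units = "seconds since 1970-01-01T00:00:00Z" ;
    float sea_water_temperature(time) ;
        sea_water_temperature:standard_name = "sea_water_temperature" ;
        sea_water_temperature:units = "degree_C" ;
        sea_water_temperature:ancillary_variables = "sea_water_temperature_qc" ;
    byte sea_water_temperature_qc(time) ;
        sea_water_temperature_qc:standard_name = "sea_water_temperature status_flag" ;
        sea_water_temperature_qc:flag_values = 1b, 2b, 3b, 4b, 9b ;
        sea_water_temperature_qc:flag_meanings = "pass not_evaluated suspect fail missing_data" ;

// global attributes:
    :Conventions = "CF-1.6, ACDD-1.3" ;
    :title = "Example fixed-station sensor dataset" ;
}
```

The key points are CF standard names on every variable, explicit units, and QC variables linked to their data variables via ancillary_variables.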

Pre-processing Needs

Sometimes, in-situ sensor data is not ready for data manipulation or readily available to standard tools. Examples of pre-processing we have seen include loading data into ERDDAP servers and applying calibration adjustments to the instrument data. These steps are best done by the scientific team responsible for the sensor.

CSV Guidelines

  • First column: variable name

  • Second column: units

  • Third column: time

    • Use ISO 8601 format (e.g., 2018-01-01T00:00:13Z)

    • Provide time in UTC

  • QC format

    • use variable name + “_qc” (e.g., air_temperature and air_temperature_qc) for header value

    • use QARTOD flag values (1=pass, 2=not evaluated, 3=suspect, 4=fail, 9=missing data)

    • more about QARTOD here
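
To illustrate how a variable pairs with its “_qc” column, here is a minimal sketch of a gross-range test emitting QARTOD flag values. The thresholds, variable name, and helper function are hypothetical, not an Axiom-supplied implementation:

```python
# QARTOD flag values: 1=pass, 2=not evaluated, 3=suspect, 4=fail, 9=missing.
QARTOD_PASS = 1
QARTOD_NOT_EVALUATED = 2
QARTOD_SUSPECT = 3
QARTOD_FAIL = 4
QARTOD_MISSING = 9

def gross_range_flag(value, fail_min=-5.0, fail_max=40.0):
    """Flag one air_temperature reading; results would populate an
    air_temperature_qc column alongside air_temperature. Thresholds
    here are made up for illustration."""
    if value is None:
        return QARTOD_MISSING
    if value < fail_min or value > fail_max:
        return QARTOD_FAIL
    return QARTOD_PASS

# One in-range reading, one missing value, one out-of-range reading.
flags = [gross_range_flag(v) for v in [12.3, None, 55.0]]
```

However your flags are actually produced, documenting the tests behind them (see QC Information below) is what makes them usable downstream.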

If the CSV headers and variables are not already set to CF Conventions, provide a separate document with a mapping of the columns to them. For example: Variable _temp_ may refer to _sea_surface_temperature_.
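
A mapping document like this can also drive an automated rename. As a sketch, with a hypothetical mapping table and made-up column names:

```python
import csv
import io

# Hypothetical provider-column -> CF-standard-name mapping; this is the
# kind of table the separate mapping document should contain.
CF_MAP = {
    "temp": "sea_surface_temperature",
    "sal": "sea_water_salinity",
}

def rename_header(csv_text, mapping):
    """Rewrite the header row of a CSV using the column-to-CF mapping;
    unmapped columns are left unchanged."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

raw = "time,temp,sal\n2018-01-01T00:00:13Z,4.1,31.9\n"
renamed = rename_header(raw, CF_MAP)
```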

Note

TIMESTAMPS: Unfortunately, some instruments record local time only. If the date and timestamp data cannot be provided in ISO 8601 format, please explain the context and provide the following details in the dataset’s README file: format details, such as YYYYMM; the timezone used; whether daylight saving time is observed; and, if a non-standard calendar is used, which one.
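When the local-time offset is known, readings can be converted to UTC ISO 8601 before submission. A minimal sketch, where the fixed UTC-9 offset is an assumption standing in for whatever timezone (and daylight saving rules) the station README documents:

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed local offset of UTC-9; a real station's README should
# document the actual timezone and daylight saving behavior.
LOCAL = timezone(timedelta(hours=-9))

def to_utc_iso8601(local_dt):
    """Return a timezone-aware local timestamp as an ISO 8601 UTC string."""
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# A reading logged at 09:00:13 local time on 2018-01-01.
stamp = to_utc_iso8601(datetime(2018, 1, 1, 9, 0, 13, tzinfo=LOCAL))
```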

Providing Access to the Data

For continuous, real-time data submission, you should set up a data server so that we can pull the data from you on a regular basis. Please use one of the following server options, in order of preference:

  • ERDDAP

  • REST API

  • THREDDS

  • Public files on an HTTP web server (Apache, nginx, etc)

  • FTP server

Note

ERDDAP is preferred because it guarantees a consistent dataset structure, allows us to pull data in a format of our choosing, and lets us re-use data ingestion scripts across multiple data sources. A REST API also guarantees structured data, but we have to write new scripts for each API we interact with.
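
To show the kind of pull ERDDAP enables, here is a sketch of building a tabledap CSV request URL. The server address, dataset ID, and variable list are placeholders, not real endpoints:

```python
from urllib.parse import quote

def tabledap_csv_url(server, dataset_id, variables, start_iso):
    """Build an ERDDAP tabledap URL requesting CSV output for the given
    variables, constrained to times at or after start_iso. Constraint
    characters like '>' and ':' are percent-encoded."""
    query = ",".join(variables) + quote(f"&time>={start_iso}", safe="&=")
    return f"{server}/erddap/tabledap/{dataset_id}.csv?{query}"

# Hypothetical server and dataset, for illustration only.
url = tabledap_csv_url(
    "https://example.org",
    "example_buoy",
    ["time", "sea_water_temperature"],
    "2018-01-01T00:00:00Z",
)
```

Because the dataset ID and variable names are the only parts that change, one script like this can serve many ERDDAP-hosted sources.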

Points of Contact

Please consider providing multiple points of contact in the metadata, preferably both a science team member and an information technology (IT) team member. These people will be crucial should the Axiom team need to interpret the data or troubleshoot access to it.

Metadata

Include as much descriptive information about datasets, sensors, platforms, models, analysis methods, and quality-control procedures as possible. Metadata is essential for the long-term usability and reuse of information.

In general, metadata should follow one of the following standards:

  • ISO 19115 for dataset and collection-level metadata

  • ISO 19115-2: Metadata, Part 2: Extensions for Imagery and Gridded Data

  • IOOS Metadata Profile for NetCDF

  • NetCDF-CF 1.6: Climate and Forecast conventions for NetCDF

  • ACDD 1.3: Attribute Conventions for Data Discovery

Data providers may request access to the Research Workspace to create ISO 19115 compliant metadata using the metadata editor and its associated help documentation.

Station List

For each station, please provide as a separate CSV file:

  • Station name

  • Location (lat, lng, depth/elevation)

  • Platform type (fixed, buoy, etc)

  • Expected data date range (so that we can double-check we have everything) as start date and end date

  • Instrument and data affiliations/attributions:

    • at minimum, provide the primary institution affiliated with this data

    • if possible, provide any other affiliations, such as the operator or funder

  • Links to web pages or documents with more background information, if available

  • If there have been other deployments, please let us know so we can keep notes for our reference, even if you are not providing data for those deployments

Some of this information may be included in the README as well.
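
As a sketch of what such a file could look like, here is one possible station list layout. The column names are a suggestion, and every value below (coordinates, dates, institution, URL) is made up for illustration:

```csv
station_name,latitude,longitude,depth_m,platform_type,start_date,end_date,primary_institution,info_url
Chukchi Ice Detection Buoy,71.23,-164.25,40,buoy,2018-08-01,2019-07-31,Example University,https://example.org/chukchi-buoy
```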

README.txt

A README provides valuable narrative, human-readable details about the dataset. Please include the following details:

  • The file list

  • Data accreditation

  • Data licensing details
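
As one possible layout (all file names, credits, and the license below are illustrative assumptions, not requirements):

```text
FILE LIST
  chukchi_2018.csv  - sensor observations, Aug 2018 - Jul 2019
  stations.csv      - station list with locations and date ranges

DATA ACCREDITATION
  Collected by the Example University oceanography group.
  Please credit "Example University" when using these data.

DATA LICENSING
  Released under the CC-BY 4.0 license.
```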

QC Information

Please include the following information in as much detail as possible:

  • Have you performed QC checks on this data? If so, are the results of these tests available in the dataset? Do you have any links to documentation about your QC process?

  • How are the QC variables defined in your datasets?

  • How can we access your datasets?

  • For real-time data: where and how is this hosted? How often is the dataset updated? Please provide links and any instructions for accessing the data

  • Is this dataset already hosted somewhere else? If so, where?

Submitting Data

After gathering the information described above and preparing your dataset, you may submit your files to Axiom using the following form, or upload your files to the Research Workspace if you already have an account.

Sensor Data Ingest Request Form