Data from Sensors¶
This section describes the process of submitting instrument-based sensor data to Axiom Data Science from an environmental sensor at a fixed or moving location.
Sensor data is sometimes called streaming data, but at Axiom we differentiate between the two. Sensor data comes directly from an instrument (or was collected by a sensor directly), and may or may not be transmitted automatically. In contrast, streaming data is an instrument broadcasting its measurements for anyone to see and find.
This distinction is nuanced, and there is overlap. We welcome further discussion!
A Station represents a physical platform that collects observations via one or more sensor packages. These sensors provide a stream of data for one or more variables. Data collected over time for one or more variables forms a dataset.
Example: the Chukchi Ice Detection Buoy. This station is a mooring with multiple Sea-Bird SBE 37-SI sensors that collect data for temperature, conductivity, and pressure variables. This mooring is deployed annually, with slight tweaks to the sensor setup each time, so there is one dataset per year.
The CF Conventions we refer to are the Climate and Forecast Metadata Conventions that are community maintained and aim to improve data sharing.
Use community standards. Follow CF Conventions for naming variables and structuring data. If possible, use a structured, self-describing format like NetCDF. For real-time data, use an existing, community-supported data server like ERDDAP.
Be consistent. Use the same file format and variable names across datasets, and from station to station. We use scripts to ingest data whenever possible, so any inconsistencies will require manual intervention and lead to delays.
Data Submission Guidelines¶
To host your data, we need two things from you: a station definition and your datasets. The station definition can be provided once and only changed as needed. For a historical data submission, you may upload your dataset using the form linked to at the bottom of this page. For continuous, real-time data submission, you should set up a data server so that we can pull the data from you on a regular basis.
Requirements to Submit¶
(AKA you will need:)
A dataset with header information, as clean and ‘tidy’ as possible
Metadata for the dataset, i.e., information about the scientific and technical details of the measurements
An access point for that data (API, THREDDS, manual transfer of files, etc) and any authentication Axiom will need to access it
Any pre-processing needs identified
Any quality control methods documented
A technical point of contact
A scientific point of contact
A list of the stations included in the dataset, in a CSV file with a header
A README TXT file detailing at least the file list, data accreditation, and data licensing
More details about these requirements are below, followed at the end with an all-in-one checklist.
Please provide your dataset in one of the following file formats, in order of preference:
CSV or TSV
Follow ACDD 1.3
Conduct quality control per scientific practice. Refer to recommended best practices
Use a compliance checker and follow its feedback. The CF compliance checker is available here
Sometimes, in-situ sensor data is not ready for data manipulation or readily usable by standard tools. Examples of pre-processing we have seen include loading data into ERDDAP servers and applying calibration adjustments to the instruments. These steps are best done by the scientific team responsible for the sensor.
First column: variable name
Second column: units
Third column: time format
Use ISO 8601 format (e.g., 2018-01-01T00:00:13Z)
Provide time in UTC
Use variable name + “_qc” (e.g., air_temperature and air_temperature_qc) for the header value
Use QARTOD values (1=pass, 2=not evaluated, etc.)
More about QARTOD here
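As a minimal sketch of how the paired QC columns above can be used downstream (variable names and flag data are illustrative, not from a real station):

```python
# QARTOD primary flag values: 1=pass, 2=not evaluated, 3=suspect, 4=fail, 9=missing
QARTOD_FLAGS = {1: "pass", 2: "not_evaluated", 3: "suspect", 4: "fail", 9: "missing"}

def drop_failed(values, flags):
    """Keep observations whose paired *_qc flag is neither fail (4) nor missing (9).

    `values` and `flags` are parallel lists, e.g. the air_temperature and
    air_temperature_qc columns of a submitted CSV.
    """
    return [v for v, f in zip(values, flags) if f not in (4, 9)]

# Hypothetical readings: the 99.9 spike was flagged as fail by a QC test
temps = [3.1, 3.2, 99.9, 3.0]
temps_qc = [1, 1, 4, 2]

print(drop_failed(temps, temps_qc))  # [3.1, 3.2, 3.0]
```

Keeping the flag column alongside the data column (rather than deleting suspect values) lets downstream users make their own filtering decisions.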
If the CSV headers and variables are not already set to CF Conventions, provide a separate document with a mapping of the columns to them. For example, the variable _temp_ may refer to _sea_surface_temperature_.
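Such a mapping document could itself be a small CSV; the column names on the left are illustrative local names, and the right-hand values are CF standard names:

```csv
column_name,cf_standard_name
temp,sea_surface_temperature
cond,sea_water_electrical_conductivity
pres,sea_water_pressure
```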
TIMESTAMPS Unfortunately, some instruments record local time only. If the date and timestamp data cannot be provided according to the ISO 8601 format, please explain the context and provide the following details in the dataset’s README file: format details, such as YYYYMM; the timezone used; whether daylight saving time is observed; and, if a calendar is used, which one.
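If you can convert local timestamps yourself before submission, Python's standard library handles the timezone arithmetic. A sketch, assuming a hypothetical instrument reporting Alaska local time:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# Hypothetical local-time reading from an instrument clock in Alaska
local = datetime(2018, 1, 1, 9, 0, 13, tzinfo=ZoneInfo("America/Anchorage"))

# Convert to UTC and render in the ISO 8601 form requested above
utc = local.astimezone(ZoneInfo("UTC"))
stamp = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
print(stamp)  # 2018-01-01T18:00:13Z (AKST is UTC-9 in January)
```

Using a named timezone (rather than a fixed offset) lets the library handle daylight saving transitions correctly across the deployment period.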
Providing Access to the Data¶
For continuous, real-time data, set up a data server so that we can pull the data from you on a regular basis. Please use one of the following server options, in order of preference:
ERDDAP
REST API
Public files on HTTP web server (apache, nginx, etc)
ERDDAP is preferred because it guarantees a consistent dataset structure, allows us to pull data in a format of our choosing, and we can re-use data ingestion scripts across multiple data sources. A REST API also guarantees structured data, but we have to write new scripts for each API we interact with.
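One reason ERDDAP requests are easy to script against is that every dataset shares the same tabledap URL structure. A sketch of building such a request (the server hostname and dataset ID below are hypothetical placeholders, not a real endpoint):

```python
# Hypothetical ERDDAP server and dataset ID -- substitute your own
base = "https://erddap.example.org/erddap/tabledap"
dataset_id = "chukchi_ice_buoy"

# tabledap pattern: {base}/{dataset_id}.{format}?{variables}&{constraints}
variables = "time,sea_water_temperature"
constraint = "time>=2018-01-01T00:00:00Z"

url = f"{base}/{dataset_id}.csv?{variables}&{constraint}"
print(url)
```

The same script works against any ERDDAP dataset by swapping the dataset ID and variable list; a production client should also URL-encode the constraint characters (e.g. `>=`) before sending the request.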
Points of Contact¶
Please consider providing multiple points of contact in the metadata, preferably both a science team member and an information technology (IT) team member. These people will be crucial should the Axiom team need to interpret data or troubleshoot getting access to the data.
Include as much descriptive information about datasets, sensors, platforms, models, analysis methods, and quality-control procedures as possible. Metadata is essential for the long-term usability and reuse of information.
In general, metadata should follow one of the following standards:
ISO 19115 for dataset and collection-level metadata
ISO 19115-2: Metadata, Part 2: Extensions for imagery and gridded data
IOOS Metadata Profile for NetCDF
NetCDF-CF 1.6: Climate and Forecast conventions for NetCDF
ACDD 1.3: Attribute Conventions for Data Discovery
Data providers may request access to the Research Workspace to create ISO 19115 compliant metadata using the metadata editor and its associated help documentation.
For each station, please provide as a separate CSV file:
Location (lat, lng, depth/elevation)
Platform type (fixed, buoy, etc)
Expected data date range (so that we can double-check we have everything) as start date and end date
Instrument and data affiliations/attributions:
  * At minimum, provide the primary institution affiliated with this data
  * If possible, provide any other affiliations, such as the operator or funder
Links to web pages or documents with more background information, if available
If there have been other deployments, please let us know so we can keep notes for our reference, even if you are not providing data for those deployments
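Pulling the fields above together, a station-definition CSV might look like the following (column names, coordinates, and values are illustrative only):

```csv
station_name,platform_type,lat,lng,depth_m,start_date,end_date,primary_institution,info_url
example_buoy_01,buoy,70.50,-163.20,0,2018-06-01,2018-10-31,Example University,https://example.org/buoy01
```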
Some of this information may be included in the README as well.
READMEs provide valuable narrative, human-readable details about the dataset. Please include at least the following:
The file list
Data accreditation
Data licensing details
Please include the following information in as much detail as possible:
Have you performed QC checks on this data? If so, are the results of these tests available in the dataset? Do you have any links to documentation about your QC process?
How are the QC variables defined in your datasets?
How can we access your datasets?
For real-time data: where and how is this hosted? How often is the dataset updated? Please provide links and any instructions for accessing the data
Is this dataset already hosted somewhere else? If so, where?
After gathering the information described above and preparing your dataset, you may submit your files to Axiom using the following form, or upload your files to the Research Workspace if you already have an account.