Data Quality Management

Beyond scientific quality assurance, we recommend basic quality reviews be performed on your data throughout its lifecycle, from collection through sharing the final data files in the Research Workspace. The goal of data quality assessment and control is to ensure that the data produced are precise, accurate, and reproducible.

We recommend the following best practices be followed for quality management:

Quality Assurance Before Data Collection

  • Define standards prior to data collection to ensure consistency.

    • Decide in what formats data will be stored.

    • Define the encoded and null data values that will be used.

    • Specify units of measure for parameters.

  • Assign specific quality assurance tasks to team members.

  • Document metadata alongside data collection activities, including the specific quality management steps undertaken.
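The standards above can be captured in a simple data dictionary before collection begins. The Python sketch below is one minimal way to do this; the field names, null sentinels, and units are hypothetical examples, not prescribed values.

```python
# A minimal pre-collection data dictionary (illustrative values only).
# Each field declares its storage format, units of measure, and the
# sentinel that encodes a null/missing value.
DATA_DICTIONARY = {
    "water_temp":  {"format": "float",      "units": "degrees_C", "null_value": -9999},
    "station_id":  {"format": "string",     "units": None,        "null_value": "NA"},
    "sample_date": {"format": "YYYY-MM-DD", "units": None,        "null_value": "NA"},
}

def is_null(field, value):
    """Return True if a recorded value is the declared null sentinel for the field."""
    return value == DATA_DICTIONARY[field]["null_value"]
```

Agreeing on such a dictionary before collection means every team member encodes missing values and units the same way, which avoids reconciliation work later.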

Quality Control and Quality Assurance During Data Entry

  • Perform data transcription error checks, such as:

    • Have multiple people independently enter the data.

    • Use automated methods to check the independent entries for agreement.

  • Design an efficient storage structure for the data.

  • Document any modifications or processing applied to the dataset, for inclusion in the metadata.
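Double entry with an automated agreement check, as described above, can be sketched in a few lines of Python. The record structure and field names here are illustrative assumptions.

```python
def double_entry_mismatches(entry_a, entry_b):
    """Compare two independently entered lists of records and return
    (row_index, field) pairs where the two entries disagree."""
    mismatches = []
    for i, (a, b) in enumerate(zip(entry_a, entry_b)):
        for field in a:
            if a.get(field) != b.get(field):
                mismatches.append((i, field))
    return mismatches

# Example: two people transcribed the same two records.
first_pass  = [{"temp": 4.1, "site": "A"}, {"temp": 5.0, "site": "B"}]
second_pass = [{"temp": 4.1, "site": "A"}, {"temp": 5.2, "site": "B"}]
# double_entry_mismatches(first_pass, second_pass) flags row 1, field "temp".
```

Any flagged pair is then resolved against the original data sheet rather than guessed at.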

Quality Control After Data Entry

  • Ensure that data are properly delimited and line up in the correct columns.

  • Check that there are no missing values for key parameters.

  • Scan for anomalous values.

  • Generate and review statistical summaries.

  • Map location data (e.g., geographic coordinates) and check for positional errors.
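Two of the post-entry checks above, missing values for key parameters and anomalous values, can be sketched as small Python helpers. The field names, null sentinel, and valid range used here are illustrative assumptions.

```python
def missing_key_values(rows, key_fields, null_value=-9999):
    """Return indices of rows where any key field is absent or the null sentinel."""
    return [i for i, row in enumerate(rows)
            if any(row.get(f) in (None, null_value) for f in key_fields)]

def out_of_range(rows, field, lo, hi):
    """Return indices of rows whose value for `field` falls outside [lo, hi]."""
    return [i for i, row in enumerate(rows)
            if row.get(field) is not None and not (lo <= row[field] <= hi)]

# Example: row 1 holds the null sentinel, row 2 is missing "temp" entirely,
# and row 2's depth exceeds an assumed valid range of 0-14 m.
rows = [{"temp": 4.1, "depth": 10},
        {"temp": -9999, "depth": 12},
        {"depth": 15}]
```

Flagged rows are candidates for correction or for documentation as known problems in the metadata.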

Quality Control After Loading to the Research Workspace

In the metadata describing your data, document the quality assurance and control steps taken so that future users can judge the validity and accuracy of your sampling program, your processing and analysis steps, and your final data products. Specifically, the documentation should include:

  • The quality assurance and quality control procedures that were applied

  • A description of the quality level for the data

  • Known problems that limit the data’s use (e.g., uncertainty, sampling problems, blanks, QC samples)

  • Summary statistics generated directly from the final data file, for use in verifying file transfer and transformations
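Summary statistics for transfer verification, as in the last item above, might be generated with a small helper like the following. The particular statistics chosen are an illustrative assumption; any set computed identically on both copies will do.

```python
import statistics

def column_summary(values):
    """Summary statistics for one numeric column. Recorded in the metadata
    from the final data file, and recomputed on a transferred or transformed
    copy: matching summaries support (though do not prove) an intact transfer."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(statistics.mean(values), 6),
    }

# Example: compare the original column against a copy after transfer.
original = [4.1, 5.0, 5.2]
copy = [4.1, 5.0, 5.2]
# column_summary(original) == column_summary(copy) when the values survived intact.
```

A mismatch in any statistic signals that the file was truncated, reordered with loss, or otherwise altered in transit.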