Best Practices for Metadata - Models/Computer Code

Types of models/computer code

Models, model-based projects, and model-derived products should be documented well enough that someone with the appropriate skills and knowledge could generate comparable results using a similar method.

There are generally three types of model-based projects, and knowing which type applies to your project provides a starting point for the metadata.

Type 1: Developing a standalone model

Example: Predicting Sockeye Salmon run timing for the Salmon River (Gulf of Alaska)

Type 2: Retooling or renewing a previous model

Example: Predicting Sockeye Salmon run timing in the Gulf of Alaska, in Python 3

Type 3: Applying an existing model to different/new data

Example: Predicting Coho Salmon run timing in the Gulf of Alaska based on an existing, geographically relevant model

Begin at the beginning

A clear understanding of the documentation needs from the start of a project, with agreement from major partners, will go a long way towards success. Examples of these needs include responsible data/output handling, process documentation, guidelines for quality control, and the metadata associated with each of these steps.

Assess what represents the scholarly output of your project. For example, if the model could be written in Python, R, or MATLAB and the underlying idea would be the same, the metadata should focus on the methods. If the project applies an idea to a specific type of data, or relies on a specific code or computational approach, adjust the metadata accordingly.

Why document metadata?

Aim for reusability. Keep a future re-user in mind, along with the many possible reasons for re-use. It is typically safe to assume a little familiarity with the subject, but not a full understanding. Re-use could range from pre-processing, to examining what was left out, to using the work as a step towards building a further product.

What to document?

The table below gives an overview of what to include for each of the model types described above.

Component                    Type 1 Standalone model  Type 2 Updated model  Type 3 Applied model
Project-level documentation  Yes                      Yes                   Yes
Input file(s)                Yes                      Yes                   Yes
Model code                   Yes                      Yes                   Optional
Output file(s)               Optional                 Optional              Yes
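The table can also be kept in machine-readable form, for example to check a draft metadata record for missing components. The following is a minimal sketch under stated assumptions, not part of any existing tooling: the names `REQUIREMENTS` and `missing_components`, and the short keys used for the model types and components, are hypothetical.

```python
# Hypothetical encoding of the table above. "yes" marks a component that
# should be documented for that model type; "optional" marks one that may be.
REQUIREMENTS = {
    "standalone": {"project": "yes", "inputs": "yes", "code": "yes", "outputs": "optional"},
    "updated": {"project": "yes", "inputs": "yes", "code": "yes", "outputs": "optional"},
    "applied": {"project": "yes", "inputs": "yes", "code": "optional", "outputs": "yes"},
}


def missing_components(model_type, documented):
    """Return the required components not yet documented, in sorted order."""
    required = {
        name for name, level in REQUIREMENTS[model_type].items() if level == "yes"
    }
    return sorted(required - set(documented))


if __name__ == "__main__":
    # An applied-model project that has so far documented only the
    # project-level information and its input files.
    print(missing_components("applied", ["project", "inputs"]))
```

Optional components are deliberately left out of the check, so a record that omits them is still reported as complete.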

Other considerations while creating metadata:

  • Document the inputs or their sources; it is not typically necessary to include raw files

  • Parameters are typically a type of input file for your code and are easily included as a standalone file

  • Archive intermediary input files if they are original to the project. Outputs from another model should not be included, but cited instead (see previous tip)

  • Make note of any original data processing technique, typically in the “Process steps” of a Methods section
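As one illustration of the tips above, parameters can be written to a standalone file and referenced from a small record that also lists input sources and process steps. This is a sketch only: the function names, the record keys, and the choice of JSON are assumptions made for illustration, not a required format or an existing metadata standard.

```python
import json
from pathlib import Path


def write_parameters(params, path):
    """Write model parameters to a standalone JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=2, sort_keys=True))
    return path


def build_record(parameter_file, input_sources, process_steps):
    """Collect the pieces named in the tips above into one dictionary.

    The keys are hypothetical; map them onto whatever metadata standard
    your project actually uses.
    """
    return {
        "parameter_file": str(parameter_file),
        # Cite sources (including outputs from other models) instead of
        # bundling raw files.
        "input_sources": list(input_sources),
        # Original data processing techniques, in the order they were applied.
        "process_steps": list(process_steps),
    }
```

Keeping parameters in their own file means they can be archived and cited like any other input, separately from the model code.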

For further questions, reach out to Axiom team members at metadata@axiomdatascience.com.