Best Practices for Metadata - Models/Computer Code

Types of models/computer code

Models, model-based projects, and model-derived products should have documentation sufficient to allow someone with the appropriate skills and knowledge to generate comparable results using a similar method.

There are generally three types of model-based projects, and knowing which type applies to your project provides a starting point for the metadata.

Type 1: Developing a standalone model	Type 2: Retooling or renewing a previous model	Type 3: Applying an existing model to different/new data
Example: Predicting Sockeye Salmon run timing for the Salmon River (Gulf of Alaska)	Example: Predicting Sockeye Salmon run timing in the Gulf of Alaska, in Python 3	Example: Predicting Coho Salmon run timing in the Gulf of Alaska based on an existing, geographically relevant model

Begin at the beginning

A clear understanding of the documentation needs from the start of a project, with agreement from major partners, will go a long way towards success. Examples of these needs include responsible data/output handling, process documentation, guidelines for quality control, and the metadata associated with each of these steps.

Assess what represents the scholarly output of your project. For example, if the model could be written in Python, R, or MATLAB and the idea would be the same, the metadata should focus on the methods. If the model applies an idea to a specific type of data, or relies on a specific code or computational approach, adjust the metadata accordingly.

Why document metadata?

Aim for reusability. Keep a future re-user in mind, along with the many possible reasons for re-use. It is typically safe to assume a little familiarity, but not a full understanding. Re-use could range from pre-processing, to examining what was left out, to using the model as a step in building a further product.

What to document?

The table below gives an overview of what to include, based on the three model types described above.

	Type 1 Standalone model	Type 2 Updated model	Type 3 Applied model
Project-level documentation	Yes	Yes	Yes
Input file(s)	Yes	Yes	Yes
Model code	Yes	Yes	Optional
Output file(s)	Optional	Optional	Yes
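
The table above can also be expressed as a small checklist. The following is a minimal Python sketch, not an Axiom tool: the short keys `standalone`, `updated`, and `applied` and the function name `missing_components` are hypothetical names chosen for illustration, and only the Yes/Optional values are taken from the table.

```python
# Requirement matrix transcribed from the table above.
# The keys "standalone", "updated", and "applied" are hypothetical short
# names for Type 1, Type 2, and Type 3 respectively.
REQUIREMENTS = {
    "Project-level documentation": {"standalone": "Yes", "updated": "Yes", "applied": "Yes"},
    "Input file(s)": {"standalone": "Yes", "updated": "Yes", "applied": "Yes"},
    "Model code": {"standalone": "Yes", "updated": "Yes", "applied": "Optional"},
    "Output file(s)": {"standalone": "Optional", "updated": "Optional", "applied": "Yes"},
}

MODEL_TYPES = ("standalone", "updated", "applied")


def missing_components(model_type, provided):
    """Return the components marked "Yes" for model_type that are not in provided."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"Unknown model type: {model_type!r}")
    return [
        component
        for component, by_type in REQUIREMENTS.items()
        if by_type[model_type] == "Yes" and component not in provided
    ]
```

For example, calling `missing_components("applied", {"Project-level documentation", "Input file(s)"})` would report that output files still need to be archived, while model code would not be flagged because it is optional for an applied model.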

Other considerations while creating metadata:

Inputs, or their sources, should be documented, but it is not typically necessary to include raw files

Parameters are typically a type of input file for your code and can be included as a standalone file

Archive intermediary input files if they are original to the project. Outputs from another model should not be included but cited instead (see the tip on inputs above)

Make note of any original data processing techniques, typically in the “Process steps” of a Methods section
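
As one illustration of keeping parameters in a standalone file, the sketch below writes a set of model parameters to a JSON file using only the Python standard library. It is an assumption-laden example rather than a required format: the function names, the file name `parameters.json`, and every parameter name and value are hypothetical.

```python
import json
from pathlib import Path


def write_parameter_file(parameters, path):
    """Write model parameters to a standalone JSON file and return its path."""
    path = Path(path)
    # Sorted keys and indentation keep the file readable and stable
    # between runs, which makes changes easier to review.
    text = json.dumps(parameters, indent=2, sort_keys=True)
    path.write_text(text + "\n", encoding="utf-8")
    return path


def read_parameter_file(path):
    """Load parameters back from a JSON file written by write_parameter_file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical parameter names and values, for illustration only.
example_parameters = {
    "model_name": "example_run_timing_model",
    "random_seed": 42,
    "training_years": [2001, 2020],
}
```

A file written this way, for example with `write_parameter_file(example_parameters, "parameters.json")`, can be archived alongside the other input files and described in the metadata like any other input.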

For further questions, reach out to Axiom team members at metadata@axiomdatascience.com.