.. _data-management-cheat-sheet:

***************************
Data Management Cheat Sheet
***************************

Managing data responsibly isn't easy, even for simple scientific projects. For large projects and campaigns, it can quickly feel overwhelming. Good data management requires planning, communication, and willpower, but the rewards are well worth the effort.

This page provides a manageable amount of information to get you started with data management, and serves as a handy reference for things like file formatting and naming conventions. For more detailed information, please consult our full :ref:`Data Management Best Practices ` page. If you still need help after reading the information below, please email us at metadata@axiomdatascience.com.

Data Organization
=================

* Make a :ref:`data management plan ` *before* you collect your data, including specifics on how your data will be processed, organized, and archived.
* Come up with :ref:`logical naming conventions ` for folders, labels, and files, and follow those conventions throughout your project.
* Establish a :ref:`hierarchical structure for your data files ` and avoid nesting more than three layers of subfolders.

Data and File Formatting
========================

* Use open, non-proprietary, text-based :ref:`file formats ` whenever possible.
* Decide on :ref:`logical file-naming conventions ` and stick to them.
* Follow established conventions (e.g., `CF Conventions `_) for :ref:`data headers and variables ` whenever possible.
* Decide on conventions for :ref:`coded and null values ` in your dataset and stick with them.
* For biological data, include the `ITIS `_ taxonomic serial number (TSN) and the `WoRMS `_ AphiaID.

Common File Format Specs
------------------------

The table below outlines specifications for some common data file formats. For all formats, follow `CF Conventions `_ for naming whenever possible.
If the CF Conventions don't cover a name used in your project, refer to the `Marine Metadata Interoperability Ontology Registry and Repository `_.

.. csv-table::
   :widths: 250,750
   :header: "File Format", "Specifications"

   "`NetCDF `_", "
   * Use the `NODC templates `_ and global attributes.
   * Use a `compliance checker `_ and follow its feedback.
   * Use the value `-9999` for null values."
   "`CSV `_", "
   * Follow `CF Conventions `_ for column names.
   * Double-check that decimal values are displayed to the correct number of significant digits.
   * Make sure each column has a consistent data type."
   "`Shapefiles `_", "
   * Be sure the appropriate projection is documented.
   * For vector data, include the coordinate reference information."
   "Databases", "
   * Most database formats will need to be converted to plain text for archiving (e.g., each table as a CSV file).
   * Include plain-text documentation of relevant table and field properties.
   * Capture table relationships in a diagram, which can be saved as a JPG file."
   "Spatial Media", "
   * For GPS-enabled video, include a table that connects latitude and longitude to video timestamp.
   * Verify that the timestamps in the table match the video timestamp for the full length of the video.
   * Include documentation of the video file format and resolution.
   * Include any still images captured during the video as rows in the table, with corresponding latitude and longitude."
   "Sensor Data", "
   * Coming soon!"

Data Quality Management
=======================

* Assign specific :ref:`quality assurance ` tasks to specific people involved in your project.
* Define parameter names, units, and null value codes before collecting data.
* :ref:`Review all data ` for missing, anomalous, or invalid values immediately after collection.

Metadata and Documentation
==========================

* :ref:`Document ` how your data are collected, processed, and preserved at each stage of your project.
* Budget time to prepare your data for :ref:`long-term preservation ` once your data are finalized.
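
As a rough illustration of the CSV guidance in the file format table above (consistent data types per column, and an agreed null value code), the sketch below flags columns that mix numeric and text values. It uses only the Python standard library. The function names, the sample data, and the choice to reuse ``-9999`` as the CSV null code are assumptions made for this example, not part of any required workflow.

```python
import csv
import io

# Assumption: reuse the -9999 null code that the table recommends for NetCDF.
NULL_VALUE = "-9999"


def infer_type(value):
    """Classify a raw CSV cell as 'number' or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def find_mixed_columns(csv_text, null_value=NULL_VALUE):
    """Return {column: sorted list of types} for columns holding more than one type.

    Cells equal to the null code, and empty cells, are ignored.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            # Skip ragged rows: extra cells have no column name, missing cells are None.
            if name is None or value is None:
                continue
            value = value.strip()
            if value == "" or value == null_value:
                continue
            seen[name].add(infer_type(value))
    return {name: sorted(types) for name, types in seen.items() if len(types) > 1}


# Hypothetical sample: the third row uses "n/a" instead of the agreed null code.
sample = """time,sea_water_temperature,station_id
2020-01-01T00:00:00Z,4.2,A1
2020-01-01T01:00:00Z,-9999,A1
2020-01-01T02:00:00Z,n/a,A1
"""

mixed = find_mixed_columns(sample)
print(mixed)
```

A column reported here usually points to an undocumented null code or a data entry error. Deciding on null value codes before collecting data, as recommended under Data Quality Management, helps avoid both.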