Describing Data - Research Data

Describing your data helps you and others understand it in the future. Learn more about:

Using a metadata standard
Providing context in a supporting document, like a README file

New to metadata? Check out Metadata 101 from the Emory Libraries.

Metadata standards

Selecting a metadata standard or schema does not compel you to use it to its fullest extent. You can use as much (or as little) as you need. To search for more metadata standards by discipline, see the Metadata Directory from the Research Data Alliance.

General purpose schemas

Dublin Core: a general standard, often adapted for specific disciplines. For example, the Dryad data repository for life sciences uses Dublin Core.
FGDC (Federal Geographic Data Committee): oversees geospatial standards for the U.S. Use their site to select a geospatial metadata standard or find available tools.
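
As a rough illustration of using only as much of a standard as you need, the sketch below builds a small Dublin Core record with Python's standard library. The namespace URI and element names (title, creator, date, subject, identifier) come from the Dublin Core element set. The `build_dc_record` helper, the `metadata` wrapper element, and all sample values are hypothetical; a given repository may expect a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Return an XML string with one Dublin Core element per (name, value) pair.

    Hypothetical helper: the <metadata> wrapper is an illustrative choice,
    not something Dublin Core itself requires.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration only.
record = build_dc_record([
    ("title", "Stream temperature readings, example watershed"),
    ("creator", "Doe, Jane"),
    ("date", "2020-06-01"),
    ("subject", "hydrology"),
    ("identifier", "doi:10.1234/placeholder"),  # placeholder, not a real DOI
])
print(record)
```

Elements may be repeated (for example, several `creator` or `subject` entries), which is why the helper takes a list of pairs rather than a dictionary.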

Science schemas

Darwin Core: adapted from Dublin Core, and used to describe biological specimens.
Ecological Metadata Language (EML): a standard for describing ecological data. Use the Morpho Data Management software to create and edit EML metadata.
Integrated Taxonomic Information System (ITIS): taxonomy for "plants, animals, fungi, and microbes of North America and the world."
Unified Metadata Model (UMM): brings together multiple metadata standards in use for earth science data, with all records housed in NASA’s Common Metadata Repository.

Social Science schemas

Data Documentation Initiative (DDI): a standard for describing social and behavioral science data. To create DDI metadata, use Nesstar Publisher, a free XML editor for DDI.

Humanities schemas

Text Encoding Initiative (TEI): guidelines for encoding texts in digital formats using markup language. The TEI community contributes tools for authoring, editing, and publishing TEI documents.
VRA Core (Visual Resources Association): "data standard for the description of images and works of art and culture."

Documentation and README files

Documentation of your data should include information such as:

Title of dataset, investigator names, creation date, and keywords.
Purpose of study, research questions, and hypotheses.
Sampling techniques, methodology, and experimental protocols.
Equipment/instrument settings.
Description of independent and dependent variables.
Software syntax and code.
File formats, content, size, and relationship among files.
Data identifiers (DOI, URI).
Data source, provenance, and copyright permissions.
Associated presentations/manuscripts/articles.

This information can be included in the metadata that describes your files, or in supplemental files you maintain with your dataset, such as a codebook, data dictionary, or syntax files with code to replicate your process.
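
A data dictionary can be as simple as a CSV file with one row per variable. The sketch below shows one possible layout, written with Python's standard library. The column names (`variable`, `role`, `type`, `units`, `description`), the `write_data_dictionary` helper, and the sample variables are assumptions made for illustration, not a required format.

```python
import csv
import io

# Hypothetical column layout for a data dictionary; adjust to your discipline.
COLUMNS = ["variable", "role", "type", "units", "description"]


def write_data_dictionary(variables, handle):
    """Write one CSV row per variable to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for entry in variables:
        writer.writerow(entry)


# Invented example variables, covering independent and dependent roles.
variables = [
    {"variable": "site_id", "role": "identifier", "type": "string",
     "units": "", "description": "Code for the sampling site"},
    {"variable": "depth_m", "role": "independent", "type": "float",
     "units": "meters", "description": "Depth at which the sample was taken"},
    {"variable": "temp_c", "role": "dependent", "type": "float",
     "units": "degrees Celsius", "description": "Measured water temperature"},
]

buffer = io.StringIO()
write_data_dictionary(variables, buffer)
dictionary_csv = buffer.getvalue()
print(dictionary_csv)
```

To save the dictionary alongside the dataset, pass a file opened with `open("data_dictionary.csv", "w", newline="")` instead of the in-memory buffer.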

The Mozilla Science Lab's Planning for Data Reuse Checklist can help you outline the information to include with your data.

README files are another option for documenting your data. A README is typically a plain text file that provides context for the data collection.

Cornell University's Research Data Management Service Group has a helpful Guide to writing “readme” style metadata to go with your data, including a sample README file template.
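
As a minimal sketch of what a plain-text README might contain, the Python example below assembles one from a dictionary of dataset details drawn from the documentation list above. The section labels, the `render_readme` helper, and every sample value are hypothetical; this is not the Cornell template, which should be consulted directly for a fuller structure.

```python
def render_readme(info):
    """Render a plain-text README from a dict of dataset details.

    Hypothetical helper: section labels are illustrative, not a standard.
    """
    lines = [
        f"TITLE: {info['title']}",
        f"INVESTIGATORS: {', '.join(info['investigators'])}",
        f"CREATED: {info['created']}",
        f"KEYWORDS: {', '.join(info['keywords'])}",
        "",
        "PURPOSE",
        info["purpose"],
        "",
        "METHODS",
        info["methods"],
        "",
        "FILES",
    ]
    # One indented line per file, describing its content.
    for name, description in info["files"]:
        lines.append(f"  {name}: {description}")
    lines += [
        "",
        f"IDENTIFIER: {info['identifier']}",
        f"RIGHTS: {info['rights']}",
    ]
    return "\n".join(lines) + "\n"


# Invented sample values for illustration only.
readme_text = render_readme({
    "title": "Stream temperature readings, example watershed",
    "investigators": ["Jane Doe", "John Roe"],
    "created": "2020-06-01",
    "keywords": ["hydrology", "temperature"],
    "purpose": "Test whether water temperature varies with depth.",
    "methods": "Hourly readings from handheld probes at three depths.",
    "files": [
        ("readings.csv", "raw measurements, one row per reading"),
        ("data_dictionary.csv", "definitions of each variable"),
    ],
    "identifier": "doi:10.1234/placeholder",  # placeholder, not a real DOI
    "rights": "CC BY 4.0",
})
print(readme_text)
```

The resulting text can be saved as `README.txt` in the top-level folder of the dataset so that it travels with the files it describes.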