Data citation is an important component of data sharing and data reuse. Citing data gives data creators credit for creating and sharing their work, and creates a trail of research progress similar to the citation of articles and books.
Basic Data Citation
There is broad consensus on the minimal components of a data citation:
Creator (Year) Title. Publisher. Identifier
- Creator(s): Individual(s) or organization responsible for creating the dataset.
- Year: Year the dataset was published, not necessarily created.
- Title: Should be as descriptive as possible.
- Publisher: Organization that provides access to the dataset (e.g. Dryad, Zenodo).
- Identifier: Persistent, unique identifier (e.g. a DOI).
Depending on the dataset and how it was used, additional components may be needed:
- Location / Availability: The web address of the dataset is essential when the identifier can’t be used to reach the dataset.
- Version / Edition: Version of the dataset used in the present publication. Needed to reproduce analysis of versioned dynamic datasets.
- Access Date: Date of access for analysis in the present publication. Needed to reproduce analysis of continuously updated dynamic datasets.
- Format / Material Designator: e.g. database, CD-ROM.
- Feature Name: A description of the subset of the dataset used. May be a formal title or a list of variables (e.g. concentration, optical density).
- Verifier: Used to confirm that two datasets are identical. Most commonly a UNF or MD5 checksum.
- Series: Used if the dataset is part of a series of releases (e.g. monthly, yearly).
- Contributor: e.g. editor, compiler.
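As a rough illustration, the components above can be assembled into the pattern Creator (Year) Title. Publisher. Identifier with a few lines of Python. This is only a sketch: the class and function names (`DataCitation`, `md5_verifier`) are hypothetical, the order in which optional components are appended is an assumption rather than a rule from any style guide, and the author name and DOI in the usage note are placeholders.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    # Minimal components
    creators: List[str]
    year: int                 # year published, not necessarily created
    title: str
    publisher: str
    identifier: str           # e.g. a DOI
    # A few optional components (placement in the output is an assumption)
    version: Optional[str] = None
    access_date: Optional[str] = None
    verifier: Optional[str] = None

    def format(self) -> str:
        """Return 'Creator (Year) Title. Publisher. Identifier' plus any extras."""
        creators = "; ".join(self.creators)
        title = self.title.rstrip(".")  # avoid a doubled period after the title
        text = f"{creators} ({self.year}) {title}. {self.publisher}. {self.identifier}"
        extras = []
        if self.version:
            extras.append(f"Version {self.version}")
        if self.access_date:
            extras.append(f"Accessed {self.access_date}")
        if self.verifier:
            extras.append(self.verifier)
        if extras:
            text += ". " + ". ".join(extras)
        return text


def md5_verifier(data: bytes) -> str:
    """Compute an MD5 checksum usable as a verifier for the dataset bytes."""
    return "MD5:" + hashlib.md5(data).hexdigest()
```

For example, `DataCitation(["Doe, J."], 2020, "Example dataset", "Zenodo", "https://doi.org/10.1234/example").format()` yields a string in the minimal pattern, and passing the raw bytes of a data file to `md5_verifier` gives a value that two parties can compare to confirm they hold identical copies.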
For datasets that have DOIs, DataCite and CrossRef provide a citation formatter that generates a citation in any of a wide range of journal styles.
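A formatted citation can also be requested programmatically. The sketch below assumes DOI content negotiation, in which a request to `https://doi.org/` carries an `Accept: text/x-bibliography` header naming a citation style; whether the web formatter uses the same mechanism internally is not stated here. The function names are hypothetical, the DOI in the usage note is a placeholder, and the available style names depend on the service.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a formatted citation."""
    url = f"https://doi.org/{doi}"
    # Ask the resolver for a plain-text bibliography entry in the given style.
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (requires network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `fetch_citation("10.1234/example")` with a real dataset DOI in place of the placeholder should return a single formatted reference, provided the DOI's registration agency supports this kind of request.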
To learn more, see this DataPub blog post on Data Citation or the Joint Declaration of Data Citation Principles.