Research Guides: Data Management for the Sciences: Data Description

Documentation

Begin documenting your data at the very start of your research project, even before data collection begins. Doing so makes documentation easier and reduces the likelihood that you will forget aspects of your data later in the project.

The following are general guidelines for the aspects of your project and data that you should document, regardless of your discipline. At minimum, store this documentation in a readme.txt file or the equivalent, together with the data. You can also reference a published article that contains some of this information.

Title	Name of the dataset or research project that produced it
Creator	Names and addresses of the organization or people who created the data
Identifier	Number used to identify the data, even if it is just an internal project reference number
Subject	Keywords or phrases describing the subject or content of the data
Funders	Organizations or agencies who funded the research
Rights	Any known intellectual property rights held for the data
Access information	Where and how your data can be accessed by other researchers
Language	Language(s) of the intellectual content of the resource, when applicable
Dates	Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule
Location	If the data relates to a physical location, information about its spatial coverage
Methodology	How the data was generated, including equipment or software used, experimental protocol, and anything else one might record in a lab notebook
Data processing	Along the way, record any information on how the data has been altered or processed
Sources	Citations to material for data derived from other sources, including details of where the source data is held and how it was accessed
List of file names	List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
File Formats	Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data
File structure	Organization of the data file(s) and the layout of the variables, when applicable
Variable list	List of variables in the data files, when applicable
Code lists	Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')
Versions	Date/time stamp for each file, plus a separate ID for each version (see file organization)
Checksums	Checksum values used to test whether a file has changed over time (see data backup)

Source: MIT Libraries

See also DRYAD's ReadMe guidance and the University of Minnesota Library's readme template.
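The file list, version timestamps, and checksums described in the table above can be gathered automatically rather than typed by hand. The following is a minimal Python sketch, not a prescribed tool: the function names (`sha256_of`, `build_manifest`, `format_manifest`) and the choice of SHA-256 are assumptions made for illustration, and the output columns are one possible layout for a readme.txt.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> list[dict]:
    """List every file under data_dir with its size, timestamp, and checksum."""
    rows = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(data_dir).as_posix(),
            "bytes": stat.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
            "sha256": sha256_of(path),
        })
    return rows


def format_manifest(rows: list[dict]) -> str:
    """Render the manifest as tab-separated text suitable for a readme.txt."""
    header = "file\tbytes\tmodified_utc\tsha256"
    lines = [
        f"{r['file']}\t{r['bytes']}\t{r['modified_utc']}\t{r['sha256']}"
        for r in rows
    ]
    return "\n".join([header, *lines])
```

Calling `format_manifest(build_manifest(Path("data")))` on a hypothetical `data` folder returns text that can be pasted into the readme. Re-running it later and comparing the `sha256` column shows whether any file has changed.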

Metadata Standards

"Metadata" is data about data. It's structured information that describes content and makes it easier to find or use. A metadata record can be embedded in data or stored separately. Any data file in any format can have metadata fields. In social science, this record is called the "codebook" or "data dictionary."

There are many metadata standards; which one is right for your data depends on the type, scale, and discipline of your research project. The UK's Digital Curation Centre maintains a list of metadata standards by discipline.

Some examples of metadata standards are:

If your field doesn't have a metadata standard, or if you just need a simpler system to keep track of data within your own lab, consider the general guidelines in the "Documentation" section of this page.
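For a lab that only needs such a simpler system, one option is to store a small metadata record separately, next to each data file. The sketch below assumes a JSON "sidecar" file and uses a few of the fields from the Documentation guidelines. The `.metadata.json` suffix, the function names, and every value in `example_record` are hypothetical placeholders, not part of any metadata standard.

```python
import json
from pathlib import Path


def sidecar_path(data_file: Path) -> Path:
    """Return the path of the metadata file stored next to data_file."""
    return data_file.with_name(data_file.name + ".metadata.json")


def write_metadata(data_file: Path, record: dict) -> Path:
    """Write the record as JSON beside the data file and return its path."""
    target = sidecar_path(data_file)
    target.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target


def read_metadata(data_file: Path) -> dict:
    """Load the metadata record that belongs to data_file."""
    return json.loads(sidecar_path(data_file).read_text(encoding="utf-8"))


# Hypothetical example record; every value is a placeholder.
example_record = {
    "title": "Example stream temperature survey",
    "creator": "Example Lab",
    "identifier": "PROJECT-0001",
    "subject": ["hydrology", "temperature"],
    "language": "en",
    "code_lists": {"999": "missing value"},
}
```

Keeping the record in a separate file means it works with any data format, at the cost of having to move or rename the two files together.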

Intellectual Property Considerations

Data cannot be copyrighted. However, a particular expression of data, such as a chart or table in a publication, can be copyrighted. See more information from the University of Michigan’s Copyright Office on Copyrightability of Charts, Tables, and Graphs.
Data can be licensed; licensing conditions can be imposed to protect participants’ privacy or limit further uses.
To promote sharing and unrestricted use of data, make it available under a Creative Commons CC0 Public Domain Dedication. For more information on data licenses, see Open Data Commons.
Researchers may or may not have the right to share data collected from other sources, depending upon the sources' license terms.
Most licensed databases the UC Libraries subscribe to prohibit redistribution of data outside of UC. For more information on terms of use for databases licensed by the Libraries, contact the subject specialist supporting your discipline.

If you are uncertain about your rights to disseminate data you collected, consult with the UCLA Office of Intellectual Property and Industry Sponsored Research or the UCLA Office of Campus Counsel.