Data Management for the Sciences

A guide to best practices for management of research data, including links to data services from the University of California.

Overview

In 2011, the National Science Foundation (NSF) began requiring that grant applicants include a data management plan in their proposals. The National Institutes of Health (NIH), National Endowment for the Humanities (NEH), the Gordon and Betty Moore Foundation and others have similar policies in effect.

By creating a data management plan, you will not only satisfy funding agencies but also have an opportunity to think through how to manage your data for your own use as well as any future use by fellow researchers.

Common Requirements for a DMP

Although funding institutions have different specific requirements, a data management plan should generally contain the following components:

  • Description of the project: e.g., purpose of the research, organization(s) and staff involved
  • Description of the data to be collected: e.g., the nature and format of the data, how it will be collected, and an overview of secondary data available on the topic
  • Standards to be applied for formats, metadata, etc.
  • Plans for short-term storage and data management: e.g., file formats, local storage and backup procedures, and security
  • Description of legal and ethical issues: e.g., intellectual property, confidentiality of study participants
  • Access policies and provisions: i.e., how you will make data available to others, any restrictions on data reuse, etc.
  • Provisions for long-term archiving and preservation: e.g., in a data archive
  • Assigned data management responsibilities: i.e., which persons will actually be responsible for ensuring data management; how will compliance with this plan be monitored and ensured over time

Source: University of New Hampshire

Data Management Plan Tool


DMPTool
The DMPTool provides a click-through wizard that guides researchers through creating high-quality data management plans that meet funder requirements. The DMPTool supports all major funders and is updated when funders release new requirements.

Examples of DMPs

To get an idea of what data management plans look like, check out these examples:

  • a plan produced by the DMP Tool
  • a sample plan produced by ICPSR
  • examples of data sharing plans produced by NIH (short)
  • two sample plans from the University of Wisconsin-Madison (1, 2)
  • examples shared by UC San Diego

Funding Agency Requirements

Each funding agency has its own set of guidelines. To see what these are for a few funding agencies, see the Funding Agency Requirements page.

UCLA Data Management Plan Template

1. Description

  • Give a summary of the data you will collect or create, noting the content, coverage and data type, e.g., tabular data, survey data, experimental measurements, models, software, audiovisual data, physical samples, etc. Indicate which data are of long-term value and should be shared and/or preserved.
  • If purchasing or reusing existing data, explain how issues such as copyright and intellectual property rights have been addressed. You should aim to minimize any restrictions on the reuse (and subsequent sharing) of third-party data.
  • Clearly note what format(s) your data will be in and explain why you have chosen certain formats. See UK Data Service guidance on recommended formats or DataONE Best Practices for file formats.
  • Note what volume of data you will create in MB/GB/TB. Indicate the proportions of raw data, processed data, and other secondary outputs. Consider whether the scale of the data will pose challenges when sharing or transferring data between sites; if so, how will you address these challenges?
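
The volume and proportion estimates requested in the last item can be sketched with a short script. This is a minimal illustration, not a UCLA tool: the function names `directory_size`, `human_size`, and `proportions` are hypothetical, the byte counts are made up, and decimal units (1 MB = 1,000,000 bytes) are an assumption you may want to change.

```python
from pathlib import Path


def directory_size(root: Path) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def human_size(num_bytes: int) -> str:
    """Format a byte count using decimal units (assumption: 1 MB = 1,000,000 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


def proportions(sizes: dict) -> dict:
    """Return each category's share of the total volume, as a percentage."""
    total = sum(sizes.values())
    if total == 0:
        return {name: 0.0 for name in sizes}
    return {name: round(100 * n / total, 1) for name, n in sizes.items()}


# Hypothetical byte counts, for illustration only. In practice each value
# could come from directory_size(Path("data/raw")) and so on.
example = {
    "raw": 750_000_000_000,
    "processed": 200_000_000_000,
    "outputs": 50_000_000_000,
}
print(human_size(sum(example.values())))
print(proportions(example))
```

Reporting both the total and the per-category shares makes it easier to judge whether transfer between sites will be a problem, since raw data often dominates the total.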

     

2. Data Organization and Metadata

  • Outline how the data will be collected, processed, and organized. This should cover relevant standards or methods, quality assurance, and data organization (e.g., naming conventions, version control, and folder structures).
  • Explain how the consistency and quality of data collection will be controlled and documented. This may include processes such as calibration, repeat samples or measurements, standardized data capture, data entry validation, peer review of data, or representation with controlled vocabularies. See the DataONE Best Practices for data quality.
  • What metadata will be provided to help others identify and discover the data?
  • Consider what other documentation is needed to enable reuse. This may include information on the methodology used to collect the data, analytical and procedural information, definitions of variables, units of measurement, any assumptions made, the format and file type of the data and software used to collect and/or process the data.
  • Consider how you will capture this information and where it will be recorded, e.g., in a database with links to each item, in a ‘readme’ text file, in file headers, etc.
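
A naming convention is easier to keep consistent when it can be checked automatically. The sketch below assumes a hypothetical convention of the form `<project>_<YYYYMMDD>_<description>_v<NN>.<ext>`; the pattern and the function names `parse_name` and `nonconforming` are illustrative only and should be adapted to whatever convention your plan describes.

```python
import re
from typing import Optional

# Hypothetical convention: <project>_<YYYYMMDD>_<description>_v<NN>.<ext>
# e.g. "soilsurvey_20240315_site-a-ph_v02.csv".
# Note: the date part is only checked for eight digits, not calendar validity.
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{8})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename: str) -> Optional[dict]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def nonconforming(filenames: list) -> list:
    """Return the file names that do not follow the convention."""
    return [name for name in filenames if parse_name(name) is None]


print(nonconforming(["soilsurvey_20240315_site-a-ph_v02.csv", "final FINAL.csv"]))
```

Running a check like this before each deposit or backup is one way to document that the convention in the plan is actually being followed.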

     

UCLA Guidance: Researchers are strongly encouraged to use community metadata standards where these are in place. The Research Data Alliance offers a Directory of Metadata Standards. Data repositories may also provide guidance about appropriate metadata standards. Also, see DRYAD's ReadMe guidance and University of Minnesota Library's readme template.
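
One way to capture the documentation items listed in this section is to generate a plain-text readme from a small set of labeled fields. The sketch below is an assumption-laden illustration: `build_readme` is a hypothetical helper, the field labels are drawn from the documentation items named above, and the layout is not the Dryad or University of Minnesota template.

```python
def build_readme(title: str, fields: dict) -> str:
    """Render a plain-text readme with one labeled section per field.

    Empty fields are rendered as TODO so that gaps are visible.
    """
    lines = [title, "=" * len(title), ""]
    for label, text in fields.items():
        lines.append(f"{label}:")
        lines.append(f"  {text if text.strip() else 'TODO'}")
        lines.append("")
    return "\n".join(lines)


# Hypothetical content, for illustration only.
readme = build_readme(
    "Soil pH survey",
    {
        "Methodology": "Field probes, three repeat measurements per site",
        "Variable definitions": "site_id, ph, measured_on",
        "Units of measurement": "pH (unitless); dates in ISO 8601",
        "Assumptions": "",
        "File formats": "CSV (UTF-8)",
        "Software": "",
    },
)
print(readme)
```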

 

3. Ethics, Privacy, and Intellectual Property

  • Consider how you will protect the identity of participants, e.g., via anonymization or using managed access procedures.
  • Ethical issues may affect how you store and transfer data, who can see or use it, and how long it is kept. You should demonstrate that you are aware of this and have planned accordingly.
  • State who will own the copyright and IPR of any existing data as well as new data that you will generate. For multi-partner projects, IPR ownership should be covered in the consortium agreement.
  • Outline any restrictions needed on data sharing, e.g., to protect proprietary or patentable data.
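
As one illustration of protecting participant identity, direct identifiers can be replaced with keyed pseudonyms before data are shared. The sketch below uses HMAC-SHA256 from the Python standard library; the function name `pseudonymize`, the 16-character truncation, and the example key are assumptions. Pseudonymization of this kind is not full anonymization: indirect identifiers in the data may still allow re-identification, and the key must be stored securely and separately from the data.

```python
import hashlib
import hmac


def pseudonymize(participant_id: str, secret_key: bytes) -> str:
    """Map a participant ID to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The same ID and key always give the same pseudonym, so records can still
    be linked across files without exposing the original ID.
    """
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability (an arbitrary choice).
    return digest.hexdigest()[:16]


# Hypothetical key and IDs, for illustration only. Never hard-code a real key.
key = b"replace-with-a-secret-key"
for pid in ["P-0001", "P-0002"]:
    print(pid, "->", pseudonymize(pid, key))
```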

 

UCLA Guidance: The Interim UCLA Guidance on Access to and Management of Research Data and Tangible Research Materials states: “Research Data are the property of The Regents of the University of California. The Principal Investigator shall retain Research Data on behalf of the University, in accordance with Section VI.b.”

 

4. Storage and Security

  • Describe where the data will be stored and backed up during the course of research activities. This may vary if you are doing fieldwork or working across multiple sites so explain each procedure.
  • Identify who will be responsible for backup and how often this will be performed.
  • Consider data security, particularly if your data is sensitive. Note the main risks and how these will be managed. Identify any formal standards that you will comply with.
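
A backup procedure is easier to describe and audit if backups are verified. A common approach is to record a checksum for every file and compare the working copy against the backup. The sketch below assumes SHA-256 checksums and hypothetical helper names (`sha256_of`, `build_manifest`, `changed_files`); it is not tied to any particular UCLA storage service.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(original: dict, backup: dict) -> list:
    """List files that are missing from the backup or whose checksum differs."""
    return sorted(
        name for name, checksum in original.items() if backup.get(name) != checksum
    )


# Usage sketch (the directory names are hypothetical):
#   problems = changed_files(build_manifest(Path("data")), build_manifest(Path("/backup/data")))
#   An empty list means every file in "data" has an identical copy in the backup.
```

Storing the manifest alongside the data also gives future users a way to confirm that files were not corrupted in transfer.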

 

UCLA Guidance: Consider using UCLA Box and other Cloud Services at UCLA, IDRE Data Storage options, or UCLA’s version of the Open Science Framework.

 

5. Data Sharing and Preservation

  • How will you share the data, e.g., deposit in a data repository, use a secure data service, handle data requests directly, or use another mechanism? If you do not propose to use an established repository, demonstrate that the data can be curated effectively beyond the lifetime of the grant.
  • When will you make the data available? Research funders expect timely release. They typically allow embargoes but not prolonged exclusive use.
  • Who will be able to use your data? If you need to restrict access to certain communities or apply data sharing agreements, explain why.
  • How might your data be reused in other contexts? Where there is potential for reuse, you should use standards and formats that facilitate this, and ensure that appropriate metadata is available online so your data can be discovered. Persistent identifiers should be applied so people can reliably and efficiently find your data. They also help you to track citations and reuse.
  • If depositing in a data repository, it helps to show that you have consulted with the repository to understand its policies and procedures, including any metadata standards and costs involved.
  • Outline the plans for data sharing and preservation: how long will the data be retained, and where will it be archived? Will additional resources be needed to prepare data for deposit or meet any charges from data repositories?

 

UCLA Guidance: Consider using a disciplinary repository if one is available; see the lists at re3data or the PLOS ONE recommended repositories. If there is no appropriate disciplinary repository, consider UCLA’s DataDen or Social Science Data Archive. If you use a repository or server that does not provide persistent identifiers, use EZID to get one.

 

The Interim UCLA Guidance on Access to and Management of Research Data and Tangible Research Materials states in Section VI.b: “Principal Investigators shall retain all Research Data on behalf of the University in accordance with this Guidance for as long as possible, but not less than a minimum of six years after final reporting, publication, completion or abandonment of the project, unless a longer retention period is indicated by the funding source or other relevant agreement.”
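
The six-year minimum quoted above can be turned into a concrete date for a plan. The sketch below is a simple date calculation under stated assumptions: `minimum_retention_until` is a hypothetical helper, the caller chooses which triggering event applies (final reporting, publication, completion, or abandonment), and the result is only a floor, since the guidance asks for retention "as long as possible" and a funder or agreement may require longer.

```python
from datetime import date


def minimum_retention_until(trigger_date: date, retention_years: int = 6) -> date:
    """Return the date on which the minimum retention period has elapsed.

    trigger_date is the date of the event that starts the clock; the default
    of six years reflects the minimum quoted above.
    """
    target_year = trigger_date.year + retention_years
    try:
        return trigger_date.replace(year=target_year)
    except ValueError:
        # trigger_date is 29 February and the target year is not a leap year,
        # so roll forward to 1 March.
        return trigger_date.replace(year=target_year, month=3, day=1)


# Hypothetical trigger date, for illustration only.
print(minimum_retention_until(date(2020, 6, 30)))
```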

 

6. Roles and Responsibilities, Budget, and Related Policies

  • Outline the roles and responsibilities for all activities, naming individuals where possible. For collaborative projects, explain how data management responsibilities will be coordinated across partners.
  • Carefully consider and justify any resources needed to deliver the plan. These may include storage costs, hardware, staff time, costs of preparing data for deposit and repository charges.
  • Outline any relevant technical expertise, support and training that is likely to be required and how it will be acquired.
  • List any other relevant funder, institutional, departmental or group policies on data management, data sharing and data security.


UCLA Guidance: See DataONE Best Practices: Define roles and assign responsibilities for data management.