Data Management for the Sciences

A guide to best practices for management of research data, including links to data services from the University of California.

Data Sharing Overview

Sharing data is now required by major governmental and philanthropic funding agencies, and many journals require it as a prerequisite for publication. There are generally three approaches to sharing data. We will elaborate on each of these categories in the boxes below.

  1. Using a discipline-specific data repository. These repositories are often supported and managed by the communities that use the data. This improves discoverability of the data of interest, and ensures the metadata associated with the dataset is useful enough for reuse. Importantly, specialized repositories for protected data help with secure long-term storage.
  2. Using a general-purpose repository. These repositories are the most flexible for data storage and are easily citable in publications and grant applications.
  3. Using a public data repository. This is not generally recommended for depositing published research, since these repositories have looser standards for reuse, integrity and discoverability. However, public data sets are immensely useful for data discovery, research and education, since they are often free to use and easy to save, manipulate locally and modify.

Finding Discipline-Specific Repositories

If you are looking for data but are not sure where to start searching in your discipline, a number of resources can help narrow down your search. This section includes directories of data repositories, which are excellent resources for identifying a discipline-specific repository where you can store or discover data relating to your discipline.

General Purpose Repositories

It is often useful to collect different types of data into one data record, especially if they are associated with specific publications, research grants, or research groups. General-purpose repositories offer the most flexibility in terms of the types of data that can be stored. They create permanent identifiers to ensure data is citable, and maintain metadata standards that connect data to the respective publications, grants or research groups.

Public Data Repositories

Public data sources are an extremely useful way of conducting community-oriented data research. They include datasets on a wide variety of topics. While they have looser standards for preservation and archiving, they can be an excellent place to start for data discovery, education and research.

Why Deposit Data?

Many established repositories have built-in mechanisms to maintain data integrity and discoverability, making it easier for researchers to maintain data they might need to share with collaborators or colleagues. Most importantly, established data repositories are able to issue permanent identifiers (like Digital Object Identifiers, or DOIs), which are commonly used for improving discoverability and for citing journal articles.

Repositories allow researchers to include metadata describing appropriate uses for the deposited data and include metrics that researchers can use to monitor reuse of the data. Repositories also minimize the burden on faculty and researchers for data sharing requests by allowing researchers to directly link deposited data with their publications, instead of relying on formal requests.

Finally, there are specialized data repositories built for protected data, allowing researchers to privately store and control access to sensitive data. This ensures data can be stored with the appropriate security protocols, further reducing the security burden of long-term storage of protected data.

Data Citation Index

The Data Citation Index on the Web of Science provides a single point of access to quality research data from repositories across disciplines and around the world. Read more information about the coverage and selection process of the Data Citation Index here.