Data Management for the Sciences

A guide to best practices for management of research data, including links to data services from the University of California.

Data Management Checklist

Planning for your data before you begin your research, and managing it throughout its life cycle, is essential to ensure both its current usability and its long-term preservation and accessibility. To get started, think through the following questions.

  1. What type of data will be produced? Will it be reproducible? What would happen if it got lost or became unusable later?
  2. How much data will it be, and at what growth rate? How often will it change?
  3. Who will use it now, and later?
  4. Who controls it (PI, student, lab, UCLA, funder)?
  5. How long should it be retained? e.g., 3-5 years, 10-20 years, permanently
  6. Are there tools or software needed to create/process/visualize the data?
  7. Any special privacy or security requirements? e.g., personal data, high-security data
  8. Any sharing requirements? e.g., funder data sharing policy
  9. Any other funder requirements? e.g., data management plan in proposal
  10. Is there good project and data documentation?
  11. What directory and file naming convention will be used?
  12. What project and data identifiers will be assigned?
  13. What file formats? Are they long-lived?
  14. Storage and backup strategy?
  15. When will I publish it and where?
  16. Is there an ontology or other community standard for data sharing/integration?
  17. Who in the research group will be responsible for data management?

Source: from MIT Libraries

Next Steps

Now that you've thought through some of the issues associated with proper management of your research data, read the pages within this guide to gather information on how to deal with these issues. For instance,

  • What type of data will be produced? Will it be reproducible? What would happen if it got lost or became unusable later?

Get a clear picture of what your data will look like. Is it, for example, numerical data, image data, text sequences or modeling data? Knowing exactly what kind of data you have will inform many decisions you need to make about storage, backups and more. Image data requires a lot of storage space, so you'll want to decide which images, if not all of them, you want to retain, and where such large datasets can be housed. As for backing up your data, your research center or university may be able to help you. If you are storing images, however, you may quickly exceed your institution's backup limit for individual laboratories or groups.

Look through our Data Description and Organization page for tips and advice on describing and organizing various formats of data. Peruse the Data Backup and Security page for rules of thumb on storing and securing your data. Finally, check out the Data Deposit and Sharing page for links to repositories where you can make your data public while retaining access to it over the long run.

  • How much data will it be, and at what growth rate?

Once you know what kind of data you're producing, you'll be able to assess the growth rate. For example, are you gathering data by hand or using sophisticated instrumentation that is able to capture a lot of data at once? Will there be more data as time goes on? If so, you will need to plan for the growth. What amounts to enough storage this year may not be sufficient for next year.
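As a rough illustration of planning for growth, the sketch below projects storage needs under the simplifying assumption that a fixed amount of data is added each month. The function names and the example figures are hypothetical, and real growth is rarely this regular, so treat the result as a starting estimate only.

```python
import math
from typing import Optional


def project_storage(current_gb: float, monthly_growth_gb: float, months: int) -> float:
    """Linear projection: current volume plus a fixed amount added each month."""
    return current_gb + monthly_growth_gb * months


def months_until_full(current_gb: float, monthly_growth_gb: float,
                      quota_gb: float) -> Optional[int]:
    """Whole months until the projected volume reaches the quota.

    Returns 0 if the quota is already reached, and None if the data
    is not growing (the quota would never be reached under this model).
    """
    if current_gb >= quota_gb:
        return 0
    if monthly_growth_gb <= 0:
        return None
    return math.ceil((quota_gb - current_gb) / monthly_growth_gb)
```

For example, `project_storage(500, 50, 12)` estimates the volume one year from now for a group holding 500 GB and adding 50 GB per month, and `months_until_full(500, 50, 1000)` estimates how long a hypothetical 1000 GB allocation would last.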

Many of the repositories listed on the Data Deposit and Sharing page can accommodate large datasets. However, many repositories accept only data that have been finalized, so if you expect to add to or modify a deposited dataset on a regular basis, check carefully whether your chosen repository allows such updates. Most funding agencies also require the deposit of only the final version of your data, i.e., the data associated with particular publications. Take a look at the Funding Agency Requirements page for more information.

  • Will the data change frequently?

The answer to this question impacts how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge, so it's imperative you begin with a plan that will carry you through the data management process.

Frequently changing data present many of the same challenges as rapidly growing data. Check out the pages mentioned in the previous bullet point as well as the Data Organization page.
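One lightweight way to keep track of a changing dataset is to record a checksum for every file and compare the records between snapshots. The sketch below is one possible approach, not a recommendation from this guide: it uses SHA-256 from the Python standard library, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_checksum(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): file_checksum(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

Saving the manifest alongside each snapshot (for instance as a text or JSON file) gives a simple record of what changed and when, and it also lets you verify that a backup copy matches the original.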

  • Any sharing requirements? e.g., funder data sharing policy

Many government and private funding agencies require that you share the data generated during your research. Take a look at the Funding Agency Requirements page for more information.

  • Any other funder requirements? e.g., data management plan in proposal

Even funding agencies that don't mandate data sharing may require that you have and submit a data management plan as part of your grant application. Learn how to craft such a document in the most time-efficient manner by using the tools featured on our Creating a Data Management Plan page.

  • Is there good data documentation? What directory and file naming convention will be used? What project and data identifiers will be assigned?

Take a look at our Data Organization page for guidelines on what constitutes good documentation and naming conventions. See our Data Deposit and Sharing page for tools that help you assign persistent identifiers to your digital objects, including data.
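To make the idea of a naming convention concrete, here is a minimal sketch that builds file names from a project name, collection date, short description and version number. The pattern itself (`project_YYYYMMDD_description_vNN.ext`) is an assumption chosen for illustration, not a convention prescribed by this guide; adapt the parts and their order to your own group's needs.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lowercase the text and replace runs of other characters with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, collected: date, description: str,
                   version: int, extension: str) -> str:
    """Assemble a name of the form project_YYYYMMDD_description_vNN.ext."""
    ext = extension.lstrip(".").lower()
    return (
        f"{slug(project)}_{collected:%Y%m%d}_"
        f"{slug(description)}_v{version:02d}.{ext}"
    )
```

Writing the date as year, month, day means that files sort chronologically when listed by name, and a zero-padded version number keeps `v02` ahead of `v10`.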

  • Is there an ontology or other community standard for data sharing?

Again, take a look at our Data Deposit and Sharing page, but note that shared data must first be properly described if they are to have any chance of being discovered and used. For standards that govern metadata, that is, data that describe the underlying content, see the Data Description page.

Source: adapted from MIT Libraries.