Managing your data before you begin your research and throughout its life cycle is essential to ensure its current usability and long-run preservation and access. To do so, begin to think through the following questions.
Source: MIT Libraries
Now that you've thought through some of the issues associated with proper management of your research data, read the pages within this guide to gather information on how to deal with these issues. For instance,
Gather a clear picture of what your data will look like. Are they, for example, numerical data, image data, text sequences or modeling data? Knowing exactly what kind of data you have will inform many decisions you need to make about storage, backups and more. Image data require a lot of storage space, so you'll want to decide which of your images, if not all, you want to retain, and where such large datasets can be housed. As for backing up your data, your research center or university may be able to help you. On the other hand, if you are storing images, you may quickly exceed your institution's limit for backing up individual laboratories or groups.
Look through our Data Description and Organization page for tips and advice on describing and organizing various formats of data. Peruse the Data Backup and Security page for rules of thumb on storing and securing your data. Finally, check out the Data Deposit and Sharing page for links to repositories where you can make your data public while retaining access to them over the long run.
Once you know what kind of data you're producing, you'll be able to assess the growth rate. For example, are you gathering data by hand or using sophisticated instrumentation that is able to capture a lot of data at once? Will there be more data as time goes on? If so, you will need to plan for the growth. What amounts to enough storage this year may not be sufficient for next year.
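The growth planning described above can be roughed out with simple arithmetic. The following is a minimal sketch, assuming linear growth; the function names, the starting size, the monthly growth figure and the quota are all hypothetical illustration values, not figures from this guide or from any institution.

```python
def project_storage(current_gb, growth_gb_per_month, months, copies=1):
    """Estimate storage needed after `months`, assuming linear growth.

    `copies` counts every stored copy (for example, 3 for a primary
    copy plus two backups). All inputs are illustrative assumptions.
    """
    if months < 0 or copies < 1:
        raise ValueError("months must be >= 0 and copies must be >= 1")
    return (current_gb + growth_gb_per_month * months) * copies


def months_until_full(current_gb, growth_gb_per_month, quota_gb):
    """Return how many whole months the data stay within `quota_gb`.

    Returns 0 if the quota is already reached, and None if the data
    are not growing (the quota would never be exceeded).
    """
    if current_gb >= quota_gb:
        return 0
    if growth_gb_per_month <= 0:
        return None
    return int((quota_gb - current_gb) // growth_gb_per_month)


# Hypothetical lab: 500 GB today, growing by 40 GB per month.
print(project_storage(500, 40, 12))            # 980
print(project_storage(500, 40, 12, copies=3))  # 2940
print(months_until_full(500, 40, 1000))        # 12
```

If your instruments produce data in bursts rather than at a steady rate, a linear estimate like this one will understate peak needs, so treat the result as a starting point for a conversation with whoever provides your storage.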
Many of the repositories listed on the Data Deposit and Sharing page can accommodate large datasets. However, many repositories accept only data that have been finalized, so if you expect to add to or modify a deposited dataset on a regular basis, check carefully whether your chosen repository allows such updates. Most funding agencies also require the deposit of only the final version of your data, i.e., the data that are associated with particular publications. Take a look at the Funding Agency Requirements page for more information.
The answer to this question impacts how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge, so it's imperative you begin with a plan that will carry you through the data management process.
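One possible way to keep track of a changing dataset is to record a checksum for every file and compare the records between snapshots. The sketch below is only an illustration of that idea, not a method prescribed by this guide; the function names are hypothetical, and it uses only Python's standard `hashlib` and `pathlib` modules.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under `root` (by relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(old, new):
    """Report which files were added, removed or changed between snapshots."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }
```

Saving the output of `build_manifest` alongside each dated snapshot (as JSON, for instance) lets you run `diff_manifests` later to see exactly what changed between two versions. A dedicated version control system may suit your project better; this sketch only shows the underlying idea.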
Frequently changing data present many of the same challenges as rapidly growing data. Check out the links to the pages mentioned in the previous bullet point as well as the Data Organization sheet.
Many government and private funding agencies require that you share the data generated during your research. Take a look at the Funding Agency Requirements page for more information.
Even funding agencies that don't mandate data sharing may require that you have and submit a data management plan as part of your grant application. Learn how to craft such a document in the most time-efficient manner by using the tools featured on our Creating a Data Management Plan page.
Take a look at our Data Organization page for guidelines on what constitutes good documentation and naming conventions. See our Data Deposit and Sharing page for tools that help you assign persistent identifiers to your digital objects, including data.
Again, take a look at our Data Deposit and Sharing page, but note that shared data must first be properly described if they are to have any chance of being discovered and used. For standards that govern metadata, or data that describe the underlying content, see the Data Description page.
Source: adapted from MIT Libraries.