To determine what data you need, you must first define the area of interest for your research question. Here are some questions to consider:
Questions adapted from Partlo, Kristin. 2009. "The Pedagogical Data Reference Interview." IASSIST Quarterly 33 (4): 6-10. Available at: https://iassistdata.org/sites/default/files/iqvol334_341partlo.pdf. Accessed via Staff and Faculty Work. Library. Carleton Digital Commons https://digitalcommons.carleton.edu/libr_staff_faculty/5
Data repositories contain published datasets that are typically associated with publications or ongoing research projects. Data repositories are used to store and preserve data so that researchers can access and analyze it.
There are two types of repositories we will discuss: scholarly and public.
Scholarly: Scholarly data repositories are managed by organizations or scientific societies and often have stricter guidelines on the format and level of detail in submissions. They are generally well maintained and contain data sets from well-controlled studies, along with detailed descriptions and metadata. Access to these repositories may be restricted.
When searching for data in scholarly repositories, check whether the data set is associated with a research publication. Make sure the record contains enough information for you to reuse the data correctly.
Here are some scholarly repositories:
And here is a list of public repositories:
When collecting data, there are two main considerations.
First, research data sometimes has to be purchased and/or used under strict terms of agreement, or following specific privacy protocols. When purchasing data sets, or downloading protected data, be sure the data is stored in a safe and secure environment. It's important to respect copyright permissions and understand what constitutes fair use.
Second, examine the context and content of the data carefully so that you understand any potential biases or limitations and can avoid misusing the data. Published data is often associated with specific experimental strategies. While the strategies and limitations are often discussed when data is published with research papers, this information is not always available with the data set. When selecting a data set for use:
If your research question cannot be answered with existing datasets, it may be necessary to create your own. Creating data can be done through practices such as observation, surveying, simulation, and experimentation, as well as through methods that extract data from existing bodies of information such as web-scraping or text & data mining (TDM).
Data collection looks different for different disciplines. Here we include some generalized resources to assist with creating datasets:
When collecting original data to answer research questions, there are a few key things to think about in order to be sure the findings are accurate and can be used to draw conclusions relating to your research question.
Take detailed notes about the methods you are using to collect data. If data collection does not go as planned, record which aspects of the methods were changed.
If working with human-subjects data, or protected data, be sure to check with your local Institutional Review Board (IRB) office to see what kinds of protections need to be put in place for storing and reporting your data.
It is important to collect samples that properly represent your subject of study. If you expect to see certain results, compare your experimental sample with a sample with known results to see if the result is aligned with your expectations.
A sample with known results is called a control sample. A positive control is a sample in which you know the effect you are testing for will appear. A negative control is a sample in which you know the effect will not appear. If either control behaves unexpectedly, the results from your experimental sample cannot be trusted.
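The role of controls can be sketched in a few lines of Python. This is only an illustration, not a prescribed analysis method: the function names, the numeric readings, and the idea of a single `threshold` that separates "effect" from "no effect" are all assumptions made for the example. Real studies usually need a statistical test suited to the discipline.

```python
from statistics import mean


def controls_ok(positive_control, negative_control, threshold):
    """Return True when both controls behave as expected.

    Assumption for this sketch: an effect counts as "present" when the
    mean reading is at or above `threshold`.
    """
    positive_shows_effect = mean(positive_control) >= threshold
    negative_shows_no_effect = mean(negative_control) < threshold
    return positive_shows_effect and negative_shows_no_effect


def interpret(sample, positive_control, negative_control, threshold):
    """Interpret the experimental sample only if the controls are valid."""
    if not controls_ok(positive_control, negative_control, threshold):
        return "invalid run"
    if mean(sample) >= threshold:
        return "effect detected"
    return "no effect detected"
```

For example, `interpret([0.9, 1.1], [1.0, 1.2], [0.1, 0.0], 0.5)` reports an effect because both controls behave as expected, while a run whose positive control fails to show the effect is flagged as invalid regardless of what the experimental sample shows.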
If you plan to collect data, be sure to outline how you will organize and analyze the data beforehand. If the work is to be published, have a plan to share the data. See more about data management below.
Data management plans (DMPs) are formal plans that describe the data you expect to acquire or generate through your research, along with how you plan to manage it, analyze it, and share it. Here are some resources focused on data management planning:
DMPs are crucial to reproducible research practice because they provide a framework for how research data are managed and stored. This helps prevent data rot, the loss and/or corruption of data stored on individual computer hard drives. It also makes it easier to find data after research projects are completed and improves transparency by allowing collaborators or like-minded researchers to download the data and verify the analysis. DMPs also prompt researchers to think through the necessary privacy and compliance considerations, especially if the research involves human subjects or some other type of protected data. For these reasons, DMPs are often required as a part of research funding proposals.
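One practical way to notice data rot is to record a checksum for each data file and re-check it later. The sketch below uses Python's standard `hashlib` and `pathlib` modules. The function names and the idea of keeping the checksums in a simple dictionary "manifest" are assumptions made for illustration, not a requirement of any particular DMP or funder.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file under data_dir (by relative path) to its checksum."""
    data_dir = Path(data_dir)
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir, manifest):
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A typical use would be to call `build_manifest` when the data are first stored, save the result alongside the project documentation, and run `changed_files` periodically or before analysis. An empty list means every recorded file still matches its original checksum.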