Data visualization is one of the most critical aspects of research. It allows researchers to see patterns and trends in data that are not easily observable when looking at raw information. Over the last two decades, technology has dramatically increased the scale at which data can be visualized. In addition, more public data is available than ever before, allowing both researchers and citizen scientists to create compelling visual stories with their data.
There are two different approaches to data visualization:
Exploratory: The purpose of the visualization is to explore and understand trends in your data, whether for analysis or for data preparation.
Explanatory: The purpose of the visualization is to communicate something about your data clearly and effectively to a wider audience.
With a focus on explanatory data visualization, this guide is designed to help users find datasets and resources, to outline general approaches to data visualization for effective communication, and to provide a list of tools for creating compelling visualizations.
A good example of why data visualization matters is Anscombe's Quartet: four datasets that share nearly identical simple statistics but look very different when plotted.
These datasets have nearly identical means and variances for their x and y variables, the same correlation between x and y, and the same linear regression line (to at least two decimal places). As a result, they would be essentially indistinguishable if compared only through a table of these descriptive summary statistics.
However, this similarity is misleading: the datasets have very different distributions, which becomes immediately apparent when they are graphed. Visualizations are powerful because they let us quickly and intuitively understand data and the patterns it holds.
Source: Anscombe, Francis J. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21.
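The summary-statistics claim about Anscombe's Quartet can be checked directly. The following is a minimal sketch in Python using only the standard library's `statistics` module. It computes the statistics compared above for the four datasets as published by Anscombe (1973). The names `QUARTET` and `summarize` are illustrative choices for this example, not part of any library, and the sketch assumes sample variance (n - 1 denominator), which is the convention usually quoted for the quartet.

```python
from statistics import mean, variance

# Anscombe's Quartet as published in Anscombe (1973).
# Datasets I-III share the same x values; dataset IV has its own.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics compared in Anscombe's Quartet."""
    mx, my = mean(x), mean(y)
    # Sums of squared deviations and cross deviations from the means.
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx  # ordinary least-squares slope
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(x),  # sample variance (n - 1 denominator)
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,  # Pearson correlation coefficient
        "slope": slope,
        "intercept": my - slope * mx,  # regression line: y = intercept + slope * x
    }


if __name__ == "__main__":
    # Print each dataset's statistics rounded to two decimal places.
    for name, (x, y) in QUARTET.items():
        stats = summarize(x, y)
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.2f}" for k, v in stats.items()))
```

Running the script prints nearly the same row for all four datasets (the y variances differ only in the third decimal place). Only a plot of each (x, y) pair, such as a scatter plot, reveals how different the underlying distributions are.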
Data repositories contain published datasets that are typically associated with publications or ongoing research projects. They store and preserve data so that researchers can access and analyze it.
There are two types of repositories we will discuss: scholarly and public.
Scholarly: Scholarly data repositories are managed by organizations or scientific societies and often have stricter guidelines on the format and level of detail in submissions. They are generally well maintained and contain datasets from well-controlled studies, along with detailed descriptions and metadata. Access to these repositories may be restricted.
When searching for data in scholarly repositories, check whether the data is associated with a research publication, and make sure the record contains enough information for you to reuse the data correctly.
Here are some scholarly repositories:
Here is a list of public repositories:
When collecting data, there are two main considerations.
First, research data sometimes has to be purchased, used under strict terms of agreement, or handled according to specific privacy protocols. When purchasing datasets or downloading protected data, make sure the data is stored in a safe and secure environment. It is also important to respect copyright permissions and to understand what constitutes fair use.
Second, carefully examining the context and content of the data can help you identify potential biases or limitations and prevent misuse. Published data is often tied to specific experimental strategies. Although these strategies and their limitations are usually discussed when data are published alongside research papers, this information is not always included with the dataset itself. When selecting a dataset for use: