
Digital Humanities

Using digital tools to research humanities questions


Data management refers to the continuous process of organizing, storing, preserving, and sharing research data in a systematic and effective manner. It involves making a plan to handle data according to best practices throughout the research data life cycle, which covers data collection, storage, analysis, documentation, and dissemination.

Good data management is crucial for all researchers and not just digital humanists. Here we will limit our scope, providing an overview of Data Management Plans (DMPs) and covering data sharing and data curation specific to the humanities. For further details and resources on DMPs and to learn about the fundamentals of data management, including data documentation, organization, and security, please check out the Data Management for Sciences research guide.

Managing research data centers on considerations for the collection, storage, and sharing of data throughout the research data life cycle:

  1. Data protection: Researchers should avoid data loss and prevent breaches of privacy at all stages of the research process.
  2. Usability: Researchers should ensure easy and proper usage of the data preserved and shared.
  3. Preservation: Researchers should keep the data usable for the long-term future without corruption, degradation, decay, or loss.
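The preservation consideration above is often operationalized through fixity checks: a checksum is recorded when a file is stored and re-verified later to detect silent corruption. Here is a minimal sketch in Python (the function names are illustrative, not a standard tool):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Return True if the file still matches its recorded checksum."""
    return sha256_of(path) == recorded_checksum
```

Recording checksums at deposit time and re-running the check on a schedule is one common way repositories monitor data for degradation.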

From these considerations follow best practices:

  1. Develop a Data Management Plan: Researchers should develop a plan for managing research data that outlines the processes and procedures for collecting, storing, and sharing data. Formal Data Management Plans (DMPs) are described in the next section. They are required by many funding agencies and ensure mindfulness throughout the research process and data life cycle.
  2. Documentation: Researchers should maintain comprehensive, well-written data and metadata documentation along with clear file organization; these practices are discussed further below. Clear ownership and control of the data should be established, including any intellectual property rights associated with the data. Context should be provided when sharing data to ensure reproducibility and proper usage by others.
  3. Security: Researchers should perform regular data backups and implement appropriate data security measures to protect the data from unauthorized access or use.
  4. Accessibility: Researchers should make the data available to other researchers through an appropriate repository that meets established data management standards while ensuring that appropriate data access policies are in place. Researchers should use standardized file formats that are widely accepted and supported to ensure that the data can be read and used by other researchers.
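To illustrate the accessibility practice above: widely supported plain formats such as UTF-8 CSV, often paired with a small machine-readable metadata file, keep data usable by other researchers regardless of their software. A minimal sketch (the sidecar field names are illustrative, not a required standard):

```python
import csv
import json
from pathlib import Path

def export_dataset(rows, out_dir, name, metadata):
    """Write rows of dicts to a UTF-8 CSV plus a JSON metadata sidecar."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{name}.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    # The sidecar documents provenance so others can interpret the CSV.
    (out_dir / f"{name}.metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8")
    return csv_path
```

Both outputs are plain text, so they remain readable even if the software that produced them disappears.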

Data Management Plans

A data management plan (DMP) is a written document that outlines how research data will be collected, organized, stored, shared, and preserved over the course of a research project. Detailed, cost-effective DMPs are required by many funding agencies such as the National Science Foundation (NSF) and the National Institutes of Health (NIH) as part of the grant proposal submission process. A well-written DMP can help ensure that research data is handled in a way that is efficient, ethical, and compliant with institutional and funding agency policies.

The components of a data management plan may vary depending on the discipline, funder, and institution, but some common elements include:

  1. Introduction: A brief description of the research project and the data that will be collected.

  2. Data Collection: A description of the types of data that will be collected, including how it will be collected, who will collect it, and any necessary tools or equipment.

  3. Data Organization: A plan for how the data will be organized, including file naming conventions, folder structures, and metadata.

  4. Data Documentation: A plan for how the data will be documented, including how to describe the data and any associated materials or instruments.

  5. Data Storage and Backup: A description of the storage location, backup procedures, and security measures for the data.

  6. Data Sharing and Access: A plan for how the data will be shared and made accessible, including any embargo periods or restrictions on access.

  7. Data Preservation: A plan for how the data will be preserved for the long-term, including any necessary metadata, data formats, and data curation.

  8. Responsibilities and Resources: A description of the responsibilities of the research team members, and any resources required for successful data management.
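The Data Organization component above often specifies a file naming convention, and making that convention machine-checkable helps a project enforce it consistently. A sketch assuming a hypothetical pattern of the form `project_YYYY-MM-DD_description_vN.ext`:

```python
import re

# Hypothetical convention: project_YYYY-MM-DD_description_vN.ext
NAME_PATTERN = re.compile(
    r"^[a-z0-9]+_"          # project identifier, lowercase
    r"\d{4}-\d{2}-\d{2}_"   # ISO 8601 date
    r"[a-z0-9-]+_"          # short description, hyphen-separated
    r"v\d+"                 # version number
    r"\.[a-z0-9]+$"         # file extension
)

def follows_convention(filename: str) -> bool:
    """Check a filename against the project's naming convention."""
    return NAME_PATTERN.match(filename) is not None
```

A script like this can be run over a project directory before deposit to catch names such as "Final draft (2).docx" that would be ambiguous to future users.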

This text was written with the help of ChatGPT, a machine learning large language model that generates text based on user input.

For a more in-depth look at DMPs and their components, please check out the DMP page on the Data Management for Sciences research guide.

The Office of Management and Budget (OMB) Circular A-110 provides the federal administrative requirements for grants and agreements with institutions of higher education, hospitals and other non-profit organizations. In 1999 Circular A-110 was revised to provide public access under some circumstances to research data through the Freedom of Information Act (FOIA).

Funding agencies have implemented the OMB requirement in various ways. The table below summarizes the data management and sharing requirements of primary US federal funding agencies.

US Federal Funding Agency Policies and Guidelines:

  • National Endowment for the Humanities (NEH): Beginning in January 2012, the NEH Office of Digital Humanities will offer Digital Humanities Implementation Grants (DHIG). DHIG applicants will be required to submit a data management plan and a sustainability plan.
  • National Science Foundation (NSF): “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.” “Proposals submitted to NSF must include a supplementary document of no more than two pages labeled ‘Data Management Plan.’ This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results.”
  • Institute of Museum and Library Services (IMLS): “IMLS encourages sharing of research data.”

Data Sharing

What is data sharing?

Data sharing typically means depositing data in repositories, which are used to store and preserve data so that researchers can access and analyze it. Other methods of data sharing include submitting data to a journal to support a publication and dissemination via a project or institutional website.

Why is it important?

Not only does sharing research data provide credit to the researchers and increase the impact and visibility of their research, it also maximizes transparency and accountability by allowing for validation and promoting reproducibility. It further encourages scientific inquiry and innovation through potential collaboration, new data uses, and education.

How is it related to data management?

Effective data management is important for promoting data sharing, as researchers are more likely to share their data if they have confidence that the data will be used and cited appropriately. Good data management practices can help to ensure that research data is properly attributed, cited, and acknowledged, which can in turn increase the impact and visibility of the research. In summary, data sharing is an important part of research and is facilitated by good data management practices.

Data sharing is not as well-defined a standard in the humanities as it is in the sciences. In particular, it is not as embedded in humanists' research and publication workflows. Humanities research is typically framed as interpreting sources rather than analyzing data, which can make the entire concept of “data sharing” seem foreign. While digital humanists may engage more with structured data, they have no clear consensus on how to define data sharing or its best practices, and they make up a relatively small portion of humanities researchers as a whole.

As there are few well-established humanities-specific domain repositories, researchers who do publish their data often self-publish on websites. Moreover, asking new research questions of existing structured datasets, which would encourage data citation and reuse, is not common practice among humanities researchers, and therefore not a developed skill.

With the White House Office of Science and Technology Policy (OSTP) declaring 2023 the “year of open science” and specifically including the National Endowment for the Humanities (NEH) in its guidance on data sharing and open access, the humanities research community will hopefully work towards resolving these issues and building out its data infrastructure.

Adapted from: Ruediger, D., & MacDougall, R. (2023, March 6). Are the Humanities Ready for Data Sharing?

Below are some humanities-specific repositories and relevant repository lists; for general repositories, see the Data Deposit and Sharing page on the Data Management for the Sciences guide.

Data Curation

There are a number of competing terms used to describe the activity of managing digital materials for research: digital curation, digital stewardship, data curation, digital archiving. As a compact and provisional definition, data curation is "the active and ongoing management of data throughout its entire lifecycle of interest and usefulness to scholarship" (Cragin et al. 2007). The key phrases of this definition can be unpacked as follows:

  • active and ongoing management: Data curators intervene in the research process in order to translate or migrate data into new formats, to enhance it through additional layers of context or markup, to create connections between data sets, and to otherwise ensure that data is maintained in as highly-functional a form as possible.
  • entire lifecycle: As we enter the era of thoroughly digital research, the full lifecycle of digital research data is not yet fully known to us. However, we can anticipate that some data (particularly data collected through destructive means, such as archaeological data) will have a very long horizon of usefulness (in addition to increased evidentiary value for historical analysis and stewardship of our cultural heritage). The uses of data will likely change over time and with different stages of research.
  • interest and usefulness to scholarship: The term "scholarship" should be construed broadly, especially since data creation, use, and curation are not limited to the academy. Data curation seeks to retain the interest and usefulness of any data that has a serious purpose to fulfill.

Adapted from:

“An Introduction to Humanities Data Curation”
Julia Flanders, Center for Digital Scholarship, Brown University
Trevor Muñoz, University of Maryland

In the humanities, data curation is especially important because the data can be complex and varied, ranging from textual sources and images to audio and video recordings. Moreover, humanities researchers often work with unique or rare materials that require special care and attention to ensure their preservation and accessibility. Although the field of humanities data is growing steadily, at present we can identify several major types of research objects and collections that present distinctive forms of data and distinctive curation challenges.

  • Scholarly editions
  • Text corpora
  • Text with markup
  • Thematic research collections
  • Data with accompanying analysis or annotation
  • Finding aids 

In addition to these distinctive kinds of humanities data, there are also a few strategic points concerning the treatment of this data that need to be stressed:

  • The importance of interpretive layering:

Data in the humanities is often complex and may require contextualization or interpretation in order to be fully understood. This is particularly true for data such as texts or images, which may have multiple layers of meaning. In data curation, interpretive layering refers to the process of providing additional contextual information to help users understand the data. This may include information about the historical or cultural context of the data, the intended audience, or the interpretation of the data by the researcher. By including interpretive layering in data curation, researchers can ensure that data is more easily understood and more useful for a wider range of users.
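In digital projects, interpretive layering is often realized as standoff annotation: the source text is stored unchanged, and each layer of interpretation points into it by character offsets. A minimal sketch under that assumption (the class design, layer names, and sample text are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: int   # character offset where the annotation begins
    end: int     # character offset where it ends (exclusive)
    layer: str   # e.g. "person", "place", "editorial-note"
    note: str    # the interpretive content itself

@dataclass
class AnnotatedText:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, start, end, layer, note):
        """Attach one interpretation to a span of the source text."""
        self.annotations.append(Annotation(start, end, layer, note))

    def layer(self, name):
        """Return (span, note) pairs for one interpretive layer."""
        return [(self.text[a.start:a.end], a.note)
                for a in self.annotations if a.layer == name]
```

Because the layers are stored separately from the text, different scholars can add competing interpretations of the same passage without altering the source.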

  • The importance of information about how the data is captured and prepared:

In order to understand data fully, it is often necessary to understand how the data was collected and prepared. This may include information about the data collection process, such as the sampling methodology or the instrumentation used, or information about the data cleaning and preprocessing steps that were taken. By capturing this information in data curation, researchers can ensure that the data is more transparent and that users can better understand the data and any limitations or biases that may be present.
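One lightweight way to capture how data was prepared is to append a provenance record for each processing step to a log stored alongside the data. A sketch of that idea (the record fields and file names are illustrative, not a formal provenance standard):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_step(log_path, tool, description, inputs, outputs):
    """Append one processing-step record to a JSON Lines provenance log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,               # software or method used
        "description": description, # what this step did and why
        "inputs": inputs,           # files read by this step
        "outputs": outputs,         # files produced by this step
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Reading the log top to bottom then reconstructs the full chain of transformations, from OCR through cleaning to the final dataset, so later users can judge its limitations.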

  • The importance of capturing responsibility, editorial voice, and debate:

Data in the humanities often involves multiple interpretations and perspectives. In order to fully capture the richness of the data, it is important to capture information about the responsible parties, such as the authors, editors, or curators, and to document any debates or discussions that may have taken place during the research process. By capturing responsibility, editorial voice, and debate in data curation, researchers can provide users with a more complete understanding of the data and its interpretation.

Adapted from:

ChatGPT and

“An Introduction to Humanities Data Curation”
Julia Flanders, Center for Digital Scholarship, Brown University
Trevor Muñoz, University of Maryland

Here are some examples of data curation in the humanities:


Data Curation Profiles are designed to capture requirements for specific data generated by researchers as articulated by the researchers themselves. They are also intended to enable librarians and others to make informed decisions in working with data of this form, from this research area or sub-discipline. Data Curation Profiles employ a standardized set of fields to enable comparison. They are also designed to be flexible enough for use in any domain or discipline.


A profile is based on the scientist's or scholar's reported needs and preferences for these data. It is derived from several sources of information, including interviews, documentation, publications, and other relevant materials.


The scope of individual profiles will vary, based on the author’s and participating researcher’s background, experiences, and knowledge, as well as the materials available for analysis.

What can a Data Curation Profile be used for?

At an individual level, the Data Curation Profile:

  • provides a structure for conducting a data interview between an information professional and a researcher or research group
  • provides a means for a researcher or a research group to thoughtfully consider their needs for the data beyond its immediate use

At an institutional level, the Data Curation Profile:

  • can serve as a foundational document to guide the management and/or curation of a particular data set
  • can be shared with staff providing data services and others to inform them of the researcher's needs and ensure that everyone is on the same page
  • may be used to inform the development of data services to be offered by the institution, as well as to help identify the types of tools, infrastructure and responsibilities for data services staff

At the broadest level, the Data Curation Profile:

  • may be used by others as a guide in developing data services at their own institutions
  • may be used as an object of research to further understanding of the data types that researchers want or need to share, curate, or preserve, as well as the needs of researchers in doing so


The following examples were created using the Data Curation Profiles Toolkit developed by Purdue University.