Data management refers to the continuous process of organizing, storing, preserving, and sharing research data in a systematic and effective manner. It involves making a plan to handle data according to best practices throughout the research data life cycle, which covers aspects such as data collection, storage, analysis, documentation, and dissemination.
Good data management is crucial for all researchers and not just digital humanists. Here we will limit our scope, providing an overview of Data Management Plans (DMPs) and covering data sharing and data curation specific to the humanities. For further details and resources on DMPs and to learn about the fundamentals of data management, including data documentation, organization, and security, please check out the Data Management for Sciences research guide.
Managing research data centers on considerations for the collection, storage, and sharing of data throughout the research data life cycle:
From these considerations follow best practices:
A data management plan (DMP) is a written document that outlines how research data will be collected, organized, stored, shared, and preserved over the course of a research project. Detailed, cost-effective DMPs are required by many funding agencies such as the National Science Foundation (NSF) and the National Institutes of Health (NIH) as part of the grant proposal submission process. A well-written DMP can help ensure that research data is handled in a way that is efficient, ethical, and compliant with institutional and funding agency policies.
The components of a data management plan may vary depending on the discipline, funder, and institution, but some common elements include:
Introduction: A brief description of the research project and the data that will be collected.
Data Collection: A description of the types of data that will be collected, including how it will be collected, who will collect it, and any necessary tools or equipment.
Data Organization: A plan for how the data will be organized, including file naming conventions, folder structures, and metadata.
Data Documentation: A plan for how the data will be documented, including how to describe the data and any associated materials or instruments.
Data Storage and Backup: A description of the storage location, backup procedures, and security measures for the data.
Data Sharing and Access: A plan for how the data will be shared and made accessible, including any embargo periods or restrictions on access.
Data Preservation: A plan for how the data will be preserved for the long-term, including any necessary metadata, data formats, and data curation.
Responsibilities and Resources: A description of the responsibilities of the research team members, and any resources required for successful data management.
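The Data Organization component above often comes down to a concrete file naming convention. As a minimal sketch, the hypothetical convention below (project, description, capture date, version, extension, all lowercase with no spaces) could be expressed as a small helper; the project name and field order are illustrative assumptions, not a standard:

```python
from datetime import date

def make_filename(project, description, capture_date, version, ext):
    """Build a file name following a simple, hypothetical convention:
    project_description_YYYY-MM-DD_vN.ext (lowercase, spaces become hyphens)."""
    stamp = capture_date.isoformat()  # ISO 8601 dates sort chronologically
    name = f"{project}_{description}_{stamp}_v{version}.{ext}"
    return name.lower().replace(" ", "-")

# Hypothetical example: an interview transcript for an oral history project
print(make_filename("oralhist", "interview transcript", date(2023, 1, 15), 2, "txt"))
# → oralhist_interview-transcript_2023-01-15_v2.txt
```

Documenting a convention like this in the DMP (and applying it consistently) makes files self-describing and keeps versions from being silently overwritten.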
This text was written with the help of ChatGPT, a machine learning large language model that generates text based on user input.
For a more in-depth look at DMPs and their components, please check out the DMP page on the Data Management for Sciences research guide.
The Office of Management and Budget (OMB) Circular A-110 provides the federal administrative requirements for grants and agreements with institutions of higher education, hospitals, and other non-profit organizations. In 1999, Circular A-110 was revised to provide public access to research data, under some circumstances, through the Freedom of Information Act (FOIA).
Funding agencies have implemented the OMB requirement in various ways. The table below summarizes the data management and sharing requirements of primary US federal funding agencies.
| US Federal Funding Agency | Policy and Guideline Status | More Information |
| --- | --- | --- |
| National Endowment for the Humanities (NEH) | Beginning in January 2012, the NEH Office of Digital Humanities will offer Digital Humanities Implementation Grants (DHIG). DHIG applicants will be required to submit a data management plan and a sustainability plan. | |
| National Science Foundation (NSF) | “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.” “Proposals submitted to NSF must include a supplementary document of no more than two pages labeled ‘Data Management Plan.’ This supplementary document should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results.” | |
| Institute of Museum and Library Services (IMLS) | “IMLS encourages sharing of research data.” | |
What is data sharing?
Data sharing typically means depositing data in repositories, which are used to store and preserve data so that researchers can access and analyze it. Other methods of data sharing include submitting data to a journal to support a publication and dissemination via a project or institutional website.
Why is it important?
Not only does sharing research data provide credit to the researchers and increase the impact and visibility of their research, it also maximizes transparency and accountability by allowing for validation and promoting reproducibility. It further encourages scientific inquiry and innovation through potential collaboration, new data uses, and education.
How is it related to data management?
Effective data management is important for promoting data sharing, as researchers are more likely to share their data if they have confidence that the data will be used and cited appropriately. Good data management practices can help to ensure that research data is properly attributed, cited, and acknowledged, which can in turn increase the impact and visibility of the research. In summary, data sharing is an important part of research and is facilitated by good data management practices.
Data sharing is not as well-defined a standard in the humanities as it is in the sciences. In particular, it is not as embedded in humanists' research and publication workflows. Humanities research is typically framed as interpreting sources rather than analyzing data, which can make the entire concept of “data sharing” seem foreign. While digital humanists may engage more with structured data, they have no clear consensus on how to define data sharing or its best practices, and they make up a relatively small portion of humanities researchers as a whole.
Because there are few well-established humanities-specific domain repositories, researchers who do publish their data often self-publish on websites. In addition, asking new research questions of existing structured datasets, a practice that would encourage data citation and reuse, is neither common nor a developed skill among humanities researchers.
With the White House Office of Science and Technology Policy (OSTP) declaring 2023 the “year of open science” and specifically including the National Endowment for the Humanities (NEH) in its guidance on data sharing and open access, there is hope that the humanities research community will work toward resolving this issue and building out its data infrastructure.
Adapted from: Ruediger, D., & MacDougall, R. (2023, March 6). Are the Humanities Ready for Data Sharing? https://doi.org/10.18665/sr.318526
Below are some humanities-specific repositories and relevant repository lists; for general repositories, take a look at the Data Deposit and Sharing page on the Data Management for Sciences research guide.
There are a number of competing terms used to describe the activity of managing digital materials for research: digital curation, digital stewardship, data curation, digital archiving. As a compact and provisional definition, data curation is "the active and ongoing management of data throughout its entire lifecycle of interest and usefulness to scholarship" (Cragin et al. 2007). To expand on this definition, the above terms are defined as:
Adapted from:
“An Introduction to Humanities Data Curation”
Julia Flanders, Center for Digital Scholarship, Brown University
Trevor Muñoz, University of Maryland
In the humanities, data curation is especially important because the data can be complex and varied, ranging from textual sources and images to audio and video recordings. Moreover, humanities researchers often work with unique or rare materials that require special care and attention to ensure their preservation and accessibility. Although the field of humanities data is growing steadily, at present we can identify several major types of research objects and collections that present distinctive forms of data and distinctive curation challenges.
In addition to these distinctive kinds of humanities data, there are also a few strategic points concerning the treatment of this data that need to be stressed:
Data in the humanities is often complex and may require contextualization or interpretation in order to be fully understood. This is particularly true for data such as texts or images, which may have multiple layers of meaning. In data curation, interpretive layering refers to the process of providing additional contextual information to help users understand the data. This may include information about the historical or cultural context of the data, the intended audience, or the interpretation of the data by the researcher. By including interpretive layering in data curation, researchers can ensure that data is more easily understood and more useful for a wider range of users.
In order to understand data fully, it is often necessary to understand how the data was collected and prepared. This may include information about the data collection process, such as the sampling methodology or the instrumentation used, or information about the data cleaning and preprocessing steps that were taken. By capturing this information in data curation, researchers can ensure that the data is more transparent and that users can better understand the data and any limitations or biases that may be present.
Data in the humanities often involves multiple interpretations and perspectives. In order to fully capture the richness of the data, it is important to capture information about the responsible parties, such as the authors, editors, or curators, and to document any debates or discussions that may have taken place during the research process. By capturing responsibility, editorial voice, and debate in data curation, researchers can provide users with a more complete understanding of the data and its interpretation.
Adapted from:
ChatGPT and
“An Introduction to Humanities Data Curation”
Julia Flanders, Center for Digital Scholarship, Brown University
Trevor Muñoz, University of Maryland
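The three strategic points above (interpretive layering, data provenance, and responsibility) can be made concrete as fields in a descriptive metadata record. The sketch below is purely illustrative: the item, field names, and personal names are hypothetical, and this is not a standard schema such as Dublin Core or TEI:

```python
import json

# Hypothetical metadata record for a digitized letter, capturing the three
# curation concerns discussed above: interpretive context, provenance,
# and responsibility. All values are invented for illustration.
record = {
    "title": "Letter from A to B, 1892",
    "interpretive_context": {
        "historical_setting": "late-19th-century private correspondence",
        "editorial_notes": "transcription normalizes archaic spelling",
    },
    "provenance": {
        "source": "physical letter, private collection",
        "capture": "scanned at 600 dpi, transcribed manually",
        "processing": ["OCR correction", "TEI encoding"],
    },
    "responsibility": {
        "transcriber": "J. Doe",   # hypothetical names
        "editor": "R. Roe",
        "open_questions": ["date of composition disputed"],
    },
}

# Serialize for deposit alongside the data files
print(json.dumps(record, indent=2))
```

A record like this travels with the data, so a future user can see not only what the object is, but how it was produced and who made which interpretive decisions.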
Here are some examples of data curation in the humanities:
Data Curation Profiles are designed to capture requirements for specific data generated by researchers as articulated by the researchers themselves. They are also intended to enable librarians and others to make informed decisions in working with data of this form, from this research area or sub-discipline. Data Curation Profiles employ a standardized set of fields to enable comparison. They are also designed to be flexible enough for use in any domain or discipline.
A profile is based on the scientist's or scholar's reported needs and preferences for these data. It is derived from several sources of information, including interviews, documentation, publications, and other relevant materials.
The scope of individual profiles will vary, based on the author’s and participating researcher’s background, experiences, and knowledge, as well as the materials available for analysis.
At an individual level, the Data Curation Profile:
At an institutional level, the Data Curation Profile:
At the broadest level, the Data Curation Profile:
The following examples were created using the Data Curation Profiles Toolkit created by Purdue University.