
Text and Data Mining

A guide that goes over the basics of text and data mining.

What is Text and Data Mining?

Text and Data Mining (TDM) refers to automated research techniques and strategies designed to identify trends and patterns in large sets of data. (Text mining is one kind of data mining, focused on patterns found in large data sets composed of text.)

These techniques are finding useful applications in a growing variety of disciplines, allowing for findings that are not feasible through traditional methods of analysis. For instance, researchers in Digital Humanities have been able to extend their textual analysis across wide swaths of literature, while scientists can analyze data and text more quickly, hastening the progress of science.

Can I use TDM on Library Resources?

The UCLA Library regularly licenses access to scholarly resources across the disciplines for the entire University community. This often takes the form of electronic access to commercial databases, which usually gather the content of scholarly journals, electronic books, and data sets.

Some of these commercial vendors allow our users to access their data for TDM purposes under the terms of our license. These resources are listed in a specific section of this guide.

While we are always striving to expand access to accommodate these valid TDM uses of the content, most of our current licenses do not allow systematic downloading of content for TDM purposes. Commercial vendors actively monitor database activity to detect when users are downloading large amounts of text or data in systematic ways. When UCLA users are detected doing so, it can be treated as a “breach” of our license terms and can lead to suspended access for the entire University community.

Apart from the exceptions mentioned above, TDM projects that hope to use licensed library resources require special arrangements. These often involve additional fees to arrange access to the data, either through dedicated API access or through a download of the data for limited purposes. If you are interested in initiating such a project, please contact a librarian in Scholarly Communication and Licensing to begin exploring the possibility.

Examples of Projects Completed using TDM

Using specialized software, researchers can extract data, identify trends, look for patterns, and better understand the relationships of terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for key words, common phrases, and other patterns. TDM offers a wide array of research possibilities.
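To make these kinds of analysis concrete, here is a minimal sketch in Python using only the standard library. It counts word frequencies, counts words that appear near each other, and shows each occurrence of a key word in context. The sample sentence, the function names, and the window sizes are illustrative assumptions; real projects typically rely on dedicated text-analysis tools and much larger corpora.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def cooccurrences(tokens, window=2):
    """Count unordered pairs of distinct words that fall within
    `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def keyword_in_context(tokens, keyword, width=3):
    """Return each occurrence of `keyword` with up to `width`
    tokens of context on either side."""
    lines = []
    for i, word in enumerate(tokens):
        if word == keyword:
            left = tokens[max(0, i - width) : i]
            right = tokens[i + 1 : i + 1 + width]
            lines.append(" ".join(left + [f"[{word}]"] + right))
    return lines


# Hypothetical sample text, standing in for a real corpus.
sample = (
    "The whale surfaced near the ship. The crew watched the whale, "
    "and the whale watched the crew."
)

tokens = tokenize(sample)
freqs = word_frequencies(tokens)
pairs = cooccurrences(tokens, window=2)
contexts = keyword_in_context(tokens, "whale")

print(freqs.most_common(3))
print(pairs.most_common(3))
for line in contexts:
    print(line)
```

Note that a script like this should only be run on text you are permitted to mine, such as public-domain works or resources whose license allows TDM.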

Boston College Libraries created a list of projects that used text and data mining methods. The list is a good starting point for researchers curious about how TDM can work for them. Many of these projects also applied other computational and quantitative methods, as well as visualizations.