
Text Analysis and Visualization

Resources for workshops on text analysis and visualization

Librarian for Digital Research and Scholarship

Zoe Borovsky, Ph.D.
Contact:
phone: 310-825-4954
https://ucla.zoom.us/my/zoepster

Introduction to Text Analysis: Objectives

1. In this workshop students will begin exploring text analysis using a freely available online visualization tool (IBM's ManyEyes).  

2. They will find machine-readable texts on the web and in Project Gutenberg, a repository of e-texts.  They will learn how to use text analysis to compare two (or more) texts.  

3. Students will compare the functionality of ManyEyes with Voyant, a text-analysis tool designed by and for digital humanities researchers. 

Example: Wordle

Simple text-analysis visualizations allow us to make comparisons.  For example, Crystal Smith explores whether language in television ads reinforces gender stereotypes. See the comparison (and judge for yourself)  here.  

Wordle: Words Used in Advertising for Girls' Toys
Wordle: Words Used to Advertise Boys' Toys
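
Word clouds like these size each word by how often it appears. As a rough sketch of that counting step (not Wordle's actual code), the Python below tallies words in a text after dropping a few common "stopwords". The stopword list and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real tools use longer lists).
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "for", "with", "is", "it"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens in text, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Made-up sample text, not taken from any real advertisement.
sample = "Build the tower and battle the robots. Battle with power!"
freqs = word_frequencies(sample)

# The most frequent words are the ones a word cloud would draw largest.
print(freqs.most_common(3))
```

Running the same function on two different texts and reading the two lists side by side is the simplest form of the comparison described above.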

Tools

Data Sets

Websites, such as UC Santa Barbara's American Presidency Project, can be excellent sources of machine-readable texts.  

You can copy and paste the texts you wish to analyze. 

Or, this downloadable zip file contains eight individual files taken from the State of the Union Addresses on that website: 

  • Barack Obama: 2009, 2010, 2011, 2013
  • George W. Bush: 2005, 2006, 2007, 2008

Step-by-Step

1. ManyEyes.  After logging in to ManyEyes we will create four visualizations using one State of the Union Address.  

Try Word Cloud Generator, Word Tree, Phrase Net, and Tag Cloud using just one text (e.g. 2008).

Next, we will prepare a special file that, in ManyEyes, allows us to compare two texts using the Tag Cloud.

You might, for example, try comparing George W. Bush (2008) with Barack Obama (2009).  

   ( Instructions for preparing the files are here. ) 

We could, of course, use ManyEyes to compare four of Bush's speeches with four of Obama's, but we would not be able to view each of those eight texts individually.  The Tag Cloud compares only two texts at a time, so each president's four speeches would have to be combined into one text: all of Bush's speeches compared with all of Obama's.  

2. Voyant.  Voyant accepts a URL (of a web-based text) or uploaded files in plain-text, HTML, XML, and (some) PDF formats.

Prepare the texts:  You can download this zipped file, unzip it, and upload each of the eight files into Voyant.  

Voyant will present the texts in the order you upload them. 
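
Because upload order determines display order, it can help to sort the files chronologically first. The short Python sketch below orders file names by the four-digit year they contain. The file names shown are hypothetical; the names inside the actual zip file may differ.

```python
import re

def year_of(filename):
    """Return the first four-digit year (19xx or 20xx) found in a file name."""
    match = re.search(r"(19|20)\d{2}", filename)
    # Names without a year sort last.
    return int(match.group(0)) if match else float("inf")

def upload_order(filenames):
    """Sort file names chronologically, then alphabetically within a year."""
    return sorted(filenames, key=lambda name: (year_of(name), name))

# Hypothetical file names for illustration only.
files = ["obama_2010.txt", "bush_2005.txt", "obama_2009.txt", "bush_2008.txt"]
ordered = upload_order(files)
print(ordered)
```

Uploading the files in this order would let you read changes from 2005 onward from left to right.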

You will see something like this, and be able to compare words across all eight files.  

SOU2005_2012