This tutorial was written by Katherine Walden, Digital Liberal Arts Specialist at Grinnell College. Tutorial instructions were co-authored by Sarah Purcell (L.F. Parker Professor of History, Grinnell College) and Papa Ampim-Darko, a student research assistant at Grinnell College.
This tutorial was reviewed by Gina Donovan (Instructional Technologist, Grinnell College).
This tutorial is adapted from Doing Digital History’s Voyant tutorial.
Introduction to Textual Analysis (Voyant Tools) is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Voyant Tools is an open-source web application developed by Stéfan Sinclair and Geoffrey Rockwell in 2003, with later contributions added by Andrew MacDonald, Cyril Briquet, Lisa Goddard, and Mark Turcato. While Voyant is one of the leading web-based textual analysis interfaces, it grew out of earlier text analysis tools like HyperPo, Tapoware, and TACT. Voyant's code is also open source and can be used to deploy the program on your own server. Voyant users can upload text files from their computer, link to online text sources, or scrape the text off a webpage for analysis and visualization. Unlike more advanced, programming-oriented textual analysis environments like R and RStudio, Voyant gives users access to a range of statistical analysis and visualization features without requiring significant technical knowledge.
1-Navigate to https://dataweek.sites.grinnell.edu/files/MN_Trans_OralHistories_ZIP.zip in an internet browser to download the data set we’ll be working with in this tutorial.
Right-click on the ZIP folder you downloaded to extract its contents. Copy the extracted folder to your Desktop.
About this data (from the University of Minnesota’s Tretter Transgender Oral History Project):
The Tretter Transgender Oral History Project is part of the Jean-Nickolaus Tretter Collection in GLBT Studies at the University of Minnesota Libraries. Transgender voices and experiences are often missing in contemporary documentation and the historic record. The goal of this Project is to empower individuals to tell their story, while providing students, historians, and the public with a richer foundation of primary source material about the Transgender community. Materials are housed within the Tretter Collection.
Phase 1 of this Project (2015-2018) focused on documenting the experience of transgender and gender queer people in the Upper Midwest. Oral Historian Andrea Jenkins conducted 200 interviews covering identity, family, love and experiences. These oral histories are posted online. There is also an online exhibit about Phase 1.
2-Open a web browser (preferably Firefox or Chrome) and navigate to the Voyant Tools homepage.
3-Upload the files in the “cleaned_txt_files” folder and click Reveal.
4-Once a text or corpus has been uploaded, Voyant moves into its ‘default skin,’ or primary editing environment.
5-Without clicking on any of the page elements, scan the page.
What do you immediately notice after uploading a text to Voyant?
What stands out, or catches your attention?
What types of information are contained in this page?
What do you have questions about, or what is confusing and not immediately clear?
Based on your initial scan, what function do you think these various components serve?
Editing in Voyant
6-In the default editing environment, Voyant displays five panels: Cirrus, Reader, Trends, Summary, and Contexts.
7-Each panel is a tool, and all the panel tools interact with each other. Modifying or interacting with one tool will update the others.
8-Each panel tool offers additional display and export options, which can be accessed by placing your cursor over the ? symbol. From left to right:
- Export: opens a pop-up window with export and sharing options.
- Choose another tool: opens a dropdown with other tools that can be displayed in the panel space.
- Options: opens a pop-up window with additional options for a specific tool.
- Additional information: provides a description of the tool’s functionality and possible uses.
9-Most tool panels in Voyant also include a search bar.
10-Cirrus (top left-hand corner of the page): Cirrus displays a word cloud of the words contained in the uploaded text. Words that are larger in size and closer to the center of the visualization appear with greater frequency in the text.
11-The default Cirrus view visualizes word counts. Clicking on the table icon labeled Terms switches from the word cloud to a table view of word counts and frequency trends.
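If you are curious about the kind of calculation that sits behind a word cloud or a Terms table, the idea can be sketched in a few lines of Python. This is only an illustration, not Voyant's own code: the `tokenize` and `term_counts` helpers and the tiny stopword list are assumptions made for this example, and Voyant's tokenization and stopword list will differ.

```python
import re
from collections import Counter

# Illustrative stopword list only; Voyant's own list is much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "i", "it", "was"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def term_counts(text, stopwords=STOPWORDS):
    """Count how often each non-stopword term occurs in the text."""
    return Counter(t for t in tokenize(text) if t not in stopwords)


sample = "The family moved to the city, and the family stayed in the city for years."
counts = term_counts(sample)

# The most frequent terms are the ones a word cloud would draw largest.
for term, count in counts.most_common(3):
    print(term, count)
```

Removing very common words before counting is what keeps words like "the" from dominating the cloud.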
12-The Links icon visualizes relationships between words by presenting a network visualization of connected terms, or words that appear in relation or proximity to each other.
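One simple way to think about "connected terms" is to count which words fall within a few words of a keyword. The Python sketch below assumes that approach; the `collocates` function and its `window` parameter are hypothetical names for this example, and Voyant's Links tool may select and weight its connections differently.

```python
import re
from collections import Counter


def collocates(text, keyword, window=2):
    """Count the words that occur within `window` tokens of `keyword`."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - window)
            # Words just before and just after this occurrence of the keyword.
            neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbors)
    return counts


sample = "My family supported me. I told my family first, and my family listened."
nearby = collocates(sample, "family", window=1)
print(nearby.most_common(3))
```

In a network view, each keyword and each of its frequent neighbors would become a node, with a line drawn between them.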
13-What differences do you notice across the three views for this panel? What words or features stood out in each representation of the textual analysis? How did you understand the textual analysis differently based on how it was presented? What questions do you have about how this tool calculated these results? What questions do you have about the textual data after exploring this tool?
14-Reader (top middle of the page): Reader is a text reader that displays the original text. Moving the mouse over a word in the reader will highlight the word and display its frequency count.
15-As with Cirrus, Reader offers additional panel views. Clicking on the circle icon labeled TermsBerry switches the panel to a visualization of term frequency and proximity. Hovering over a bubble will highlight the term, show the number of times it appears in the text, and indicate which terms most frequently appear to the right and left of the selected term in the text.
16-What differences do you notice across the two views for this panel? What words or features stood out in each representation of the textual analysis? How did you understand the textual analysis differently based on how it was presented? What questions do you have about how this tool calculated these results? What questions do you have about the textual data after exploring this tool?
17-Trends uses a line graph visualization to show the distribution of terms and relative term frequency across the text. The legend identifies the most frequently appearing terms in the text, and the graph illustrates where and how frequently they appear in the text. Hovering over a graph point highlights a term and displays both its relative frequency and the source text for that calculation.
18-Clicking on the table icon labeled Document Terms shows the frequency and proportion counts as a data table.
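Relative frequency can be understood as a term's share of all the words in a document, which makes documents of different lengths comparable. The Python sketch below assumes that definition; the function names are hypothetical, and the scale Voyant uses when it reports relative frequencies may differ.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def relative_frequencies(documents, term):
    """Return the term's share of all tokens in each document."""
    result = []
    for text in documents:
        tokens = tokenize(text)
        share = tokens.count(term) / len(tokens) if tokens else 0.0
        result.append(share)
    return result


docs = [
    "home was home to me",      # "home" is 2 of 5 words
    "we left home that year",   # "home" is 1 of 5 words
    "the city felt different",  # "home" does not appear
]
trend = relative_frequencies(docs, "home")
print(trend)
```

Plotting one such list per term, with one point per document, gives the kind of line graph the Trends panel displays.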
19-Summary provides information about the corpus, including the number of documents, words, and unique terms. Summary also calculates document length, vocabulary density, average words per sentence, and most frequently-occurring terms.
20-Clicking on the table icon labeled Documents shows the word count, unique term count, ratio of unique terms to total words, and words per sentence for each document in tabular form.
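The measures in the Summary panel are simple ratios. As a rough sketch, assuming that vocabulary density means unique terms divided by total words and that sentences end at a period, question mark, or exclamation point, they could be calculated in Python as follows. The `summary_stats` function is a hypothetical helper, and Voyant's sentence detection is likely more careful than this.

```python
import re


def summary_stats(text):
    """Compute word count, unique terms, vocabulary density, and words per sentence."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Naive sentence split: ., !, or ? followed by whitespace or the end of the text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    unique_terms = set(tokens)
    return {
        "words": len(tokens),
        "unique_terms": len(unique_terms),
        "vocabulary_density": len(unique_terms) / len(tokens) if tokens else 0.0,
        "avg_words_per_sentence": len(tokens) / len(sentences) if sentences else 0.0,
    }


sample = "I grew up here. I moved away. Then I came back here."
stats = summary_stats(sample)
print(stats)
```

A density close to 1 means few words are repeated; a lower density means the same words recur often, which is common in long transcripts of spoken language.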
21-Clicking on the table icon labeled Phrases shows the number of times a particular phrase appears in the text, as well as the phrase length and document location.
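Finding repeated phrases amounts to counting every run of consecutive words and keeping the runs that occur more than once. The Python sketch below assumes that approach; `repeated_phrases`, `length`, and `min_count` are hypothetical names for this example, and the sketch ignores sentence boundaries, which a fuller implementation might respect.

```python
import re
from collections import Counter


def repeated_phrases(text, length=2, min_count=2):
    """Return phrases of `length` words that occur at least `min_count` times."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Slide a window of `length` words across the token list.
    ngrams = zip(*(tokens[i:] for i in range(length)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


sample = "I came out at work. Later I came out to my parents."
print(repeated_phrases(sample, length=2))
print(repeated_phrases(sample, length=3))
```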
22-What differences do you notice across the Trends and Summary panels? What words or features stood out in each representation of the textual analysis? How did you understand the textual analysis differently based on how it was presented? What questions do you have about how this tool calculated these results? What questions do you have about the textual data after exploring this tool?
23-Contexts identifies the most frequently-occurring terms and shows the text that appears to the left and right of the term in the text.
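This kind of display is often called a keyword-in-context (KWIC) concordance. A minimal Python sketch of the idea follows; the `keyword_in_context` function and its `width` parameter are hypothetical names for this example and are not part of Voyant.

```python
import re


def keyword_in_context(text, keyword, width=3):
    """Return (left, term, right) rows for each occurrence of `keyword`."""
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


sample = "My mother called every week. Later my mother visited us in Minneapolis."
rows = keyword_in_context(sample, "mother", width=2)
for left, term, right in rows:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>15} | {term} | {right}")
```

Lining up every occurrence of a term this way makes it easier to compare how the term is used across different interviews.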
24-Clicking on the table icon labeled Correlations shows how frequently a specific term appears next to or near another term by calculating correlation and significance measures.
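One common way to calculate a correlation between two terms is Pearson's correlation coefficient, applied to each term's relative frequency in every document. The Python sketch below assumes that approach and is not taken from Voyant's code; the function names and the three toy documents are invented for illustration. Values near 1 mean the two terms rise and fall together, and values near -1 mean one term tends to appear where the other does not.

```python
import math
import re


def relative_frequency(text, term):
    """Return the term's share of all tokens in the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return tokens.count(term) / len(tokens) if tokens else 0.0


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


docs = [
    "family family home",
    "family home work work",
    "work work work school",
]
family = [relative_frequency(d, "family") for d in docs]
home = [relative_frequency(d, "home") for d in docs]
work = [relative_frequency(d, "work") for d in docs]

print(round(pearson(family, home), 2))  # positive: the terms appear together
print(round(pearson(family, work), 2))  # negative: the terms appear apart
```

With only three documents the numbers mean little; a significance measure, like the one reported alongside the correlation, helps judge whether a pattern is likely to be more than chance.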
25-What differences do you notice across the three views for this panel? What words or features stood out in each representation of the textual analysis? How did you understand the textual analysis differently based on how it was presented? What questions do you have about how this tool calculated these results? What questions do you have about the textual data after exploring this tool?
Interacting with Voyant Tools
26-Up to this point, we have been interacting with each panel tool individually. Now we will explore how the panel tools interact.
27-Select a panel tool, and click on a term, data point, or other interactive element within the panel tool. What changes in that panel after you click? What changes happen in the other panels? How does the analysis in each tool change based on how you interact with the data analysis and visualization?
28-Refresh the browser page, and select another panel tool. Use the search function to look for a term in the text. What changes in that panel after you search? What changes happen in the other panels? How does the analysis in each tool change based on how you interact with the search function?
29-What do you find useful about the interactive analysis and visualization tools? What do you find frustrating, unclear, or confusing? How does interacting with a tool change or impact your understanding of the text? What additional questions do you have about the text (or tool)?
Exporting in Voyant
30-We briefly mentioned export options when introducing you to the editing interface. Voyant gives you the option to share an entire editing interface view. Move your cursor over the question mark icon in the top right-hand corner of the page to explore this option.
31-Voyant also gives you the option to export or share the data or visualization created in a specific panel tool. Again, move your cursor over the question mark icon in the top right-hand corner of a specific panel to explore this option.
Final reflection questions:
- What did you find engaging or interesting about Voyant Tools?
- What types of research questions could a researcher have about a text or collection of texts?
- How effectively might Voyant be able to address those questions?
- What challenges, frustrations, or limitations did you encounter while using Voyant?
- What remaining questions do you have about this data?