Reading Thomas Jefferson with TopicViz: Towards a Thematic Method for Exploring Large Cultural Archives

  • Lauren Klein, School of Literature, Media, and Communication, Georgia Institute of Technology
  • Jacob Eisenstein, School of Interactive Computing, Georgia Institute of Technology
Keywords: archives, reading, research, topic modeling, search, visualization, Thomas Jefferson

Abstract

In spite of what Ed Folsom has called the “epic transformation of archives,” referring to the shift from print to digital archival form, methods for exploring these digitized collections remain underdeveloped. One method prompted by digitization is the application of automated text mining techniques such as topic modeling, a computational method for identifying the themes that recur across an archive of documents. We review the nascent literature on topic modeling of literary archives, and present a case study, applying a topic model to the Papers of Thomas Jefferson. The lessons from this work suggest that the way forward is to provide scholars with more holistic support for visualization and exploration of topic model output, while integrating topic models with more traditional workflows oriented around assembling and refining sets of relevant documents. We describe our ongoing effort to develop a novel software system that implements these ideas.

Author Biographies

Lauren Klein, School of Literature, Media, and Communication, Georgia Institute of Technology
Assistant Professor, School of Literature, Media, and Communication
Jacob Eisenstein, School of Interactive Computing, Georgia Institute of Technology
Assistant Professor, School of Interactive Computing 
Published
2013-12-16
Section
Articles