
Text & Data Mining


HathiTrust Research Center

This guide is intended to help researchers and librarians find the content, tools, training and other assistance available to engage in successful text mining research at Boston College.

The HathiTrust Research Center (HTRC) is the research arm of HathiTrust. It supports scholarly research on the large-scale HathiTrust Digital Library by giving researchers ways to access HathiTrust content and study it with computational text-analysis tools.

Entire Collection Piloted for TDM

The HathiTrust Research Center has expanded its services to support computational research on the entire collection held by HathiTrust, one of the world's largest digital libraries. HathiTrust's collections include over 14 million digitized volumes, among them more than 7 million books, 725,000 US federal government documents, and 350,000 serial publications. Previously, the HathiTrust Research Center supported analysis of only the public-domain subset of the collection. Researchers will now be able to explore the entire collection and run an algorithm against all 14 million volumes. The change is being piloted in 2016 and is expected to become more widely available in 2017.

[HathiTrust Press Release]


HTRC provides extensive documentation for its tools, including instructional videos, tutorials, presentations, examples, and Getting Started FAQs.

Create an account

Most HTRC services require an account to log in and use the tools. To register, go to the HTRC Portal and choose "Sign up" from the menu. Anyone with an email address from a nonprofit institution of higher education may register, including users at institutions that are not HathiTrust members.


The HTRC has created a suite of tools that allow researchers to perform text analysis on content in the HathiTrust Digital Library. These tools include: