Text & Data Mining

Overview

This guide is intended to help researchers and librarians find the content, tools, training and other assistance available to engage in successful text mining research at Boston College.

What is text and data mining?

Text and data mining (TDM) is the computational analysis of vast quantities of digital information, whether free-form natural language text or structured data.

Using specialized software, researchers can extract data, identify trends, look for patterns and better understand the relationships of terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for key words, common phrases and other patterns.
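As a rough illustration of the kinds of analysis described above, the following sketch uses only Python's standard library to compute word frequencies, count words that appear near a key word, and show a key word in context. The function names and the sample sentence are hypothetical and chosen for this example only; dedicated tools handle tokenization, stop words, and large corpora far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: plain English text; accented letters and digits are dropped.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, top_n=5):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokens).most_common(top_n)


def cooccurrences(tokens, target, window=2):
    """Count tokens that appear within `window` positions of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every neighbor in the window except the target itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def keyword_in_context(tokens, target, window=3):
    """Return each occurrence of `target` with its surrounding tokens."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, used only to exercise the functions.
sample = (
    "The library supports text mining. Text mining tools count words, "
    "and mining text reveals patterns."
)
tokens = tokenize(sample)

print(word_frequencies(tokens, top_n=2))
print(cooccurrences(tokens, "text", window=1))
print(keyword_in_context(tokens, "mining", window=2))
```

The same three questions (how often, near what, in what context) apply whether the source is a single novel or a large newspaper archive; only the scale and the tooling change.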

Materials to be analyzed range from websites (such as publicly available Facebook posts) and 16th-century manuscripts to DNA sequences and old newspapers.

Policies for Mining Licensed Content

If you wish to undertake a text or data mining project with content from the Libraries’ licensed databases, please contact a Subject Librarian to investigate options, which may include negotiating with the vendor or purchasing access to the data. Although many database licenses prohibit text and data mining and the use of software such as scripts, agents, or robots, we are actively negotiating text mining rights with database vendors. Unauthorized text or data mining in violation of our licenses can result in loss of access for the entire Boston College community.

How can we help?

You can begin your inquiry with your Subject Librarian. They can help you find and interpret the terms and conditions that apply to resources you might want to mine.

Your Subject Librarian may also refer you to a Digital Scholarship specialist for help with planning your TDM project and process. If your questions are primarily about tools and techniques, you can set up a consultation with the Digital Scholarship librarians directly.

Figure: a graphic analysis, constructed using Voyant, of the frequency of terms in the novel Agnes Grey by Anne Brontë.

Learn More

Data Mining and Text Analysis, from the Intro to Digital Humanities Libguide (UCLA Center for Digital Humanities)
Text Mining and Scholarly Publishing (Jonathan Clark, Publishing Research Consortium, 2012)
Seven Ways Humanists are Using Computers to Understand Text
Glossary of Digital Humanities Terms
