
Text & Data Mining



What is text and data mining?

Text and data mining (TDM) is the computational analysis of vast quantities of digital information, whether free-form natural language text or structured data. 

Using specialized software, researchers can extract data, identify trends, look for patterns, and better understand the relationships among terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for keywords, common phrases, and other patterns.
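To make these kinds of analysis concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the method used by any particular TDM tool: the function names, the simple tokenizer, and the sample sentence are all assumptions made for this example, and a real project would typically use a larger corpus and more careful text cleaning.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def cooccurrences(tokens, window=2):
    """Count unordered pairs of different words that appear
    within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def keyword_in_context(tokens, keyword, width=3):
    """Return each occurrence of `keyword` with up to `width`
    words of context on either side."""
    lines = []
    for i, word in enumerate(tokens):
        if word == keyword:
            left = " ".join(tokens[max(0, i - width) : i])
            right = " ".join(tokens[i + 1 : i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


# Hypothetical sample text, invented for this example.
sample = "The governess taught the children. The children ignored the governess."
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
print(cooccurrences(tokens, window=1).most_common(3))
print(keyword_in_context(tokens, "taught", width=2))
```

The three functions correspond to word frequency, words that appear near each other, and contextual information for a keyword. Dedicated tools offer the same kinds of views without requiring any programming.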

Materials to be analyzed range from websites (such as publicly available Facebook content) and 16th-century manuscripts to DNA sequences and old newspapers.

How can we help?

You can begin your inquiry with your subject librarian, who can help you find and interpret the terms and conditions that apply to the resources you might want to mine.

The subject librarian may also refer you to a Digital Scholarship specialist for help with planning your TDM project and process. If your questions are primarily about tools and techniques, you can set up a consultation with the Digital Scholarship librarians directly.

Figure: graph of term frequency. This graphic analysis, constructed using Voyant, shows the frequency of terms in the novel Agnes Grey by Anne Brontë.
