
Manage Your Research Data: Methods

There are many techniques for mining and analysing text data. The selection of techniques will depend on the goal of the project.

When analysing text data, using multiple techniques together can help mitigate the limitations of each individual method and provide a richer understanding of the data.

For example, you might start with a word frequency count to identify the most common words in the text. Then, you can use collocation to see which words often appear together, providing insights into common phrases and context. Finally, a concordance search for important terms can show the specific sentences and contexts in which these terms are used, revealing multiple meanings and relationships with other words.

Combined, these techniques give a detailed picture of how important terms are used, what they mean, and how they relate to other words, which no single method provides on its own.

Here are some common techniques:

Word frequency analysis counts how often each word appears in a text.

If you have a book, you can use word frequency to find out which words are used the most. This helps you understand the main topics or themes of the book.
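A word frequency count can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the sample sentence and the function name `word_frequencies` are made up for the example.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    # Match runs of letters, optionally followed by an apostrophe part ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)

# Invented sample text standing in for a book.
sample = "The whale surfaced. The crew watched the whale dive again."
freq = word_frequencies(sample)
print(freq.most_common(2))  # [('the', 3), ('whale', 2)]
```

Very common function words such as "the" tend to dominate raw counts, so they are often filtered out before the results are interpreted.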

Tokenisation splits a large piece of text into smaller units called "tokens." These tokens can be individual words or phrases.

Imagine you have a long paragraph. Tokenisation breaks it down into manageable pieces, like separating each word. This makes it easier to count how often each word appears, analyse the structure of sentences, or perform other types of text analysis.
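As a rough sketch of tokenisation, the function below splits a sentence into word and punctuation tokens with a regular expression. The pattern is one simple choice among many; dedicated text analysis libraries handle far more edge cases.

```python
import re

def tokenise(text):
    """Split text into word tokens and punctuation tokens."""
    # \w+(?:'\w+)? matches a word, including contractions such as "isn't";
    # [^\w\s] matches any single punctuation mark.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

tokens = tokenise("Tokenisation isn't hard, is it?")
print(tokens)  # ['Tokenisation', "isn't", 'hard', ',', 'is', 'it', '?']
```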

Collocation analysis measures how often certain words appear together in a text.

If you analyse a collection of articles, collocation can help you find common phrases like "climate change" or "economic growth," which can give insights into topics.
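One way to score collocations is pointwise mutual information (PMI), which compares how often two words occur side by side with how often that would happen by chance. The sketch below assumes PMI over adjacent word pairs; other association measures exist, and the sample text and the `min_count` cut-off are illustrative choices.

```python
import math
import re
from collections import Counter

def collocations(text, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    # Simplification: sentence boundaries are ignored, so a pair can span two sentences.
    words = re.findall(r"[a-z]+", text.lower())
    word_counts = Counter(words)
    pair_counts = Counter(zip(words, words[1:]))
    n_words = len(words)
    n_pairs = len(words) - 1
    scores = {}
    for (w1, w2), count in pair_counts.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI scores
        p_pair = count / n_pairs
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented sample text standing in for a collection of articles.
articles = (
    "Climate change affects crops. Economic growth slows when climate change "
    "worsens. Policy on climate change shapes economic growth."
)
ranked = collocations(articles)
for pair, score in ranked:
    print(pair, round(score, 2))
```

Only "climate change" and "economic growth" occur at least twice here, so they are the only pairs scored.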

A concordance search locates a specific word in a text and shows the context in which it is used.

If you want to study how the word "freedom" is used in a speech, concordance will show you each instance of the word and the surrounding words, helping you understand its different uses.
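A concordance is often displayed as "keyword in context" lines. The following sketch shows the idea with a fixed window of words on each side; the speech text is invented and the `width` parameter is an illustrative assumption.

```python
import re

def concordance(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context on each side."""
    words = re.findall(r"\w+(?:'\w+)?", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

# Invented sample text standing in for a speech.
speech = (
    "Freedom is never given freely. We must defend freedom of speech, "
    "and we must teach that freedom carries responsibility."
)
hits = concordance(speech, "freedom")
for line in hits:
    print(line)
```

Reading the three lines side by side shows the word used as something granted, something defended, and something that brings obligations.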

N-gram analysis identifies common sequences of n consecutive words, such as pairs (bigrams) and triplets (trigrams).

In analysing surveys, n-grams can help you find common phrases like "great service" or "highly recommend," which can be useful for understanding customer opinions.
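Extracting n-grams amounts to sliding a window of n words across the text. Here is a small sketch that counts bigrams across a handful of invented survey responses; the helper name `ngrams` is chosen for the example.

```python
import re
from collections import Counter

def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return list(zip(*(words[i:] for i in range(n))))

# Invented survey responses for illustration.
responses = [
    "Great service and friendly staff",
    "Great service, would highly recommend",
    "I highly recommend this clinic",
]
bigram_counts = Counter()
for response in responses:
    words = re.findall(r"[a-z]+", response.lower())
    # Count within each response so that no n-gram spans two different answers.
    bigram_counts.update(ngrams(words, 2))

print(bigram_counts.most_common(2))
```

Passing `3` instead of `2` to `ngrams` yields trigrams in the same way.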

Sentiment analysis determines the sentiment expressed in a piece of text, whether it is positive, negative, or neutral.

In analysing medical surveys, sentiment analysis helps gauge overall patient feelings, such as whether respondents are satisfied or dissatisfied with their healthcare experience.
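The simplest form of sentiment analysis looks words up in a list of positive and negative terms. The sketch below assumes a tiny hypothetical lexicon and a one-word negation rule; real tools use much larger lexicons or trained models, so treat this only as an illustration of the principle.

```python
import re

# Hypothetical mini-lexicon for illustration; real lexicons hold thousands of scored words.
POSITIVE = {"satisfied", "helpful", "excellent", "caring", "good"}
NEGATIVE = {"dissatisfied", "rude", "slow", "poor", "painful"}
NEGATORS = {"not", "never", "no"}

def sentiment(text):
    """Label text as positive, negative, or neutral from lexicon word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1, or 0
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity  # "not helpful" counts as negative
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented survey answers.
surveys = [
    "The nurses were caring and the treatment was excellent.",
    "The wait was slow and the receptionist was rude.",
    "The doctor was not helpful.",
    "My appointment was on Tuesday.",
]
labels = [sentiment(s) for s in surveys]
print(labels)  # ['positive', 'negative', 'negative', 'neutral']
```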

Named entity recognition (NER) identifies and classifies entities like names, locations, dates, and other important items in a text.

In researching bioclimate, NER can help extract key information such as the names of species, geographic locations, and dates of climate events. This is useful for analysing scientific papers or reports to gather relevant data on bioclimate patterns and changes.
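Production NER systems usually rely on trained statistical models. As a deliberately simplified stand-in, the sketch below uses hypothetical lookup lists (gazetteers) for species and locations and a regular expression for dates, just to show what "identify and classify" produces. The lists, labels, and sample sentence are all invented for the example.

```python
import re

# Hypothetical gazetteers (lookup lists) for illustration only.
SPECIES = ["Eucalyptus regnans", "Phascolarctos cinereus"]
LOCATIONS = ["Queensland", "Tasmania", "Great Barrier Reef"]
# Matches dates such as "January 2019" or "3 March 2020".
DATE_PATTERN = re.compile(
    r"\b(?:\d{1,2} )?(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)

def find_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for label, names in (("SPECIES", SPECIES), ("LOCATION", LOCATIONS)):
        for name in names:
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return [(entity, label) for _, entity, label in sorted(found)]

# Invented sentence, not a real finding.
report = (
    "Eucalyptus regnans stands in Tasmania declined after the heatwave of "
    "January 2019, while surveys in Queensland resumed on 3 March 2020."
)
entities = find_entities(report)
for entity, label in entities:
    print(f"{label:9} {entity}")
```

A rule-based approach like this only finds entities it has been told about, which is the main reason trained models are preferred for large or varied collections.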

Topic modelling discovers abstract topics within a collection of documents.

If you have a large set of research papers, topic modelling can help you identify the main themes or subjects covered in those papers.
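One widely used topic modelling method is latent Dirichlet allocation (LDA). The sketch below is a compact, unoptimised LDA fitted by collapsed Gibbs sampling, written with the standard library only. The choice of LDA, the parameter values (`alpha`, `beta`, `iterations`), and the four tiny "papers" are assumptions for illustration; in practice you would use an established library and far more text.

```python
import random

def lda_topics(docs, n_topics, n_top_words=3, iterations=200,
               alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the top words of each topic."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    doc_topic = [[0] * n_topics for _ in docs]                      # topic counts per document
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                                    # total words per topic
    assignments = []
    # Start by giving every word occurrence a random topic.
    for d, doc in enumerate(docs):
        topics = [rng.randrange(n_topics) for _ in doc]
        assignments.append(topics)
        for word, t in zip(doc, topics):
            doc_topic[d][t] += 1
            topic_word[t][word] += 1
            topic_total[t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                t = assignments[d][i]
                # Remove the current assignment before resampling it.
                doc_topic[d][t] -= 1
                topic_word[t][word] -= 1
                topic_total[t] -= 1
                # Weight each topic by how much this document and this word favour it.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + beta * len(vocab))
                    for k in range(n_topics)
                ]
                t = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][word] += 1
                topic_total[t] += 1
    return [
        sorted(vocab, key=lambda w: topic_word[k][w], reverse=True)[:n_top_words]
        for k in range(n_topics)
    ]

# Invented, pre-tokenised "papers" from two subject areas.
papers = [
    "gene protein cell gene dna".split(),
    "cell dna protein gene".split(),
    "galaxy star planet orbit star".split(),
    "planet orbit galaxy star".split(),
]
topics = lda_topics(papers, n_topics=2)
for number, words in enumerate(topics):
    print(f"Topic {number}: {', '.join(words)}")
```

The method is random, so results can vary with the seed and the number of iterations, and the topics come back as unlabelled word lists that the researcher must interpret.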

Text classification assigns predefined categories to text.

In an email system, text classification can automatically sort emails into categories like "spam" or "important."
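A classic approach to text classification is multinomial Naive Bayes, which learns how often each word appears in each category from labelled examples. The sketch below assumes that method with add-one smoothing; the four training emails and the function names are invented, and a real classifier would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenise(text):
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    for text, category in examples:
        category_counts[category] += 1
        word_counts[category].update(tokenise(text))
    return word_counts, category_counts

def classify(text, word_counts, category_counts):
    """Return the most probable category under multinomial Naive Bayes."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_examples = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, counts in word_counts.items():
        # Log probabilities are summed to avoid multiplying many tiny numbers.
        score = math.log(category_counts[category] / total_examples)
        total_words = sum(counts.values())
        for word in tokenise(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the probability.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Invented labelled emails.
training = [
    ("Win a free prize now", "spam"),
    ("Free money, click now", "spam"),
    ("Meeting moved to Monday", "important"),
    ("Project deadline is Monday", "important"),
]
model = train(training)
print(classify("Claim your free prize", *model))   # spam
print(classify("Monday project meeting", *model))  # important
```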

Clustering groups similar texts together without predefined categories.

In market research, clustering can help you identify different types of customer feedback, such as grouping similar complaints or praises together.
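To illustrate clustering without any labels, the sketch below groups texts whose vocabularies overlap, using Jaccard similarity and a simple single-link rule. Both the similarity measure and the `threshold` value are assumptions chosen to keep the example short; methods such as k-means over word vectors are more common in practice.

```python
import re

def jaccard(a, b):
    """Share of distinct words that two word sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(texts, threshold=0.25):
    """Group texts whose word overlap meets the threshold (single-link clustering)."""
    word_sets = [set(re.findall(r"[a-z]+", t.lower())) for t in texts]
    clusters = []  # each cluster is a list of indices into texts
    for i, words in enumerate(word_sets):
        # Find every existing cluster that contains a text similar to this one.
        linked = [c for c in clusters
                  if any(jaccard(words, word_sets[j]) >= threshold for j in c)]
        merged = [i]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)

# Invented customer feedback.
feedback = [
    "Delivery was late again",
    "Late delivery, very annoying",
    "Love the new app design",
    "The app design is beautiful",
]
groups = cluster(feedback)
for group in groups:
    print([feedback[i] for i in group])
```

The two delivery complaints end up in one group and the two comments praising the app design in another, even though no categories were supplied in advance.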

Part-of-speech tagging labels words in a text as nouns, verbs, adjectives, etc., based on their role in the sentence.

In analysing a sentence, part-of-speech tagging helps you understand the grammatical structure, which is useful for further linguistic analysis.
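Modern part-of-speech taggers are trained on large annotated corpora and take the surrounding words into account. As a much simpler baseline, the sketch below trains a unigram tagger, which gives each word the tag it received most often in training. The three hand-tagged sentences, the tag names, and the default tag for unknown words are all assumptions for the example.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged training sentences, invented for illustration.
TAGGED = [
    [("the", "DET"), ("patient", "NOUN"), ("reports", "VERB"), ("mild", "ADJ"), ("pain", "NOUN")],
    [("the", "DET"), ("nurse", "NOUN"), ("reads", "VERB"), ("the", "DET"), ("reports", "NOUN")],
    [("a", "DET"), ("doctor", "NOUN"), ("reports", "VERB"), ("the", "DET"), ("results", "NOUN")],
]

def train_unigram_tagger(tagged_sentences):
    """Map each word to the tag it was given most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(words, model, default="NOUN"):
    """Tag each word by lookup, falling back to a default tag for unknown words."""
    return [(word, model.get(word.lower(), default)) for word in words]

model = train_unigram_tagger(TAGGED)
tagged = tag("The doctor reads mild reports".split(), model)
print(tagged)
```

Here "reports" is tagged as a verb because that was its most frequent tag in training, although in this sentence it is a noun. Ignoring context in this way is the main weakness of a unigram tagger.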

Dependency parsing analyses the grammatical structure of a sentence to understand the relationships between words.

In machine translation, dependency parsing helps ensure that the translated sentence maintains the correct grammatical relationships.

 

[Figure: Text mining chart]