There are many techniques for mining and analysing text data; which ones you choose will depend on the goal of the project.
Word frequency: a list of all words contained in the text and their frequency of use.
Collocation: how frequently words occur together
Concordance: locates a given word and the context it is used in
N-grams: common sequences of 2, 3, 4 or more adjacent words
Part of speech tagging: Tags words as nouns, adjectives, verbs, etc., based on their position and definition
Named entity recognition: Identifies names, locations, dates, etc.
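The first four techniques in the list above need little more than counting, so they can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how any particular tool implements them: the tokeniser (`tokenize`) is a simple regular expression, the sample sentence is made up, and collocation is approximated by counting adjacent word pairs, whereas dedicated tools usually score pairs with statistical association measures such as pointwise mutual information. Part of speech tagging and named entity recognition rely on trained language models (for example those in NLTK or spaCy) and are not shown.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequency(tokens):
    """Word frequency: each distinct word and how many times it occurs."""
    return Counter(tokens)

def ngrams(tokens, n):
    """N-grams: every run of n adjacent tokens, as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def concordance(tokens, keyword, width=3):
    """Concordance: each occurrence of keyword with up to `width` words either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

text = ("Text mining turns text into data. Text mining at scale "
        "reveals patterns that reading alone can miss.")
tokens = tokenize(text)

freq = word_frequency(tokens)
# Collocation in its simplest form: count adjacent word pairs (bigrams).
pairs = Counter(ngrams(tokens, 2))

print(freq.most_common(2))   # [('text', 3), ('mining', 2)]
print(pairs.most_common(1))  # [(('text', 'mining'), 2)]
for line in concordance(tokens, "mining"):
    print(line)
# text [mining] turns text into
# into data text [mining] at scale reveals
```

Changing `2` to `3` in `ngrams(tokens, 2)` gives trigrams, and a larger `width` in `concordance` shows more context around each occurrence.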
These methods are often used in combination on a given text data set to offset the limitations of each. For example, a word frequency count may be carried out, followed by a collocation and concordance search of the most important terms. This shows how each word is used, whether it has multiple meanings within the text, and whether it is often linked with other terms.
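The frequency-then-concordance workflow described above can be sketched as a short pipeline. This is an illustrative sketch, not a prescribed method: the stop-word list, the sample sentences and the helper names (`key_terms`, `concordance`, `collocates`) are hypothetical. The most frequent content word is picked first, and its concordance lines and neighbouring words then show whether it carries more than one meaning (here, a financial bank and a river bank).

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real projects use a fuller list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "she"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def key_terms(tokens, k=3):
    """Step 1: word frequency, ignoring stop words."""
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]

def concordance(tokens, term, width=2):
    """Step 2: each occurrence of the term with up to `width` words either side."""
    return [" ".join(tokens[max(0, i - width):i] + [f"[{t}]"] + tokens[i + 1:i + 1 + width])
            for i, t in enumerate(tokens) if t == term]

def collocates(tokens, term, width=2):
    """Step 3: content words found within `width` positions of the term."""
    near = Counter()
    for i, t in enumerate(tokens):
        if t == term:
            window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
            near.update(w for w in window if w not in STOP_WORDS and w != term)
    return near

text = "The bank raised interest rates. She sat on the river bank. The bank approved the loan."
tokens = tokenize(text)

for term in key_terms(tokens, k=1):
    print(term, dict(collocates(tokens, term)))
    for line in concordance(tokens, term):
        print("   ", line)
```

The concordance line `the river [bank] the bank` sits alongside `the [bank] raised interest`, which is the kind of evidence that flags a term with two senses. Note that this simple tokeniser ignores sentence boundaries, so windows can run across sentences.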
Text classification tags text according to topics, allowing a structure and classification system to emerge from the text. It makes use of some Natural Language Processing (NLP) techniques.
Topic analysis: Tags the main themes or topics of a text
Sentiment analysis: Tags the underlying emotion of a piece of text
Language detection: Identifies pieces of text that may be in another language
Intent detection: Identifies the intent behind the text. For example, if you are analysing a data set of emails, is a given email providing feedback or requesting information?
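The classification tasks above (topic, sentiment, language and intent detection) all assign a label to a piece of text, so the same kind of model can be trained for any of them given labelled examples. The sketch below swaps in a multinomial naive Bayes classifier, one common and simple approach, written from scratch for the email example above. The six training emails and the `feedback`/`request` labels are hypothetical, and a real project would need far more training data or an established library.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesClassifier:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)       # documents per label (for priors)
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training emails labelled by intent.
train_texts = [
    "Great service, thanks for the quick reply",
    "I loved the new catalogue, well done",
    "The website was slow and confusing",
    "Could you tell me the opening hours?",
    "How do I request an interlibrary loan?",
    "Please send me information about borrowing limits",
]
train_labels = ["feedback", "feedback", "feedback", "request", "request", "request"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
print(model.predict("Can you send me the opening hours?"))  # request
print(model.predict("Thanks, the new website is great"))    # feedback
```

Training the same class on texts labelled `positive`/`negative`, or by topic, would turn it into a (very small) sentiment or topic classifier.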
The true power of text mining comes not from analysing single texts but from using analysis to generate new information and insights from larger sets of text data, insights that would be difficult to find by reading alone. There are many possible sources of text data; a sample is listed below.
All Elsevier journals and books can be text mined. The Elsevier API lets researchers bulk download the content they want to analyse, making the process more efficient and consistent. The Elsevier text and data mining page has more information about the API and links to the developer portal, where you can access the API for non-commercial research. The API can also be used to mine metadata, such as titles and abstracts, indexed in the Scopus database.
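For the Scopus metadata mentioned above, a request to the Scopus Search API can be built with Python's standard library. This is a sketch under assumptions: it presumes you already have an API key from the Elsevier developer portal, and the endpoint, the `X-ELS-APIKey` header, the `start`/`count` paging parameters and the JSON field names (`search-results`, `entry`, `dc:title`) should be checked against the current API documentation, along with Elsevier's usage limits. `build_scopus_request` and `fetch_titles` are hypothetical helper names; only the request construction runs without a network connection.

```python
import json
import urllib.parse
import urllib.request

# Scopus Search API endpoint (check the Elsevier developer portal for current details).
SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

def build_scopus_request(query, api_key, start=0, count=25):
    """Build (but do not send) a GET request for one page of Scopus search results."""
    params = urllib.parse.urlencode({"query": query, "start": start, "count": count})
    return urllib.request.Request(
        f"{SCOPUS_SEARCH_URL}?{params}",
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )

def fetch_titles(request):
    """Send the request and extract titles (needs network access and a valid key)."""
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # Assumed response layout: {"search-results": {"entry": [{"dc:title": ...}, ...]}}
    entries = data.get("search-results", {}).get("entry", [])
    return [entry.get("dc:title", "") for entry in entries]

req = build_scopus_request('TITLE-ABS-KEY("text mining")', api_key="YOUR_API_KEY")
print(req.full_url)
# titles = fetch_titles(req)  # run only with a real API key
```

Paging through a larger result set would mean calling `build_scopus_request` repeatedly with increasing `start` values.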
Many free tools now make text mining within PubMed possible. They make it easy to see the interactions and links between terms in published research, enabling researchers to gain insights and see patterns in their area of research.
Generates a Venn diagram and list of articles based on search terms and articles listed in PubMed.
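The set logic behind a two-term Venn diagram of PubMed results can be sketched as follows. This does not reproduce how the tool above works: it assumes PMIDs are retrieved through NCBI's E-utilities `esearch` service with JSON output, and the helpers `pubmed_ids` and `venn_regions` are hypothetical. The offline example uses made-up PMIDs, so it runs without network access.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_ids(term, retmax=1000):
    """Return the set of PMIDs matching a PubMed search (needs network access)."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax})
    with urllib.request.urlopen(f"{ESEARCH_URL}?{params}") as response:
        data = json.load(response)
    return set(data["esearchresult"]["idlist"])

def venn_regions(a, b):
    """Split two sets of results into the three regions of a two-circle Venn diagram."""
    return {"only_a": a - b, "both": a & b, "only_b": b - a}

# Offline example with hypothetical PMIDs. With network access you could instead use
# venn_regions(pubmed_ids("text mining"), pubmed_ids("machine learning")).
regions = venn_regions({"101", "102", "103"}, {"103", "104"})
for name, ids in regions.items():
    print(name, sorted(ids))
```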
Ranks the frequency of words and terms found in the abstracts and titles of articles indexed in PubMed, displayed in a table. Other frequency displays include the journals and authors most associated with the terms. You can also look up human gene names.
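Ranking terms found in titles, as described above, can be approximated by counting how many titles contain each word, so that a word repeated within one title is counted only once. A minimal sketch, assuming a small hypothetical stop-word list and made-up titles; it is not the tool's own algorithm.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}

def term_article_counts(titles):
    """Count how many titles contain each word (each word counted once per title)."""
    counts = Counter()
    for title in titles:
        words = set(re.findall(r"[a-z0-9]+", title.lower())) - STOP_WORDS
        counts.update(words)
    return counts

titles = [  # hypothetical titles, for illustration only
    "Gene expression in breast cancer",
    "Machine learning for cancer diagnosis",
    "Cancer screening and cancer risk",
]
counts = term_article_counts(titles)
for word, n in counts.most_common(3):
    print(f"{word:<12}{n}")
```

`cancer` scores 3 rather than 4 because the third title is counted once. Applying the same `Counter` approach to lists of journal or author names would give a comparable ranking for those fields.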
Requires a free account. Results are displayed on a dashboard that shows relationships as a graphical network.
Enter free text to highlight and identify MeSH terms; PubMed articles identified as similar to the text are also displayed.
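Highlighting vocabulary terms in free text can be illustrated with a greedy longest-match lookup against a controlled vocabulary. This is a toy sketch, not how the MeSH tool above identifies terms: `VOCAB` is a tiny hand-made stand-in for MeSH that maps a few everyday and technical phrases to a preferred heading, and the example sentence is invented.

```python
import re

# Hypothetical mini-vocabulary standing in for MeSH: phrase (as word tuple) -> heading.
VOCAB = {
    ("heart", "attack"): "Myocardial Infarction",
    ("myocardial", "infarction"): "Myocardial Infarction",
    ("high", "blood", "pressure"): "Hypertension",
    ("aspirin",): "Aspirin",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in VOCAB)

def find_terms(text):
    """Return (matched phrase, heading) pairs, preferring the longest match at each position."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in VOCAB:
                found.append((" ".join(phrase), VOCAB[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no match starting here; move on one word
    return found

sentence = "Aspirin may lower the risk of a heart attack in patients with high blood pressure."
for phrase, heading in find_terms(sentence):
    print(f"{phrase!r} -> {heading}")
```

Mapping both "heart attack" and "myocardial infarction" to one heading is what lets a controlled vocabulary group texts that describe the same concept in different words.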
Visualise patterns and relationships within a bibliometric network. Networks can be built from authors, journals and other bibliometric details, and text mining can reveal the relationships between terms.
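A term network like the one described above starts from co-occurrence counts: two terms are linked when they appear in the same document, and the link is weighted by how many documents they share. A minimal sketch with hypothetical key terms for three abstracts; bibliometric mapping software typically adds normalisation, clustering and layout on top of counts like these.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs):
    """Weight each pair of terms by the number of documents containing both."""
    edges = Counter()
    for terms in docs:
        # sorted(set(...)) counts each pair once per document, in a fixed order
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges

docs = [  # hypothetical key terms from three abstracts
    ["text mining", "pubmed", "gene"],
    ["text mining", "gene", "network"],
    ["network", "gene"],
]
edges = cooccurrence_edges(docs)
for (a, b), weight in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{a} -- {b}: {weight}")
```

The resulting weighted edge list is the kind of input network visualisation software draws from.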
Input can be HTML, .txt, .pdf or Word documents, and no programming experience is required. Voyant analyses your text and presents the results in a dashboard whose tools are linked interactively. This free, web-based tool is a text reading and analysis environment that supports the visualisation and interpretation of text for scholarly purposes.
Drag and drop documents into the analyser provided by JSTOR; several input formats are accepted. The analyser identifies key terms and topics in the text and suggests related articles from JSTOR.
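One common way to pick out key terms in a document is TF-IDF, which weighs how often a word occurs in the document against how many documents in a reference collection also contain it. The source does not say which method the JSTOR analyser uses, so this is a swapped-in technique for illustration only; the reference corpus, the stop words and the smoothed IDF formula, log((1 + N) / (1 + df)) + 1 (one common variant), are assumptions.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tfidf_key_terms(doc, corpus, k=3):
    """Rank words in `doc` by TF-IDF against a reference `corpus` of documents."""
    tokens = tokenize(doc)
    tf = Counter(tokens)
    corpus_vocab = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus_vocab)
    scores = {}
    for word, count in tf.items():
        df = sum(word in vocab for vocab in corpus_vocab)  # documents containing the word
        idf = math.log((1 + n_docs) / (1 + df)) + 1        # smoothed inverse document frequency
        scores[word] = (count / len(tokens)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:k]

corpus = [  # hypothetical reference documents
    "the history of the library",
    "the economics of the book trade",
    "a survey of reading habits",
]
doc = "reading romantic poetry and reading nature poetry"
print(tfidf_key_terms(doc, corpus, k=2))  # ['poetry', 'reading']
```

Here `poetry` and `reading` occur equally often in the document, but `reading` ranks lower because it also appears in the reference corpus.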
A text preparation and analysis tool that lets you clean, scrub and section your text before analysis. You can then perform cluster analysis and create visualisations from your data.
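The scrub and section steps described above, followed by a simple similarity measure between sections (the kind of input cluster analysis builds on), can be sketched as below. This is an illustration under assumptions, not the tool's implementation: the stop words, the fixed section size and the choice of cosine similarity are hypothetical.

```python
import math
import re
from collections import Counter

def scrub(text, stop_words=frozenset()):
    """Lowercase, keep only letters (dropping punctuation and digits), remove stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]

def section(tokens, size):
    """Cut the token list into consecutive segments of `size` tokens; the last may be shorter."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def cosine_similarity(a, b):
    """Cosine similarity of two segments' word counts (1.0 = same proportions, 0.0 = no shared words)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

text = "Chapter 1. It was a dark night; the wind howled! The night was long."
tokens = scrub(text, stop_words={"it", "was", "a", "the"})
segments = section(tokens, 3)
print(segments)  # [['chapter', 'dark', 'night'], ['wind', 'howled', 'night'], ['long']]
print(round(cosine_similarity(segments[0], segments[1]), 3))  # 0.333
```

A full matrix of pairwise similarities like this is what a clustering method would then group into related sections.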
Edith Cowan University acknowledges and respects the Nyoongar people, who are the traditional custodians of the land upon which its campuses stand and its programs operate.
In particular ECU pays its respects to the Elders, past and present, of the Nyoongar people, and embraces their culture, wisdom and knowledge.