Topic modelling

Topic modelling is a statistical method for discovering the latent topics that occur in a large collection of documents. It is particularly useful in text mining and natural language processing (NLP), and has gained significant traction in both the social sciences and the digital humanities (DH) for analysing large textual datasets. It allows researchers to categorise, summarise and understand large bodies of text in ways that would be time-consuming or impossible through manual analysis. The method has been used to discover and visualise patterns and themes in a range of documents, including poems, novels, newspapers and diaries.

Topic modelling relies on an unsupervised algorithm, meaning the specific topics are not predetermined. Instead, the algorithm processes the data to identify clusters of words (topics) according to how they co-occur within documents. This can provide a structured way of understanding the thematic underpinnings of the corpus.
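The text does not name a particular algorithm, so the sketch below assumes Latent Dirichlet Allocation (LDA), one of the most widely used topic models, fitted here with a small pure-Python collapsed Gibbs sampler. It is meant to show the co-occurrence idea described above, not to serve as a production implementation; libraries such as gensim or scikit-learn provide optimised LDA implementations. The function names `lda_gibbs` and `top_words`, the prior values `alpha` and `beta`, and the toy corpus are all hypothetical, chosen only for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a Latent Dirichlet Allocation model by collapsed Gibbs sampling.

    docs is a list of tokenised documents (lists of words). Returns
    (topic_word, doc_topic, vocab), where topic_word[k][w] counts how often
    word w is assigned to topic k, and doc_topic[d][k] counts how many words
    in document d are assigned to topic k.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics
    z = []  # z[d][i] is the topic currently assigned to token i of document d

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Probability of topic t is proportional to how much document d
                # already uses t times how strongly t is associated with word w.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n words most often assigned to each topic."""
    return [
        [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
        for row in topic_word
    ]


# Hypothetical toy corpus mixing two themes (farming and the sea).
docs = [
    "farmer harvest grain field harvest".split(),
    "field grain farmer plough".split(),
    "ship sailor sea wave ship".split(),
    "sea wave sailor harbour".split(),
    "harvest field sailor ship".split(),
]

topic_word, doc_topic, vocab = lda_gibbs(docs, num_topics=2, iterations=200)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {', '.join(words)}")
```

With a fixed seed the output is reproducible, but which topic index ends up grouping the farming words and which the sea words is arbitrary, and on a corpus this small the split may not be perfectly clean. The number of topics is the one structural choice the researcher must make in advance; the topics' content is left for the algorithm to find.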


Burkina Faso

Python Code