Dynamic topic modeling

Workflow Overview

This document describes the process for performing dynamic topic analysis on the Founders Online texts using the keyATM package in R. The analysis focuses on understanding how the prevalence of topics evolves over time. The workflow involves loading necessary libraries, preparing data, creating a dynamic topic model, and evaluating the results.

Step-by-Step Guide

1. Script Header

Description: This script performs a dynamic topic analysis on the Founders Online texts, analyzing how the prevalence of topics changes over time. It uses the keyATM package to fit a dynamic keyATM model to the cleaned text data.

Purpose:

  • Understand the evolution of topics over time.
  • Provide an alternative to the shifting concepts in time (shico) approach.

References:

2. Libraries

Load the necessary R libraries:

  • keyATM for dynamic topic modeling.
  • parallel for parallel processing.
  • quanteda for text analysis.
  • tidyverse for data manipulation.
  • future for handling parallel computing.

3. Data

Prepare the text data:

  • Load the cleaned text data from semi_supervised_topic_modleing.R.
  • Filter out empty texts and focus on the time period between 1750 and 1825.

4. Create keyATM Docs

Transform data into a format suitable for the keyATM model:

  • Convert the text data into a keyATM-readable format.
  • Save the transformed documents for later use.

5. Keywords

Define and visualize keywords:

  • Load keyword definitions from an external script.
  • Process keywords for analysis, including tokenization and lemmatization.
  • Visualize the frequency of keywords by topic and save the results.

6. Create Decade Time Index Variable

Generate a time index variable:

  • Create a period variable that represents 10-year intervals starting from 1720.

7. Dynamic keyATM

Fit the dynamic keyATM model:

  • Set up parallel processing.
  • Initialize and fit the dynamic keyATM model with specified settings.
  • Save the model and its output for future reference.

8. Model Evaluation

Evaluate the fitted model:

  • Inspect topic-term associations.
  • Diagnose model performance through log-likelihood and perplexity trends.
  • Visualize topic proportions and keyword importance over time.

9. Theta (Document-Topic Distribution)

Analyze the document-topic distribution:

  • Preprocess and format the theta matrix for plotting.
  • Create and save plots showing topic distributions over time and their historical context.

Conclusion

This workflow provides a structured approach to dynamic topic analysis, focusing on how topics evolve over time using the keyATM package. By following these steps, users can gain insights into topic trends, visualize changes, and evaluate the effectiveness of the topic modeling approach.