explaingit

dariia-m/r-ladies-rome-text-analysis

2 · JavaScript · Audience · data · Complexity · 2/5 · Active · Setup · moderate

TLDR

Slides and code for an R-Ladies Rome talk on text analysis in R, covering the tidytext pipeline, dictionary sentiment, topic modeling, and local LLMs via the mall package.

Mindmap

mindmap
  root((r-ladies-rome-text-analysis))
    Inputs
      EUvsDisinfo claims CSV
      Quarto slides
    Outputs
      Rendered HTML deck
      Wordclouds and plots
    Use Cases
      Learn tidytext
      Run local LLMs in R
      Topic modeling demo
    Tech Stack
      R
      tidytext
      mall
      Ollama
      Quarto

Things people build with this

USE CASE 1

Learn the tidytext pipeline from sentences to word frequencies and sentiment

USE CASE 2

Reproduce dictionary sentiment analysis with AFINN, Bing, and NRC lexicons

USE CASE 3

Run local LLMs through Ollama for classification, summarization, and translation in R

USE CASE 4

Build LDA topic models and bigram network plots on a real dataset
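Use case 4's bigram network plots start from counts of adjacent word pairs. In tidytext the usual recipe has three steps. First, tokenize into two-word n-grams with unnest_tokens(token = "ngrams", n = 2). Second, split each pair into its two words. Third, drop any pair that contains a stopword. The repo's exact R code may differ from this. The sketch below is a minimal Python illustration of that counting step and is not taken from the repo. The stopword set, the min_count threshold and the toy sentences are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword set for illustration; the talk uses the stopwords
# package plus its own custom list.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "with"}


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(documents, stopwords=STOPWORDS):
    """Count adjacent word pairs within each document.

    Pairs are formed from the full token sequence first, then any pair with a
    stopword on either side is dropped, mirroring the usual tidytext recipe.
    """
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for first, second in zip(tokens, tokens[1:]):
            if first in stopwords or second in stopwords:
                continue
            counts[(first, second)] += 1
    return counts


def network_edges(counts, min_count=2):
    """Return (from, to, weight) edges for pairs seen at least min_count times."""
    return [(a, b, n) for (a, b), n in counts.most_common() if n >= min_count]


# Toy sentences, not rows from data.csv.
docs = [
    "Text analysis in R is fun.",
    "Text analysis with local models.",
    "Topic models need text analysis.",
]
edges = network_edges(bigram_counts(docs))  # [("text", "analysis", 3)]
```

The resulting (from, to, weight) triples form the kind of edge list that igraph and ggraph turn into a network plot in R. Raising min_count prunes rare pairs so the plot stays readable.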

Tech stack

R · tidytext · mall · Ollama · Quarto · LDA

Getting it running

Difficulty · moderate · Time to first run · 30 min

Need to install Ollama, pull the llama3.2 model, and install ten R packages before rendering.
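After you install Ollama and run ollama pull llama3.2, Ollama serves the model over a local HTTP API, by default on port 11434. In the talk, the mall package handles this from R. The README's package list also includes ollamar, an R client for Ollama.

The Python sketch below is not part of the repo. It sends one sentiment prompt to Ollama's /api/generate endpoint, which lets you confirm the model answers before you render the deck. The prompt wording and the helper names build_sentiment_request and classify_sentiment are hypothetical. They are not mall's prompts or API.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_sentiment_request(text, model="llama3.2"):
    """Build a POST request asking the model for a one-word sentiment label.

    The prompt wording is a hypothetical example, not the prompt mall uses.
    """
    prompt = (
        "Classify the sentiment of the following text as positive, negative, "
        "or neutral. Answer with one word only.\n\n" + text
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_sentiment(text, model="llama3.2", timeout=120):
    """Send the request to a running Ollama server and return its lowercased answer."""
    request = build_sentiment_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns a single JSON object whose
    # "response" field holds the generated text.
    return body["response"].strip().lower()
```

With Ollama running, classify_sentiment("Great talk!") should return a short label such as "positive". Small local models do not always follow a one-word instruction, though, so treat the output as text to check rather than a guaranteed category.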

In plain English

This repository contains the slides and supporting files for a talk given at R-Ladies Rome in May 2026. The talk, titled "From Dictionaries to LLMs: Text Analysis in R", was prepared by Dariia Mykhailyshyna of the Kyiv School of Economics. The README describes a 45-minute walkthrough of a full text-analysis pipeline written in R and links to a published HTML version of the slide deck.

The example dataset is a public collection of pro-Russian disinformation claims tracked by EUvsDisinfo. It was downloaded from Kaggle and covers January 2015 to January 2020. The repository keeps a copy of this data as data.csv alongside the source slides and the rendered deck.

The first half of the talk covers what the author calls the tidytext pipeline. Sentences are split into individual words with the tidytext package, common stopwords plus a custom list are removed, and the remaining words feed word frequencies, wordclouds, and bar plots. From there the talk moves to dictionary-based sentiment analysis with the AFINN, Bing, and NRC lexicons, tracking how sentiment shifts over time. It then covers LDA topic modeling, along with bigram and word network plots that show which terms tend to appear together.

The second half turns to large language models through the mall R package. The talk shows how to run LLMs locally through Ollama, which avoids paying for a hosted API. It then walks through helper functions such as llm_sentiment, llm_classify, llm_extract, llm_summarize, llm_verify, llm_translate, and llm_custom. The author also discusses when a simple dictionary lookup is the right tool and when a language model is worth the extra cost.

To reproduce the slides, follow these steps:
1. Install the ten R packages the README lists: tidyverse, tidytext, stopwords, wordcloud, topicmodels, igraph, ggraph, textdata, mall, and ollamar.
2. Install Ollama and pull the llama3.2 model.
3. Render the Quarto document with a single quarto render command.
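The tidytext and dictionary-sentiment steps summarized above reduce to a few table operations:
- one row per word;
- removal of stopwords (an anti-join);
- a word count;
- an inner join against a lexicon, followed by a sum per time period.

The sketch below is a minimal Python illustration of that flow, not the repo's R code. Both stopword sets are hypothetical. TOY_LEXICON uses made-up values that only mimic the shape of AFINN, which assigns integer scores from -5 to +5.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-ins: the talk uses the stopwords package plus a custom list,
# and real lexicons (AFINN, Bing, NRC). These toy scores are made up and only
# mimic AFINN's shape: integers, negative for negative words.
STOPWORDS = {"the", "a", "of", "and", "to", "is", "in", "are"}
CUSTOM_STOPWORDS = {"said"}
TOY_LEXICON = {"good": 3, "peace": 2, "war": -2, "lie": -2, "fake": -3}


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def tidy_tokens(claims):
    """Turn (year, text) rows into one (year, word) row per word, minus stopwords."""
    drop = STOPWORDS | CUSTOM_STOPWORDS
    return [(year, w) for year, text in claims for w in tokenize(text) if w not in drop]


def word_frequencies(tokens):
    """Count how often each word appears across all rows."""
    return Counter(w for _, w in tokens)


def sentiment_by_year(tokens, lexicon=TOY_LEXICON):
    """Keep only words found in the lexicon (an inner join) and sum their scores per year."""
    totals = defaultdict(int)
    for year, w in tokens:
        if w in lexicon:
            totals[year] += lexicon[w]
    return dict(sorted(totals.items()))


# Toy rows, not taken from data.csv.
claims = [
    (2015, "The war is a lie, officials said."),
    (2015, "Fake reports of war."),
    (2019, "Peace talks are good news."),
]
tokens = tidy_tokens(claims)
top_word = word_frequencies(tokens).most_common(1)  # [("war", 2)]
yearly = sentiment_by_year(tokens)  # {2015: -9, 2019: 5}
```

Summing scores per year only fits score-based lexicons like AFINN. Bing labels words positive or negative, and NRC assigns emotion categories, so with those lexicons you would count matched words per category instead.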
The repository also ships a smoketest.R script that runs every R chunk in one pass, which is useful when debugging the pipeline. The README closes with a pointer to Workshops for Ukraine, a charity R workshop series.

Copy-paste prompts

Prompt 1
Set up Ollama and the mall package locally and rerun the llm_sentiment example from this repo on my own CSV
Prompt 2
Adapt the tidytext stopwords and wordcloud code to a different news dataset
Prompt 3
Help me debug the smoketest.R script when one of the LDA chunks fails
Prompt 4
Walk me through when to use AFINN versus an Ollama llama3.2 model for sentiment

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.