Learn the tidytext pipeline from sentences to word frequencies and sentiment
Reproduce dictionary sentiment analysis with AFINN, Bing, and NRC lexicons
Run local LLMs through Ollama for classification, summarization, and translation in R
Build LDA topic models and bigram network plots on a real dataset
Before rendering, you need to install Ollama, pull the llama3.2 model, and install ten R packages.
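A minimal setup sketch in R follows. It assumes Ollama itself is already installed and its server is running locally on the default port. The package list is the one the README gives. Pulling the model with ollamar::pull() is offered as an alternative to running `ollama pull llama3.2` in a terminal. The final llm_sentiment() call on a made-up sentence is only a quick check that mall can reach the model. It is not part of the talk. Argument names follow the ollamar and mall documentation and may differ between package versions.

```r
# The ten packages listed in the README.
pkgs <- c("tidyverse", "tidytext", "stopwords", "wordcloud", "topicmodels",
          "igraph", "ggraph", "textdata", "mall", "ollamar")

# Install only the packages that are not already present.
missing <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)

# Assumes the Ollama server is already running locally.
# pull() is the R equivalent of `ollama pull llama3.2` in a terminal.
ollamar::test_connection()
ollamar::pull("llama3.2")

# Point mall at the local model; a fixed seed makes repeated runs more reproducible.
library(mall)
llm_use("ollama", "llama3.2", seed = 100)

# Smoke check on a hypothetical one-row data frame: mall appends a
# `.sentiment` column holding the model's label.
check <- data.frame(text = "The new policy was a great success.")
check <- llm_sentiment(check, text)
print(check)
```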
This repository contains the slides and supporting files for a talk given at R-Ladies Rome in May 2026. The talk, "From Dictionaries to LLMs: Text Analysis in R", was prepared by Dariia Mykhailyshyna of the Kyiv School of Economics. The README describes it as a 45-minute walkthrough of a full text-analysis pipeline written in R and links to a published HTML version of the slide deck.
The example dataset is a public collection of pro-Russian disinformation claims tracked by EUvsDisinfo, downloaded from Kaggle and covering January 2015 to January 2020. The repository keeps a copy of this data as data.csv alongside the source slides and the rendered deck.
The first half of the talk covers what the author calls the tidytext pipeline: breaking sentences into individual words with the tidytext package, removing common stopwords plus a custom list, and then producing word frequencies, wordclouds, and bar plots. From there, the talk moves to dictionary-based sentiment analysis with the AFINN, Bing, and NRC lexicons and looks at how sentiment shifts over time. It then covers topic modeling with LDA, along with bigram and word network plots that show which terms tend to appear together.
The second half turns to large language models through the mall R package. The talk shows how to run local LLMs through Ollama, so there is no paid API involved, and walks through helper functions such as llm_sentiment, llm_classify, llm_extract, llm_summarize, llm_verify, llm_translate, and llm_custom. The author also discusses when a simple dictionary lookup is the right tool and when a language model is worth the extra cost.
To reproduce the slides, install the ten R packages the README lists: tidyverse, tidytext, stopwords, wordcloud, topicmodels, igraph, ggraph, textdata, mall, and ollamar. Then install Ollama, pull the llama3.2 model, and render the Quarto document with a single quarto render command.
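The first-half pipeline can be sketched in a few lines of dplyr and tidytext. This is a minimal illustration, not the talk's own code. It runs on a two-sentence toy tibble rather than data.csv, because the file's column names are not given here. The `id` and `text` columns and the custom stopword list are hypothetical. It uses the Bing lexicon because Bing ships with tidytext. AFINN and NRC are fetched through textdata, which asks for confirmation on first download.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy stand-in for data.csv; `id` and `text` are assumed names.
claims <- tibble(
  id   = c(1, 2),
  text = c("The sanctions are a terrible failure for Europe.",
           "Officials praised the successful new agreement.")
)

# Hypothetical custom stopword list, used on top of tidytext's stop_words.
custom_stop <- tibble(word = c("officials"))

# 1. Tokenize: one row per word; unnest_tokens() lowercases and drops punctuation.
# 2. Remove standard and custom stopwords with anti-joins.
words <- claims |>
  unnest_tokens(word, text) |>
  anti_join(stop_words, by = "word") |>
  anti_join(custom_stop, by = "word")

# 3. Word frequencies across the whole corpus.
freq <- words |>
  count(word, sort = TRUE)

# 4. Dictionary sentiment: keep only words found in the Bing lexicon,
#    then count positive and negative hits per document.
sentiment <- words |>
  inner_join(get_sentiments("bing"), by = "word") |>
  count(id, sentiment)

print(freq)
print(sentiment)
```

Swapping in get_sentiments("afinn") returns a numeric `value` column instead of a positive/negative label, so the last step would sum scores per document rather than count labels.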
The repository also ships a smoketest.R script that runs every R chunk in one pass, which is useful when debugging the pipeline. The README closes with a pointer to Workshops for Ukraine, a charity R workshop series.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.