Learn how to build a sentiment classifier for movie reviews using PyTorch, from a bag-of-words baseline up to BERT.
Fine-tune a pre-trained BERT model for your own text classification task by following the fourth notebook.
Understand the practical difference between LSTMs and CNNs applied to text by running the second and third notebooks side by side.
Run all four notebooks in the browser via Colab, with no local setup.
Requires Python 3.9; install all dependencies with a single pip command, or run the notebooks directly in Colab.
This repository is a set of four step-by-step tutorials that teach you how to build models that read a piece of text and decide whether its tone is positive or negative. The task throughout is predicting the sentiment of movie reviews, a classic, well-understood problem that makes it easy to tell whether a model is working. Each tutorial is a self-contained notebook that you can open and run in a browser without setting up anything on your own computer.

The first tutorial introduces the simplest approach: a neural bag-of-words model, which treats a sentence as an unordered collection of words and learns which words tend to signal positive or negative sentiment. The second introduces a recurrent neural network, a model that reads words one at a time and carries context forward as it goes, using the popular LSTM variant. The third switches to a convolutional network, which slides over small windows of neighboring words instead of reading the text strictly one word at a time. The fourth and final tutorial loads BERT, a large transformer pre-trained on enormous amounts of text, and fine-tunes it for this specific task. Each tutorial builds conceptually on the one before it, so reading them in order gives a clear progression from simple to sophisticated.

The tutorials are written for Python 3.9 and use PyTorch as the core framework, and the dependencies install with a single command. An older set of tutorials is kept in a legacy folder for anyone who worked through the previous version and wants to refer back to it, but the four main notebooks are the current recommended path.

This is a learning resource for people who want to understand how text classification works in practice, not just in theory; it does not ship a production tool or a web interface. If you want to see how modern language models approach simple classification tasks, it is a clear, hands-on starting point.
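The three non-BERT architectures described above can be sketched as small PyTorch modules to make the contrast concrete. This is a minimal illustration, not the notebooks' code: the class names, layer sizes, the single-layer unidirectional LSTM, and the single convolution width are all hypothetical choices. Each model maps a batch of token ids to two logits (negative/positive).

```python
import torch
import torch.nn as nn

# Illustrative sketch, not the tutorials' code; all names and sizes are hypothetical.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, NUM_CLASSES, PAD_IDX = 1000, 32, 64, 2, 0


class NBoW(nn.Module):
    """Neural bag-of-words: average the word embeddings, so word order is lost."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM, padding_idx=PAD_IDX)
        self.fc = nn.Linear(EMBED_DIM, NUM_CLASSES)

    def forward(self, ids):  # ids: [batch, seq_len]
        embedded = self.embedding(ids)  # [batch, seq_len, embed_dim]
        pooled = embedded.mean(dim=1)  # averages all positions (padding included, for simplicity)
        return self.fc(pooled)  # [batch, num_classes]


class LSTMClassifier(nn.Module):
    """Reads tokens one at a time; classifies from the final hidden state."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM, padding_idx=PAD_IDX)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.fc = nn.Linear(HIDDEN_DIM, NUM_CLASSES)

    def forward(self, ids):
        embedded = self.embedding(ids)  # [batch, seq_len, embed_dim]
        _, (hidden, _) = self.lstm(embedded)  # hidden: [num_layers, batch, hidden_dim]
        return self.fc(hidden[-1])  # last layer's final hidden state


class CNNClassifier(nn.Module):
    """Slides filters over windows of consecutive words, then max-pools over time."""

    def __init__(self, window=3, num_filters=50):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM, padding_idx=PAD_IDX)
        self.conv = nn.Conv1d(EMBED_DIM, num_filters, kernel_size=window)
        self.fc = nn.Linear(num_filters, NUM_CLASSES)

    def forward(self, ids):
        embedded = self.embedding(ids).permute(0, 2, 1)  # Conv1d wants [batch, channels, seq_len]
        features = torch.relu(self.conv(embedded))  # [batch, filters, seq_len - window + 1]
        pooled = features.max(dim=2).values  # strongest response of each filter
        return self.fc(pooled)


torch.manual_seed(0)
batch = torch.randint(1, VOCAB_SIZE, (4, 12))  # 4 fake "reviews", 12 token ids each
for model in (NBoW(), LSTMClassifier(), CNNClassifier()):
    print(type(model).__name__, tuple(model(batch).shape))  # prints (4, 2) for each model
```

Because the bag-of-words model averages embeddings, shuffling a review's words leaves its prediction unchanged; the LSTM and CNN outputs depend on word order, which is the main thing the recurrent and convolutional approaches add.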
Author: bentrevett.
Verify against the repo before relying on details.