Train a model to classify movie reviews as positive or negative using the included example dataset.
Adapt the training script to sort support tickets by topic using your own labeled text data.
Use the codebase as a readable reference for understanding how convolutional neural networks process text.
Run the evaluation script to measure how accurately your trained model performs on held-out examples.
Targets an older TensorFlow version from 2015, so you must install matching TensorFlow and Python versions before training will run.
This repository is a Python implementation of a convolutional neural network (CNN) for text classification, the task of sorting text into categories. The approach was described in a 2014 academic paper by Yoon Kim, and this code is a simplified version of that paper's model, built with TensorFlow. It was created as companion code for a blog post on the site WildML.

Text classification means training a program to read a piece of text and decide which category it belongs to. For example, you could teach it to label movie reviews as positive or negative, or to sort support tickets by topic. The model learns by reading many labeled examples, finding patterns in the words, and adjusting its parameters until its predictions improve.

To use this code, you run a training script that feeds the model data and produces a saved model. Several settings can be adjusted before training, such as how many passes to make over the data, how many examples to process in each batch, and how strongly to apply regularization (a technique that helps the model avoid memorizing the training data too closely). Once training finishes, a separate evaluation script lets you test how well the resulting model performs.

The repository is minimal in scope. It does not include a graphical interface or a hosted demo. Using it requires some comfort with running Python scripts from a terminal, managing dependencies like TensorFlow and NumPy, and preparing your own text dataset if you want to classify something beyond the default example data.

This code dates from 2015 and targets an older version of TensorFlow. It remains useful to researchers and students studying the foundations of text classification as a clean, readable reference for how this type of neural network is structured.
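To make the structure of this kind of network more concrete, here is a minimal NumPy sketch of the forward pass of a Kim-style text CNN: embed each word, slide filters of several widths over the sentence, keep the strongest response of each filter, and turn the result into class probabilities. This is not the repository's code. The function names, parameter names, sizes, and token ids are all hypothetical, the weights are random rather than trained, and training, dropout, and regularization are left out.

```python
import numpy as np


def init_params(vocab_size, embed_dim, filter_sizes, num_filters, num_classes, seed=0):
    """Create random (untrained) weights. All names and shapes are illustrative."""
    rng = np.random.default_rng(seed)
    params = {"embeddings": rng.normal(0.0, 0.1, (vocab_size, embed_dim))}
    for size in filter_sizes:
        # One bank of filters per window width; each filter spans `size` words.
        params[f"conv_W_{size}"] = rng.normal(0.0, 0.1, (size * embed_dim, num_filters))
        params[f"conv_b_{size}"] = np.zeros(num_filters)
    params["out_W"] = rng.normal(0.0, 0.1, (num_filters * len(filter_sizes), num_classes))
    params["out_b"] = np.zeros(num_classes)
    return params


def cnn_text_forward(token_ids, params, filter_sizes):
    """Embed -> convolve over word windows -> max-pool over time -> softmax."""
    # Look up one embedding vector per token: shape (seq_len, embed_dim).
    embedded = params["embeddings"][token_ids]
    seq_len = embedded.shape[0]

    pooled = []
    for size in filter_sizes:
        W = params[f"conv_W_{size}"]
        b = params[f"conv_b_{size}"]
        # Flatten every window of `size` consecutive words into one row.
        windows = np.stack(
            [embedded[i:i + size].ravel() for i in range(seq_len - size + 1)]
        )
        # Apply all filters of this width at every position, then ReLU.
        feature_maps = np.maximum(windows @ W + b, 0.0)
        # Max-over-time pooling: keep each filter's strongest response.
        pooled.append(feature_maps.max(axis=0))

    # Concatenate pooled features from all filter widths into one vector.
    features = np.concatenate(pooled)
    logits = features @ params["out_W"] + params["out_b"]

    # Numerically stable softmax over the classes.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Hypothetical sizes and a made-up sentence of token ids.
filter_sizes = (3, 4, 5)
params = init_params(vocab_size=50, embed_dim=8, filter_sizes=filter_sizes,
                     num_filters=4, num_classes=2)
sentence = np.array([3, 17, 42, 8, 25, 1, 9])
probs = cnn_text_forward(sentence, params, filter_sizes)
print(probs)  # two probabilities, e.g. negative vs. positive
```

Because the pooling step keeps only the maximum response per filter, the feature vector has the same length no matter how long the sentence is, which is what lets a single output layer handle reviews of different lengths. The sentence must be at least as long as the widest filter; real implementations typically pad shorter inputs.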
Verify against the repo before relying on details.