explaingit

vi3k6i5/flashtext

5,710 | Python | Audience · developer | Complexity · 2/5 | Setup · easy

TLDR

FlashText is a Python library that finds and replaces keywords in text much faster than regular expressions when you have hundreds or thousands of terms to search for at once.

Mindmap

mindmap
  root((FlashText))
    What it does
      Keyword search
      Keyword replace
      Fast at scale
    How it works
      Custom algorithm
      Faster than regex
      Case sensitivity toggle
    Features
      Span positions
      Bulk keyword load
      Dictionary mapping
    Use cases
      Text normalization
      Document tagging
      Data cleaning

Things people build with this

USE CASE 1

Normalize thousands of documents by replacing multiple company name variants with one canonical name in a single fast pass.

USE CASE 2

Extract all mentions of a large predefined keyword list from a text corpus without slowing down as the list grows.

USE CASE 3

Redact sensitive terms from documents at scale by replacing them with placeholder text.

USE CASE 4

Tag documents by topic by detecting which category keywords appear in each document.

Tech stack

Python

Getting it running

Difficulty · easy | Time to first run · 5 min

In plain English

FlashText is a Python library for finding and replacing words or phrases in text. You give it a list of keywords to look for, and it either pulls them out of any text you pass in or swaps them for replacement terms. It is built on a custom algorithm that performs both jobs much faster than regular expressions when the list of keywords is large.

The core use case is normalizing text that uses multiple names for the same thing. For example, you might teach it that "Big Apple" and "NYC" both refer to "New York", then run it over thousands of documents to extract or replace those mentions with the standard name.

You can load keywords one at a time, from a list, or from a dictionary that maps canonical names to their variants. Keywords can also be removed later, and the processor tracks all of them so you can inspect or count what it knows.

By default the library is case-insensitive, but you can switch it to case-sensitive mode. When extracting keywords, you can also ask for span information, which returns the start and end character positions of each match alongside the matched term. This is useful if you need to know exactly where in the text something appeared.

The README includes benchmark charts comparing FlashText to Python's built-in regular expression module. The advantage grows as the keyword list grows: with hundreds or thousands of terms, FlashText stays roughly constant in speed while regex slows down roughly in proportion to the number of terms. This makes it practical for tasks like redacting sensitive terms, tagging documents by topic, or cleaning inconsistent terminology across a large dataset.

Installation is a single pip command: pip install flashtext. The library has no unusual dependencies, and the README includes short examples for every feature it describes.

Copy-paste prompts

Prompt 1
Using FlashText in Python, load a dictionary mapping 'Big Apple' and 'NYC' to 'New York', then extract all keyword matches from a list of article strings.
Prompt 2
I have a list of 1000 product name variants. Show me how to use FlashText to replace all variants with their canonical names across a pandas DataFrame column.
Prompt 3
Using FlashText in case-sensitive mode, extract all keyword matches along with their character start and end positions from a text string.
Prompt 4
How do I add keywords to a FlashText KeywordProcessor one at a time, inspect what it knows, remove one keyword, then run a replacement pass?

Verify against the repo before relying on details.