houbb/sensitive-word

★ 5,827 | Java | Audience · developer | Complexity · 2/5 | Setup · easy

Mindmap

mindmap
  root((sensitive-word))
    What it does
      Detect flagged words
      Replace with asterisks
      Return matched words
    Dictionary
      60000 built-in words
      Profanity terms
      Spam phrases
    Evasion bypass
      Pinyin conversion
      Full-width normalize
      Repeated char skip
    Customization
      Custom word lists
      Whitelist support
      Runtime updates


Things people build with this

USE CASE 1

Add content moderation to a Chinese-language app to automatically block profanity and spam in user submissions.

USE CASE 2

Update word lists at runtime so that newly restricted words are replaced or flagged immediately, without restarting the application.

USE CASE 3

Maintain a custom whitelist so legitimate business terms are never falsely blocked by the filter.

USE CASE 4

Detect email addresses, URLs, and IP addresses in user-submitted text alongside custom word categories.

Tech stack

Java | Maven | DFA

Getting it running

Difficulty · easy | Time to first run · 5 min

Requires JDK 1.8+ and a Maven project; no external infrastructure or API keys are needed.

License terms are not covered in this summary; check the repository for them.

In plain English

sensitive-word is a Java library for detecting and filtering prohibited or inappropriate text in user-submitted content. You give it a string, and it can tell you whether any flagged words are present, return which ones it found, or replace them with asterisks or a custom substitution. Its documentation is written mainly in Chinese, and it is targeted at Chinese-language applications.

The library ships with a built-in dictionary of over 60,000 words covering profanity, politically sensitive terms, spam-associated phrases, and other restricted content. The README cites a throughput of over 140,000 checks per second, achieved through a DFA (Deterministic Finite Automaton) algorithm: the dictionary is compiled into a state machine that the text is walked through character by character, so all words are checked together instead of searching for each word separately.

Beyond exact matches, the tool handles many of the ways people try to evade filters. It can:

- normalize traditional and simplified Chinese characters to the same form before checking
- handle full-width and half-width variants of letters and numbers
- convert Chinese characters to their phonetic pinyin spelling
- ignore repeated characters (like "heeello")
- skip over special characters inserted between letters

This makes it harder to sneak a flagged word past the filter by disguising it.

Developers can add their own custom word lists and whitelists (words to never flag), update those lists at runtime without restarting the application, assign category tags to individual words, and write custom replacement logic so different words get different substitutions. The library also includes detection modes for email addresses, URLs, and IP addresses.

Installation is via a Maven dependency in a Java project (JDK 1.8 or newer required). A companion admin web interface is available as a separate repository for managing the word lists through a UI.
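To make the matching approach described above more concrete, here is a minimal, self-contained sketch of a trie-based word matcher with a simple normalization step. It is an illustration only and is not the library's code or API: the class name `WordFilterSketch` and its methods `addWord`, `contains`, `findAll`, and `replace` are hypothetical, and the normalization covers only full-width-to-half-width conversion and lower-casing, not pinyin, traditional/simplified conversion, or repeated-character handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative trie-based matcher. Hypothetical names; NOT the sensitive-word API. */
class WordFilterSketch {

    /** One automaton state: outgoing transitions plus an end-of-word flag. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    /** Adds a word to the trie, one transition per normalized character. */
    void addWord(String word) {
        if (word.isEmpty()) {
            return;
        }
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.next.computeIfAbsent(normalize(word.charAt(i)), k -> new Node());
        }
        node.terminal = true;
    }

    /** Length-preserving normalization: full-width ASCII to half-width, then lower case. */
    static char normalize(char c) {
        if (c >= '\uFF01' && c <= '\uFF5E') {
            c = (char) (c - 0xFEE0); // U+FF01..U+FF5E mirror U+0021..U+007E
        }
        return Character.toLowerCase(c);
    }

    /** Returns the length of the longest word starting at {@code start}, or 0 if none. */
    private int matchLength(String text, int start) {
        Node node = root;
        int longest = 0;
        for (int i = start; i < text.length(); i++) {
            node = node.next.get(normalize(text.charAt(i)));
            if (node == null) {
                break;
            }
            if (node.terminal) {
                longest = i - start + 1;
            }
        }
        return longest;
    }

    /** True if any added word occurs anywhere in the text. */
    boolean contains(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (matchLength(text, i) > 0) {
                return true;
            }
        }
        return false;
    }

    /** Returns the matched substrings as they appear in the original text. */
    List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = matchLength(text, i);
            if (len > 0) {
                found.add(text.substring(i, i + len));
                i += len;
            } else {
                i++;
            }
        }
        return found;
    }

    /** Replaces every character of each match with an asterisk. */
    String replace(String text) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int len = matchLength(text, i);
            for (int j = i; j < i + len; j++) {
                out.setCharAt(j, '*');
            }
            i += Math.max(len, 1);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        WordFilterSketch filter = new WordFilterSketch();
        filter.addWord("spam");
        filter.addWord("bad");
        // "SPAM" written with full-width letters to imitate an evasion attempt
        String text = "Buy \uFF33\uFF30\uFF21\uFF2D now, bad idea";
        System.out.println(filter.contains(text));
        System.out.println(filter.findAll(text));
        System.out.println(filter.replace(text));
    }
}
```

With the two words added in `main`, the full-width "SPAM" and the plain "bad" are both matched, and `replace` yields `Buy **** now, *** idea`. Because the normalization maps each character to exactly one character, match positions line up with the original string, which keeps the replacement step simple. A production filter would need more than this sketch provides; the real library's API and behavior should be taken from its README.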

Copy-paste prompts

Prompt 1

I'm integrating sensitive-word into my Java Spring Boot app. Show me how to add a custom word list and whitelist, then replace flagged words with asterisks in user input.

Prompt 2

How does sensitive-word handle evasion tactics like inserting special characters, using full-width letters, or spelling words in pinyin? Give me a code example.

Prompt 3

I need to update the sensitive-word dictionary at runtime without restarting my server. How do I add and remove words dynamically?

Prompt 4

Show me how to assign category tags to words in sensitive-word so I can handle profanity and spam differently in my moderation logic.

Prompt 5

How do I use sensitive-word to detect email addresses and URLs in user messages and return the matched items?


Verify against the repo before relying on details.