Add content moderation to a Chinese-language app to automatically block profanity and spam in user submissions.
Update word lists dynamically so restricted words are replaced or flagged in real time, without restarting the application.
Maintain a custom whitelist so legitimate business terms are never falsely blocked by the filter.
Detect email addresses, URLs, and IP addresses in user-submitted text alongside custom word categories.
Requires JDK 1.8+ and a Maven project; no external infrastructure or API keys are needed.
sensitive-word is a Java library for detecting and filtering prohibited or inappropriate text in user-submitted content. You give it a string, and it can tell you whether any flagged words are present, return the ones it found, or replace them with asterisks or a custom substitution. Its documentation is written in Chinese, and it targets Chinese-language applications.
The library ships with a built-in dictionary of over 60,000 words covering profanity, politically sensitive terms, spam-associated phrases, and other restricted content. The README cites a throughput of over 140,000 checks per second, achieved with a DFA (Deterministic Finite Automaton) algorithm: a pattern-matching technique that checks the text against the whole dictionary at once instead of testing each dictionary word separately.
Beyond exact matches, the library handles many of the tricks people use to evade filters. It can normalize traditional and simplified Chinese characters to the same form before checking, treat full-width and half-width variants of letters and numbers as equivalent, convert Chinese characters to their phonetic pinyin spelling, ignore repeated characters (such as "heeello"), and skip over special characters inserted between letters. This makes it harder to sneak a flagged word past the filter by disguising it.
Developers can add their own custom word lists and whitelists (words that are never flagged), update those lists at runtime without restarting the application, assign category tags to individual words, and write custom replacement logic so that different words get different substitutions. The library also includes detection modes for email addresses, URLs, and IP addresses.
Installation is via a Maven dependency in a Java project (JDK 1.8 or newer required). A companion admin web interface for managing the word lists through a UI is available as a separate repository. This summary covers only part of the README.
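To make the matching idea described above more concrete, here is a minimal sketch of a trie-based matcher, walked like a DFA, that also tolerates a few of the evasion tricks mentioned (full-width letters, mixed case, repeated characters, inserted symbols). It is an illustration only and not the library's code or API: the class name `WordFilterSketch` and its methods `addWord`, `findAll`, and `replace` are hypothetical, and traditional/simplified conversion, pinyin, and whitelists are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative trie-based matcher; NOT the sensitive-word implementation. */
class WordFilterSketch {

    /** One trie node: outgoing edges keyed by character, plus an end-of-word flag. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    /** Adds a word in normalized form; can be called at any time to extend the list. */
    public void addWord(String word) {
        Node node = root;
        for (char raw : word.toCharArray()) {
            node = node.next.computeIfAbsent(normalize(raw), k -> new Node());
        }
        node.end = true;
    }

    /** Maps full-width ASCII variants (U+FF01..U+FF5E) to half-width, then lower-cases. */
    static char normalize(char c) {
        if (c >= '\uFF01' && c <= '\uFF5E') {
            c = (char) (c - 0xFEE0);
        }
        return Character.toLowerCase(c);
    }

    /** Returns every flagged span, exactly as it appears in the original text. */
    public List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = matchEnd(text, i);
            if (end < 0) { i++; continue; }
            hits.add(text.substring(i, end));
            i = end;
        }
        return hits;
    }

    /** Overwrites every flagged span with the mask character, one per original char. */
    public String replace(String text, char mask) {
        StringBuilder out = new StringBuilder(text);
        int i = 0;
        while (i < text.length()) {
            int end = matchEnd(text, i);
            if (end < 0) { i++; continue; }
            for (int j = i; j < end; j++) out.setCharAt(j, mask);
            i = end;
        }
        return out.toString();
    }

    /** Longest match starting at start: exclusive end index, or -1 if none. */
    private int matchEnd(String text, int start) {
        Node node = root;
        int best = -1;
        char prev = 0;
        for (int i = start; i < text.length(); i++) {
            char c = normalize(text.charAt(i));
            // After the first character, skip symbols and spaces inserted between letters.
            if (i > start && !Character.isLetterOrDigit(c)) continue;
            Node child = node.next.get(c);
            if (child == null) {
                // Tolerate a repeat of the previously matched character ("heeello").
                if (i > start && c == prev) continue;
                break;
            }
            node = child;
            prev = c;
            if (node.end) best = i + 1;
        }
        return best;
    }
}
```

With "bad" and "hello" added, `findAll("a b-a-d day")` reports the span `b-a-d`, and `replace` masks the same spans character by character. Skipping every non-alphanumeric character, spaces included, is a deliberately aggressive choice that can flag letters spread across neighbouring words, which is the kind of false positive a whitelist is meant to suppress. The sketch is also not thread-safe, so runtime updates through `addWord` would need synchronization in a real application.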
Verify against the repo before relying on details.