Find an existing Chinese NLP tool or library instead of building one from scratch.
Discover datasets for Chinese text tasks like sentiment analysis, machine translation, or question answering.
Locate pretrained language models and word lists for Chinese language processing projects.
Research what tools and resources are available for a specific Chinese NLP task.
funNLP is a large index of Chinese natural-language-processing (NLP) resources collected in one place. It is not a single program but a curated list: each entry points to a tool, a dataset, a model, a paper, or a piece of code that is useful when working with Chinese text. The README is essentially a long catalogue, organised by what each item does.

The collection covers the staples of Chinese NLP. These include dictionaries and word lists (sensitive words, stopwords, synonyms and antonyms, slang, idioms, place names, historical figures, medical terms, legal terms, surname databases for Chinese and Japanese, traditional-to-simplified conversion), extractors for common pieces of information (phone numbers, ID numbers, email addresses, gender from name), and task-specific tools (Chinese word segmentation, named-entity recognition, sentiment analysis, summarisation, keyword extraction, OCR for handwritten Chinese, speech recognition, text-to-SQL, question answering).

It also indexes resources for the deep-learning side of the field: pretrained Chinese models such as BERT, ALBERT, ELECTRA and GPT-2 variants, knowledge-graph projects in medicine, finance, and law, dialog-system frameworks like Rasa, and benchmark suites and corpora for training and evaluation. Many entries link to Python packages or training code; the repository is tagged as Python because its supporting code samples are written in Python.

funNLP works best as a starting point for a Chinese-language project: a place to find the right library before writing one from scratch, to discover labelled datasets, or to keep up with the field. This summary is based on a partial copy of the README; the full README is longer.
Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.