explaingit

fighting41love/funnlp

80,471 | Python | Audience · researcher | Complexity · 1/5 | Dormant | Setup · easy

TLDR

A curated directory of Chinese NLP tools, datasets, models, and code packages organized by task, a reference library for building Chinese language processing systems.

Mindmap

mindmap
  root((repo))
    What it does
      Curated link index
      Chinese NLP focus
      Task-organized
    Content areas
      LLMs and prompting
      Traditional NLP
      Specialized datasets
      Word lists
    Use cases
      Find existing tools
      Discover datasets
      Research reference
      Project setup
    Audience
      NLP practitioners
      Researchers
      Chinese tech teams

Things people build with this

USE CASE 1

Find an existing Chinese NLP tool or library instead of building one from scratch.

USE CASE 2

Discover datasets for Chinese text tasks like sentiment analysis, machine translation, or question answering.

USE CASE 3

Locate pretrained language models and word lists for Chinese language processing projects.

USE CASE 4

Research what tools and resources are available for a specific Chinese NLP task.

Tech stack

Python, Chinese NLP

Getting it running

Difficulty · easy | Time to first run · 5 min
License could not be detected automatically. Check the repository's LICENSE file before use.

In plain English

funNLP is a large index of Chinese natural-language-processing (NLP) resources collected in one place. It is not a single program but a curated list: each entry points to a tool, a dataset, a model, a paper, or a piece of code that is useful when working with Chinese text. The README is essentially a long catalogue, organised by what each item does.

The collection covers the bread and butter of Chinese NLP:

- Dictionaries and word lists: sensitive words, stopwords, synonyms and antonyms, slang, idioms, place names, historical figures, medical terms, legal terms, surname databases for Chinese and Japanese, and traditional-to-simplified conversion.
- Extractors for common pieces of information: phone numbers, ID numbers, email addresses, and gender from name.
- Task-specific tools: Chinese word segmentation, named-entity recognition, sentiment analysis, summarisation, keyword extraction, OCR for handwritten Chinese, speech recognition, text-to-SQL, and question answering.

It also indexes resources for the deep-learning side of the field: pretrained models such as BERT, ALBERT, ELECTRA and GPT-2 variants for Chinese, knowledge-graph projects in medicine, finance, and law, dialog-system frameworks like Rasa, and benchmark suites and corpora for training and evaluation. Many entries link to Python packages or training code; the language tag is Python because the supporting code samples are written in Python.

Someone would use funNLP as a starting point for a Chinese-language project: to find the right library before writing one from scratch, to discover labelled datasets, or to keep up with the field. The full README is longer than the excerpt this summary was based on.

Copy-paste prompts

Prompt 1
I need to build a Chinese sentiment analysis system. What tools and datasets does funNLP recommend?
Prompt 2
Show me the Chinese NLP resources in funNLP for information extraction and named entity recognition.
Prompt 3
What pretrained language models for Chinese does funNLP list, and where can I find them?
Prompt 4
I'm working on Chinese machine translation. What datasets and tools does funNLP have for this?
Prompt 5
Help me navigate funNLP to find Chinese word lists, dictionaries, and specialized vocabularies for my domain.

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.