Analysis updated 2026-05-18
Find an existing Chinese NLP tool or library instead of building one from scratch.
Discover datasets for Chinese text tasks like sentiment analysis, machine translation, or question answering.
Locate pretrained language models and word lists for Chinese language processing projects.
Research what tools and resources are available for a specific Chinese NLP task.
| | fighting41love/funnlp | infiniflow/ragflow | karpathy/autoresearch |
|---|---|---|---|
| Stars | 80,471 | 79,820 | 79,286 |
| Language | Python | Python | Python |
| Setup difficulty | easy | hard | hard |
| Complexity | 1/5 | 4/5 | 3/5 |
| Audience | researcher | developer | researcher |
Figures from each repo's GitHub metadata at analysis time.
funNLP is a large index of Chinese natural-language-processing (NLP) resources collected in one place. It is not a single program but a curated list. Each entry points to a tool, a dataset, a model, a paper, or a piece of code that is useful when working with Chinese text. The README is itself the catalogue, organised by what each item does. The author describes it as the playground for NLP workers, and notes that it is updated irregularly.

The top section, which has been growing fastest, is about ChatGPT-style large language models: evaluations and comparisons, background reading, open-source frameworks, training and low-resource fine-tuning, prompt engineering, document question answering, industry applications, course material, safety issues, multi-modal LLMs, and LLM datasets.

The wider collection covers the bread and butter of Chinese NLP. There are dictionaries and word lists for sensitive words, stopwords, synonyms and antonyms, idioms, place names, historical figures, medical terms, legal terms, surname databases for Chinese and Japanese, and traditional-to-simplified conversion. There are extractors for common pieces of information such as phone numbers, ID numbers, email addresses, and inferring gender from a name. There are task-specific tools for Chinese word segmentation, named-entity recognition, sentiment analysis, summarisation, keyword extraction, OCR for handwritten Chinese, speech recognition, text-to-SQL, and question answering.

It also indexes resources for the deep-learning side of the field: pretrained models such as BERT, ALBERT, ELECTRA and GPT-2 variants for Chinese, knowledge-graph projects in medicine, finance, and law, dialog-system frameworks like Rasa, and benchmark suites and corpora for training and evaluation. Many entries link to Python packages or training code, which is why the repository language tag is Python.
Someone would use funNLP as a starting point for a Chinese-language project: to find the right library before writing one from scratch, to discover labelled datasets, or to keep up with the field.
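To give a feel for the kind of extractor the catalogue points to, here is a minimal sketch of pulling phone numbers and email addresses out of Chinese text with regular expressions. It is not taken from funNLP or from any library it lists: the function name `extract_contacts` and both patterns are illustrative assumptions, and the phone pattern only covers the common 11-digit mainland mobile format.

```python
import re

# Assumed pattern: 11-digit mainland China mobile number (starts with 1,
# second digit 3-9), not embedded inside a longer run of digits.
PHONE_RE = re.compile(r"(?<![0-9])1[3-9][0-9]{9}(?![0-9])")

# Assumed pattern: a deliberately simple email matcher, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return phone numbers and email addresses found in `text` (hypothetical helper)."""
    return {
        "phones": PHONE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "联系电话13812345678,邮箱 zhang.san@example.com。"
    print(extract_contacts(sample))
```

The digit lookarounds keep the phone pattern from firing inside longer numbers such as order IDs. A dedicated package found through the index would likely handle landlines, ID numbers, and more edge cases than this sketch does.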
A curated directory of Chinese NLP tools, datasets, models, and code packages organized by task. It serves as a reference library for building Chinese language processing systems.
Mainly Python. The stack is tagged Python and Chinese NLP.
License could not be detected automatically. Check the repository's LICENSE file before use.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.