explaingit

lonepatient/awesome-pretrained-chinese-nlp-models

5,557 | Python | Audience · researcher | Complexity · 1/5 | Setup · easy

TLDR

A curated index of pretrained Chinese NLP models, from large language models to BERT-family variants, with download links, sizes, and metadata organized by type, domain, and architecture.

Mindmap

mindmap
  root((Chinese NLP models))
    Large language models
      General base models
      Domain-specific models
      Chat models
      Multimodal models
    BERT family
      NLU models
      NLG models
      Combined models
    Resources
      Benchmarks
      Datasets
      Embeddings
    Audience
      NLP researchers
      Chinese AI developers

Things people build with this

USE CASE 1

Find and download a pretrained Chinese BERT or MacBERT model for text classification or another NLP task

USE CASE 2

Discover domain-specific Chinese LLMs fine-tuned for finance, medicine, or law

USE CASE 3

Compare Chinese NLP models by size, architecture, and task type in one place without hunting across sites

USE CASE 4

Find Chinese instruction datasets and evaluation benchmarks to test model performance

Tech stack

Python, PyTorch, Hugging Face Transformers

Getting it running

Difficulty · easy | Time to first run · 5 min

In plain English

This repository is a curated index of pretrained Chinese natural language processing models. It collects links, descriptions, and download information for publicly released models that work with Chinese text, organized so researchers and developers can find what they need without tracking down each model individually.

The index is split into two broad sections. The first covers large language models, the kind used for chat, reasoning, and general-purpose text generation. These are grouped by type: general-purpose base models with more than 7 billion parameters, domain-specific base models for fields like finance, medicine, and law, general chat models, domain-specific chat models, multimodal models that handle both images and text, and reasoning-focused models for mathematics and logic. Each entry lists the model name, size, release date, supported languages, architecture type, and links to the Hugging Face repository and original project.

The second section covers older pretrained models in the BERT family and related architectures. These are organized into NLU models for understanding tasks like classification and question answering, NLG models for generation tasks, combined NLU-NLG models, and multimodal models. The 29 NLU entries include Chinese versions of BERT, RoBERTa, ALBERT, ERNIE, MacBERT, and ELECTRA. The 18 NLG entries include GPT, T5, BART, and CPM variants.

The repository also links to evaluation benchmarks for comparing models, open-source model platforms, Chinese instruction datasets, and embedding models. The full README is longer than the portion summarized here.
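The index itself is a README of tables, not a library, but the per-entry fields described above (name, size, category, architecture, links) map naturally onto a small record type. The following is a minimal sketch, assuming you have copied a few rows into Python by hand. The `ModelEntry` class, the `find_models` helper, the category labels, and every row in `ENTRIES` are hypothetical placeholders for illustration, not real entries or an API from the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelEntry:
    """One row of the index (hypothetical schema, not defined by the repo)."""
    name: str
    params_billions: float
    category: str       # e.g. "general-base", "domain-chat", "nlu", "nlg"
    architecture: str   # e.g. "encoder-only", "decoder-only"
    url: str


# Placeholder rows only: names, sizes, and URLs are invented for this sketch.
ENTRIES = [
    ModelEntry("example-general-base-13b", 13.0, "general-base",
               "decoder-only", "https://example.com/general-base-13b"),
    ModelEntry("example-finance-chat-7b", 7.0, "domain-chat",
               "decoder-only", "https://example.com/finance-chat-7b"),
    ModelEntry("example-bert-base", 0.11, "nlu",
               "encoder-only", "https://example.com/bert-base"),
    ModelEntry("example-t5-base", 0.22, "nlg",
               "encoder-decoder", "https://example.com/t5-base"),
]


def find_models(entries: List[ModelEntry],
                category: Optional[str] = None,
                max_params_billions: Optional[float] = None) -> List[ModelEntry]:
    """Return entries matching the optional filters, smallest model first."""
    matches = []
    for entry in entries:
        if category is not None and entry.category != category:
            continue
        if (max_params_billions is not None
                and entry.params_billions > max_params_billions):
            continue
        matches.append(entry)
    return sorted(matches, key=lambda e: e.params_billions)


if __name__ == "__main__":
    # List every placeholder model at or under 7 billion parameters.
    for entry in find_models(ENTRIES, max_params_billions=7.0):
        print(entry.name, entry.params_billions, entry.url)
```

Filtering on size and category like this mirrors how the README groups its tables; replace the placeholder rows with entries taken from the actual index before relying on any result.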

Copy-paste prompts

Prompt 1
I am building a Chinese text classification model. Which Chinese BERT or MacBERT variant from awesome-pretrained-chinese-nlp-models should I fine-tune, and how do I load it with Hugging Face transformers?
Prompt 2
Help me write Python code to load a Chinese RoBERTa model from HuggingFace and fine-tune it on a sentiment analysis dataset of product reviews.
Prompt 3
I need a Chinese LLM under 7B parameters that can run on a laptop. From the awesome-pretrained-chinese-nlp-models index, what are my options and how do I run one locally with the transformers library?

Verify against the repo before relying on details.