explaingit

hankcs/hanlp

Analysis updated 2026-05-18

36,296 stars | Python | Audience · developer | Complexity · 3/5 | License | Setup · moderate

TLDR

A multilingual NLP library that performs ten language analysis tasks (including tokenization, tagging, parsing, and entity recognition) simultaneously on text in 130 languages, with specialized models for Chinese and Japanese.

Mindmap

mindmap
  root((HanLP))
    What it does
      Tokenization
      Named entity recognition
      Dependency parsing
      Semantic role labeling
    Multilingual support
      130 languages
      Chinese specialist
      Japanese specialist
    How to use
      Python library
      REST API
      Pre-trained models
    Tech stack
      PyTorch
      TensorFlow 2.x
      Transformers
    Use cases
      Search engines
      Information extraction
      Chatbots
      Document classification

What do people build with it?

USE CASE 1

Build a search engine that understands what documents are about by extracting named entities and parsing sentence structure.

USE CASE 2

Create an information extraction pipeline that automatically pulls people, organizations, and relationships from unstructured text.

USE CASE 3

Develop a multilingual chatbot that understands user intent by analyzing text in 130 different languages.

USE CASE 4

Classify documents by analyzing their grammatical structure and semantic meaning across Chinese, Japanese, or English text.
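The information extraction use case above can be sketched as a small post-processing step. This is a minimal sketch, assuming the analysis result is a mapping whose entity entry holds, for each sentence, a list of `(text, label, start, end)` tuples. The helper name `group_entities`, the key `"ner/msra"`, and the sample data are illustrative assumptions, not output from a real model run.

```python
from collections import defaultdict


def group_entities(doc, ner_key="ner/msra"):
    """Group entity strings by label from a HanLP-style result mapping.

    Assumes doc[ner_key] is a list of sentences, each a list of
    (text, label, start, end) tuples. The key name is an assumption
    and depends on which model produced the result.
    """
    grouped = defaultdict(list)
    for sentence in doc.get(ner_key, []):
        for text, label, _start, _end in sentence:
            # Keep the first occurrence only, preserving order.
            if text not in grouped[label]:
                grouped[label].append(text)
    return dict(grouped)


# Hand-written sample in the assumed shape; not real model output.
sample_doc = {
    "tok/fine": [["Alice", "joined", "Acme", "in", "Paris", "."]],
    "ner/msra": [[
        ("Alice", "PERSON", 0, 1),
        ("Acme", "ORGANIZATION", 2, 3),
        ("Paris", "LOCATION", 4, 5),
    ]],
}

entities = group_entities(sample_doc)
print(entities)
```

Grouping by label keeps the downstream code independent of how the text was analyzed, so the same helper could feed a search index or a relationship extractor.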

What is it built with?

Python, PyTorch, TensorFlow 2.x, Transformers

How does it compare?

                   hankcs/hanlp   google/langextract   myshell-ai/openvoice
Stars              36,296         36,390               36,463
Language           Python         Python               Python
Setup difficulty   moderate       moderate             hard
Complexity         3/5            2/5                  4/5
Audience           developer      developer            developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Requires downloading large pre-trained models (several GB) on first run, and installing PyTorch or TensorFlow can be finicky depending on the system.
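Because the deep learning backends are the usual source of setup trouble, a quick preflight check before the first model download can save time. The sketch below only reports whether the relevant packages are importable; the function name `installed_backends` is hypothetical, and the package names assume a pip install of `hanlp` (with `hanlp[full]` pulling in TensorFlow, as the project's install notes describe).

```python
import importlib.util


def installed_backends(names=("hanlp", "torch", "tensorflow")):
    """Report which top-level packages can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_backends()
for name, present in status.items():
    print(f"{name}: {'found' if present else 'missing'}")
```

A missing `tensorflow` entry is not necessarily a problem, since it is only needed for models that depend on it.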

Use freely for any purpose, including commercial use. Keep the license notice and state any changes you make; the license includes a patent grant.

In plain English

HanLP is a multilingual natural language processing (NLP) library designed for both researchers and enterprise applications. NLP is the field of computer science concerned with teaching computers to understand and analyze human language. HanLP bundles a wide range of text analysis capabilities into a single library, so instead of assembling separate tools for different tasks, you get them all in one package.

The library performs ten language analysis tasks simultaneously on a single piece of text. These include tokenization (breaking text into words or subword units), part-of-speech tagging (labeling each word as a noun, verb, adjective, etc.), lemmatization (finding a word's base form), named entity recognition (finding people, organizations, places), syntactic dependency parsing (mapping grammatical relationships), constituency parsing (drawing a sentence's phrase structure tree), semantic role labeling (identifying who did what to whom), semantic dependency parsing (linking words by meaning rather than grammar), and abstract meaning representation (encoding a sentence's meaning as a graph). Critically, all of them run in a single forward pass through a shared neural network, which makes the analysis efficient and internally consistent.

What makes HanLP particularly valuable is its multilingual coverage: the latest version supports 130 languages through pre-trained models. While many NLP tools focus on English, HanLP was originally built around Chinese and has deep support for Chinese, Japanese, and Korean. There are specialized, high-accuracy monolingual models for Chinese and Japanese that outperform the general multilingual model.

You would use HanLP when building applications that need to parse or extract information from text: search engines, information extraction pipelines, chatbots, document classification systems, or academic NLP research. It can be accessed via a lightweight REST API (sending text to a hosted server) or installed locally via pip for direct Python integration.

The tech stack is Python, built on PyTorch and TensorFlow 2.x, using transformer-based pre-trained models. GPU acceleration is recommended but not required.

Copy-paste prompts

Prompt 1
Show me how to use HanLP to tokenize and extract named entities from a Chinese text sample in Python.
Prompt 2
How do I set up HanLP's REST API to analyze text without installing it locally?
Prompt 3
Write a Python script using HanLP that performs part-of-speech tagging and dependency parsing on a sentence in Japanese.
Prompt 4
Compare HanLP's multilingual model versus the specialized Chinese model for accuracy on a sample text.
Prompt 5
How can I integrate HanLP into a document classification pipeline to analyze text in multiple languages?

Frequently asked questions

What is hanlp?

A multilingual NLP library that performs ten language analysis tasks (including tokenization, tagging, parsing, and entity recognition) simultaneously on text in 130 languages, with specialized models for Chinese and Japanese.

What language is hanlp written in?

Mainly Python. The stack also includes PyTorch, TensorFlow 2.x, and Transformers.

What license does hanlp use?

Use freely for any purpose, including commercial use. Keep the license notice and state any changes you make; the license includes a patent grant.

How hard is hanlp to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is hanlp for?

Mainly developers.

Verify against the repo before relying on details.