explaingit

hankcs/hanlp

36,307 | Python | Audience · developer | Complexity · 3/5 | Quiet | License | Setup · moderate

TLDR

A multilingual NLP library that performs ten language analysis tasks (including tokenization, tagging, parsing, and entity recognition) simultaneously on text in 130 languages, with specialized models for Chinese and Japanese.

Mindmap

mindmap
  root((HanLP))
    What it does
      Tokenization
      Named entity recognition
      Dependency parsing
      Semantic role labeling
    Multilingual support
      130 languages
      Chinese specialist
      Japanese specialist
    How to use
      Python library
      REST API
      Pre-trained models
    Tech stack
      PyTorch
      TensorFlow 2.x
      Transformers
    Use cases
      Search engines
      Information extraction
      Chatbots
      Document classification

Things people build with this

USE CASE 1

Build a search engine that understands what documents are about by extracting named entities and parsing sentence structure.
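A minimal sketch of the indexing side of such a search engine. It assumes each document has already been run through an NER model and reduced to a list of `(entity_text, entity_type, start, end)` tuples. HanLP's NER tasks return spans of roughly this shape, but check the exact format of the model you load. `build_entity_index`, `search`, and the sample documents are hypothetical and not part of HanLP.

```python
from collections import Counter, defaultdict
from typing import Optional

# Assumed NER output shape per entity: (entity_text, entity_type, start_token, end_token)
Entity = tuple[str, str, int, int]


def build_entity_index(
    docs: dict[str, list[Entity]], wanted_types: Optional[set[str]] = None
) -> dict[str, set[str]]:
    """Map each entity string to the ids of the documents that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, entities in docs.items():
        for text, label, _start, _end in entities:
            # Optionally keep only some entity types (e.g. people and places)
            if wanted_types is None or label in wanted_types:
                index[text].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query_entities: list[str]) -> list[str]:
    """Rank documents by how many of the query's entities they mention."""
    scores: Counter = Counter()
    for entity in query_entities:
        for doc_id in index.get(entity, set()):
            scores[doc_id] += 1
    # Highest score first; ties broken by doc id so the order is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    # Hypothetical documents, already reduced to entity tuples by an NER step
    docs = {
        "d1": [("张三", "PERSON", 0, 1), ("北京", "LOCATION", 3, 4)],
        "d2": [("北京", "LOCATION", 0, 1), ("HanLP", "ORGANIZATION", 2, 3)],
    }
    index = build_entity_index(docs)
    print(search(index, ["北京", "张三"]))  # ['d1', 'd2']
```

Ranking by the number of matched entities is the simplest possible scoring choice; a real engine would usually combine it with full-text relevance.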

USE CASE 2

Create an information extraction pipeline that automatically pulls people, organizations, and relationships from unstructured text.
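The extraction step can be sketched in a few lines, assuming an upstream NER model yields `(entity_text, entity_type, start, end)` tuples for each sentence. Pairing people with organizations that appear in the same sentence is a deliberately naive co-occurrence heuristic for finding relationship candidates, not a relation-extraction model. The helper names, labels, and sample sentences are hypothetical; tag sets differ between NER models, so adjust `left` and `right` to the labels your model emits.

```python
from collections import defaultdict
from itertools import product

# Assumed NER output shape per entity: (entity_text, entity_type, start_token, end_token)
Entity = tuple[str, str, int, int]


def group_by_type(entities: list[Entity]) -> dict[str, list[str]]:
    """Collect distinct entity strings per type, keeping first-seen order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for text, label, _start, _end in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


def cooccurring_pairs(
    sentences: list[list[Entity]], left: str = "PERSON", right: str = "ORGANIZATION"
) -> list[tuple[str, str]]:
    """Naive relation candidates: (left, right) entity pairs found in the same sentence."""
    pairs: list[tuple[str, str]] = []
    for entities in sentences:
        lefts = [text for text, label, _, _ in entities if label == left]
        rights = [text for text, label, _, _ in entities if label == right]
        pairs.extend(product(lefts, rights))
    return pairs


if __name__ == "__main__":
    # Hypothetical per-sentence NER output
    sentences = [
        [("张三", "PERSON", 0, 1), ("示例公司", "ORGANIZATION", 2, 3)],
        [("李四", "PERSON", 0, 1), ("北京", "LOCATION", 2, 3)],
    ]
    print(cooccurring_pairs(sentences))  # [('张三', '示例公司')]
```
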

USE CASE 3

Develop a multilingual chatbot that understands user intent by analyzing text in 130 different languages.

USE CASE 4

Classify documents by analyzing their grammatical structure and semantic meaning across Chinese, Japanese, or English text.
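One simple way to turn grammatical analysis into classifier input is a part-of-speech profile: the relative frequency of each tag in a document. The sketch below assumes tags have already been produced by a POS tagger (HanLP or any other) and pairs the profile with a nearest-centroid classifier using cosine similarity. The helper names, the four-tag vocabulary (Chinese Treebank-style tags such as `NN` and `VV`), and the training data are all hypothetical; a real system would likely combine these features with lexical or embedding features.

```python
import math

# Hypothetical tag vocabulary using Chinese Treebank-style tags
VOCAB = ["NN", "VV", "AD", "PU"]


def pos_histogram(tags: list[str], vocab: list[str]) -> list[float]:
    """Relative frequency of each vocab tag; tags outside the vocab are ignored."""
    counts = [tags.count(tag) for tag in vocab]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocab)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(
    examples: list[tuple[list[str], str]], vocab: list[str]
) -> dict[str, list[float]]:
    """Average the POS profiles of the training documents in each class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for tags, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, value in enumerate(pos_histogram(tags, vocab)):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(tags: list[str], centroids: dict[str, list[float]], vocab: list[str]) -> str:
    """Pick the class whose centroid is most cosine-similar to the document's profile."""
    profile = pos_histogram(tags, vocab)
    return max(centroids, key=lambda label: cosine(profile, centroids[label]))


if __name__ == "__main__":
    # Hypothetical training data: tag sequences from two document styles
    training = [
        (["NN", "NN", "NN", "VV", "PU"], "news"),
        (["AD", "VV", "VV", "AD", "PU"], "dialogue"),
    ]
    centroids = train_centroids(training, VOCAB)
    print(classify(["NN", "NN", "VV", "NN"], centroids, VOCAB))  # news
```
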

Tech stack

Python, PyTorch, TensorFlow 2.x, Transformers

Getting it running

Difficulty · moderate | Time to first run · 30 min

The first run downloads large pre-trained models (several GB), and installing PyTorch or TensorFlow can be finicky depending on your system.

Use freely for any purpose, including commercial use. Keep the license and copyright notices and state any significant changes you make; the license also includes an express patent grant.

In plain English

HanLP is a multilingual natural language processing (NLP) library designed for both researchers and enterprise applications. NLP is the field of computer science concerned with teaching computers to understand and analyze human language. HanLP bundles a wide range of text analysis capabilities into a single library, so instead of assembling separate tools for different tasks, you get them all in one package.

The library performs ten distinct language analysis tasks simultaneously on a single piece of text. They include tokenization (breaking text into words or subword units), part-of-speech tagging (labeling each word as a noun, verb, adjective, etc.), lemmatization (finding a word's base form), named entity recognition (finding people, organizations, and places), syntactic dependency parsing (mapping grammatical relationships between words), constituency parsing (building a sentence's phrase-structure tree), semantic role labeling (identifying who did what to whom), semantic dependency parsing, and abstract meaning representation (AMR) parsing. All of these tasks run in a single forward pass through a shared neural network, which makes the analysis efficient and keeps the outputs internally consistent.

What makes HanLP particularly valuable is its multilingual coverage: the latest version supports 130 languages through pre-trained models. While many NLP tools focus on English, HanLP was originally built around Chinese and also has deep support for Japanese and Korean. Specialized, high-accuracy monolingual models for Chinese and Japanese outperform the general multilingual model on those languages.

You would use HanLP when building applications that need to parse or extract information from text, such as search engines, information extraction pipelines, chatbots, and document classification systems, or for academic NLP research. It can be accessed through a lightweight REST API (sending text to a hosted server) or installed locally via pip for direct Python integration.

The tech stack is Python, built on PyTorch and TensorFlow 2.x, using transformer-based pre-trained models. GPU acceleration is recommended but not required.
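The "one forward pass, many outputs" design follows the general multi-task pattern: a shared encoder computes a representation once, and lightweight task-specific heads read from it. The toy sketch below only illustrates that data flow; it is not HanLP's code. The encoder is a trivial feature extractor standing in for a transformer, the two heads are hand-written rules standing in for trained layers, and every class and function name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Per-token features produced by the shared encoder
Features = list[dict[str, object]]


@dataclass
class SharedEncoder:
    """Stand-in for a transformer encoder; counts how often it runs."""

    calls: int = 0

    def __call__(self, tokens: list[str]) -> Features:
        self.calls += 1
        return [{"lower": t.lower(), "is_title": t.istitle()} for t in tokens]


@dataclass
class MultiTaskModel:
    """Runs the encoder once and feeds the same features to every task head."""

    encoder: SharedEncoder
    heads: dict[str, Callable[[Features], list[str]]] = field(default_factory=dict)

    def __call__(self, tokens: list[str]) -> dict[str, list[str]]:
        features = self.encoder(tokens)  # single shared pass
        return {name: head(features) for name, head in self.heads.items()}


# Tiny lexicon standing in for a trained tagging layer
LEXICON = {"the": "DET", "visited": "VERB", "city": "NOUN"}


def pos_head(features: Features) -> list[str]:
    # Lexicon lookup; unknown title-cased tokens are guessed to be proper nouns
    return [LEXICON.get(f["lower"], "PROPN" if f["is_title"] else "X") for f in features]


def ner_head(features: Features) -> list[str]:
    # Title-cased tokens become entity candidates
    return ["ENT" if f["is_title"] else "O" for f in features]


if __name__ == "__main__":
    encoder = SharedEncoder()
    model = MultiTaskModel(encoder, {"pos": pos_head, "ner": ner_head})
    print(model(["Alice", "visited", "the", "city"]))
    print(encoder.calls)  # 1
```

Because every head consumes the same `features`, adding a head does not add another encoder pass (`encoder.calls` stays at 1 per input), and all outputs derive from the same token representation, which is the efficiency and consistency argument made above.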

Copy-paste prompts

Prompt 1
Show me how to use HanLP to tokenize and extract named entities from a Chinese text sample in Python.
Prompt 2
How do I set up HanLP's REST API to analyze text without installing it locally?
Prompt 3
Write a Python script using HanLP that performs part-of-speech tagging and dependency parsing on a sentence in Japanese.
Prompt 4
Compare the accuracy of HanLP's multilingual model and its specialized Chinese model on a sample text.
Prompt 5
How can I integrate HanLP into a document classification pipeline to analyze text in multiple languages?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.