explaingit

infinilabs/analysis-ik

Analysis updated 2026-06-24

Stars · 17,448 | Language · Java | Audience · developer | Complexity · 3/5 | License · Apache 2.0 | Setup · moderate

TLDR

Elasticsearch and OpenSearch plugin that adds Chinese-language tokenization via the IK analyzer, with custom and hot-reloadable dictionaries.

Mindmap

mindmap
  root((analysis-ik))
    Inputs
      Chinese text
      Custom dictionary files
      Remote dictionary URL
    Outputs
      Tokenized text
      ik_max_word mode
      ik_smart mode
    Use Cases
      Search Chinese content in Elasticsearch
      Build a Chinese full text search app
      Extend tokenizer with domain vocabulary
    Tech Stack
      Java
      Elasticsearch
      OpenSearch
      Apache 2.0


What do people build with it?

USE CASE 1

Add Chinese search support to an existing Elasticsearch deployment

USE CASE 2

Build a Chinese-language product search backed by OpenSearch

USE CASE 3

Extend the tokenizer with a domain-specific vocabulary file

USE CASE 4

Hot-reload an evolving keyword list without restarting the cluster
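
Hot reloading depends on the dictionary server signalling when the word list has changed. The sketch below illustrates that idea only and is not the plugin's own code. It assumes the relevant headers are `Last-Modified` and `ETag`, the two standard HTTP cache validators, and the helper names `dictionary_changed` and `parse_word_list` are hypothetical.

```python
from typing import Mapping, Optional


def dictionary_changed(previous: Optional[Mapping[str, str]],
                       current: Mapping[str, str]) -> bool:
    """Decide whether a remote word list should be fetched again.

    Assumption: a change in either validator header means new content.
    """
    if previous is None:
        # Nothing cached yet, so the first poll always triggers a load.
        return True
    for header in ("Last-Modified", "ETag"):
        if previous.get(header) != current.get(header):
            return True
    return False


def parse_word_list(body: bytes) -> list[str]:
    """Decode a UTF-8 word list with one entry per line, skipping blanks."""
    lines = body.decode("utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

A poller built this way would compare the headers from each response with the previous ones and re-read the word list only when `dictionary_changed` returns `True`.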

What is it built with?

Java · Elasticsearch · OpenSearch

How does it compare?

Repo              infinilabs/analysis-ik  justauth/justauth  openzipkin/zipkin
Stars             17,448                  17,444             17,431
Language          Java                    Java               Java
Setup difficulty  moderate                easy               easy
Complexity        3/5                     2/5                3/5
Audience          developer               developer          ops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Plugin version must match the exact Elasticsearch or OpenSearch version you are running.
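
Because the plugin version has to match the server version exactly, a quick check before installing can save a failed install. The sketch below is illustrative only: `parse_version` and `versions_match` are hypothetical helpers, and the version strings are made-up examples rather than releases confirmed on this page.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string such as '8.4.1' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def versions_match(plugin_version: str, server_version: str) -> bool:
    """Return True only when every version component is identical."""
    return parse_version(plugin_version) == parse_version(server_version)


if __name__ == "__main__":
    print(versions_match("8.4.1", "8.4.1"))  # same version on both sides
    print(versions_match("8.4.1", "8.4.2"))  # patch versions differ
```

The comparison is deliberately strict: a difference in the patch number alone counts as a mismatch.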

Apache 2.0 lets you use, modify, and redistribute the code commercially, as long as you keep the license notice and state your changes.

In plain English

analysis-ik is a plugin that adds Chinese-language text analysis to Elasticsearch and OpenSearch, two popular search engines used to build fast, scalable search features in applications. The core problem it solves is that Chinese text has no spaces between words, making it hard for search engines to know where one word ends and another begins. This plugin integrates the IK analyzer (a Chinese text tokenizer, meaning a tool that splits text into meaningful units called tokens) so that search queries and indexed content are broken down correctly.

The plugin provides two tokenizer modes. "ik_max_word" performs the finest-grained split, generating all possible word combinations from a phrase, useful when you want to match any possible way a query might overlap with the content. "ik_smart" performs a coarser, more minimal split, useful for phrase-level queries. You configure which mode to use per field in your Elasticsearch index mapping.

Custom dictionaries are supported: you can add your own vocabulary files (lists of words, one per line, in UTF-8 encoding) to extend the default dictionary. The plugin also supports hot-reloading dictionaries from a remote URL, meaning you can update the word list without restarting the search engine, as long as the server serving the file returns standard HTTP caching headers.

Installation is done via the Elasticsearch or OpenSearch plugin CLI with a single command. The plugin is written in Java, licensed under Apache 2.0, and maintained by INFINI Labs.
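
The per-field choice of mode can be sketched as an index mapping. The example below builds one as a plain Python dictionary and prints it as JSON. It uses the standard `analyzer` and `search_analyzer` mapping parameters; the field name `content` and the helper `build_ik_mapping` are assumptions for illustration. Pairing `ik_max_word` at index time with `ik_smart` at search time is one common arrangement, not the only valid one.

```python
import json


def build_ik_mapping(field_name: str) -> dict:
    """Build an index-creation body that applies IK to one text field."""
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "text",
                    # Fine-grained split when documents are indexed.
                    "analyzer": "ik_max_word",
                    # Coarser split when queries are analyzed.
                    "search_analyzer": "ik_smart",
                }
            }
        }
    }


if __name__ == "__main__":
    # "content" is a hypothetical field name.
    print(json.dumps(build_ik_mapping("content"), indent=2))
```

The printed JSON would serve as the request body when creating an index on a cluster that has the plugin installed.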

Copy-paste prompts

Prompt 1
Show me the Elasticsearch index mapping needed to use ik_max_word for indexing and ik_smart for search on a text field
Prompt 2
Walk me through installing analysis-ik on Elasticsearch 8.x with the plugin CLI
Prompt 3
Give me a sample custom dictionary file for tech product names and the config block that loads it
Prompt 4
Set up a remote dictionary endpoint with the right HTTP caching headers so analysis-ik hot reloads my word list
Prompt 5
Compare ik_max_word and ik_smart with a concrete example phrase and the tokens each emits

Frequently asked questions

What is analysis-ik?

An Elasticsearch and OpenSearch plugin that adds Chinese-language tokenization via the IK analyzer, with custom and hot-reloadable dictionaries.

What language is analysis-ik written in?

Mainly Java. The stack also includes Elasticsearch and OpenSearch.

What license does analysis-ik use?

Apache 2.0. It lets you use, modify, and redistribute the code commercially, as long as you keep the license notice and state your changes.

How hard is analysis-ik to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is analysis-ik for?

Mainly developers.


Verify against the repo before relying on details.