explaingit

huichen/wukong

4,489 · Go · Audience · developer · Complexity · 3/5 · License · Apache 2.0 · Setup · moderate

TLDR

Wukong is a fast, embeddable full-text search engine library for Go apps, with built-in Chinese word segmentation, BM25 relevance ranking, and the ability to index one million documents in under 30 seconds.

Mindmap

mindmap
  root((wukong))
    What it does
      Full-text search
      Embeddable in Go apps
      Chinese segmentation
    Performance
      1M docs in 28 seconds
      1.65ms avg query time
      19k queries per second
    Ranking Features
      BM25 relevance
      Proximity scoring
      Custom scoring rules
    Operations
      Live add and remove docs
      Disk persistence
      Distributed mode
    Use Cases
      Microblog search
      Go API backend
      Chinese content search


Things people build with this

USE CASE 1

Add keyword search to a Go web app so users can find posts or documents with ranked results instantly

USE CASE 2

Build a Chinese-language search feature for a microblog or news site with relevance and proximity scoring

USE CASE 3

Embed a high-throughput search index (19,000 queries/second) inside a Go service without running a separate search server

USE CASE 4

Save and reload the search index to disk so the engine survives restarts without re-indexing everything
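The live add/remove and save/reload use cases rest on an inverted index, which maps each term to the documents that contain it. The sketch below is a generic illustration under stated assumptions, not wukong's API or storage format. `Index`, `Add`, `Remove`, `Lookup`, `Save`, and `Load` are hypothetical names, whitespace splitting stands in for real word segmentation, and the standard `encoding/gob` package stands in for whatever on-disk format wukong actually uses.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"sort"
	"strings"
)

// Index is a toy inverted index (hypothetical; not wukong's internal structure).
type Index struct {
	Postings map[string]map[uint64]bool // term -> IDs of documents containing it
	Docs     map[uint64][]string        // doc ID -> its terms, kept so removal is cheap
}

func NewIndex() *Index {
	return &Index{Postings: map[string]map[uint64]bool{}, Docs: map[uint64][]string{}}
}

// Add indexes a document. Whitespace tokenization stands in for a real segmenter.
func (ix *Index) Add(id uint64, text string) {
	ix.Remove(id) // re-adding an ID replaces the old version
	terms := strings.Fields(strings.ToLower(text))
	ix.Docs[id] = terms
	for _, t := range terms {
		if ix.Postings[t] == nil {
			ix.Postings[t] = map[uint64]bool{}
		}
		ix.Postings[t][id] = true
	}
}

// Remove deletes a document in place; the rest of the index stays usable.
func (ix *Index) Remove(id uint64) {
	for _, t := range ix.Docs[id] {
		delete(ix.Postings[t], id)
		if len(ix.Postings[t]) == 0 {
			delete(ix.Postings, t)
		}
	}
	delete(ix.Docs, id)
}

// Lookup returns the sorted IDs of documents containing term.
func (ix *Index) Lookup(term string) []uint64 {
	var ids []uint64
	for id := range ix.Postings[strings.ToLower(term)] {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// Save serializes the index with encoding/gob; a real service would write
// these bytes to a file.
func (ix *Index) Save() ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(ix)
	return buf.Bytes(), err
}

// Load rebuilds an index from bytes produced by Save.
func Load(data []byte) (*Index, error) {
	ix := NewIndex()
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(ix)
	return ix, err
}

func main() {
	ix := NewIndex()
	ix.Add(1, "wukong full text search")
	ix.Add(2, "search engine in go")
	ix.Remove(1) // live removal, no rebuild
	data, err := ix.Save()
	if err != nil {
		panic(err)
	}
	restored, err := Load(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.Lookup("search")) // [2]
}
```

Keeping each document's term list alongside the postings is what makes removal cheap: deleting a document touches only the terms it contained, so nothing else needs re-indexing.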

Tech stack

Go · BM25 · sego

Getting it running

Difficulty · moderate · Time to first run · 30 min

Requires a Chinese dictionary file for word segmentation. English documentation is sparse, so expect to read the linked Chinese tutorial.
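The dictionary is needed because Chinese text has no spaces between words, so a segmenter must decide where each word begins and ends. Below is a minimal sketch of one classic dictionary-based technique, forward maximum matching. It is a swapped-in illustration, not sego's actual algorithm; `Segment`, `maxWordLen`, and the five-word in-memory dictionary are hypothetical stand-ins for the dictionary file wukong loads at startup.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxWordLen returns the length, in runes, of the longest dictionary word.
func maxWordLen(dict map[string]bool) int {
	m := 1
	for w := range dict {
		if l := utf8.RuneCountInString(w); l > m {
			m = l
		}
	}
	return m
}

// Segment splits text by forward maximum matching: at each position it takes
// the longest dictionary word that matches, falling back to a single rune.
func Segment(text string, dict map[string]bool, maxLen int) []string {
	if maxLen < 1 {
		maxLen = 1
	}
	runes := []rune(text)
	var words []string
	for i := 0; i < len(runes); {
		n := maxLen
		if rem := len(runes) - i; rem < n {
			n = rem
		}
		// Shrink the window until it is a dictionary word or a single rune.
		for n > 1 && !dict[string(runes[i:i+n])] {
			n--
		}
		words = append(words, string(runes[i:i+n]))
		i += n
	}
	return words
}

func main() {
	// Hypothetical tiny dictionary; a real one would be loaded from a file.
	dict := map[string]bool{"中文": true, "搜索": true, "搜索引擎": true, "引擎": true, "分词": true}
	fmt.Println(Segment("中文搜索引擎分词", dict, maxWordLen(dict))) // [中文 搜索引擎 分词]
}
```

At the third character the matcher prefers 搜索引擎 ("search engine") over the shorter 搜索 ("search"), which is the "maximum" in forward maximum matching. Characters the dictionary does not cover fall back to single-character tokens.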

Use freely for any purpose, including commercial use, as long as you keep the copyright notice (Apache 2.0).

In plain English

Wukong is a full-text search engine library written in Go. A full-text search engine lets your application search a collection of text documents by keyword and quickly return the most relevant results. Wukong is designed to be embedded in your own Go application rather than run as a standalone service. The README is written in Chinese and describes an engine built with Chinese-language content in mind, though the underlying technology applies to other languages too.

The README cites these performance figures: indexing one million short posts (500MB in total) takes about 28 seconds, search responses average 1.65 milliseconds, and the engine handles around 19,000 search queries per second. Chinese word segmentation is built in through a companion library called sego, which processes text at 27 megabytes per second.

Beyond basic keyword matching, the engine supports proximity scoring (rewarding results where the searched terms appear close together in the original text), BM25 relevance scoring (a standard information-retrieval formula for ranking how well a document matches a query), and custom scoring rules that let developers define their own ranking logic. Documents can be added to and removed from the index while the engine is running, without a restart. The index can also be saved to disk and reloaded, and the README mentions a distributed mode for spreading work across multiple machines.

The code example in the README shows the minimal setup: initialize the engine with a dictionary file, add a few documents, flush the index, and run a search. The result is a ranked list of matching documents. The project is released under the Apache License v2, which permits commercial use. English documentation is sparse, but the README links to a tutorial that walks through building a microblog search site in under 200 lines of Go code.

Copy-paste prompts

Prompt 1
Using the wukong Go library, show me how to initialize the engine with a dictionary file, index 1,000 documents, and run a keyword search that returns ranked results.
Prompt 2
How do I implement custom scoring rules in wukong to boost results based on recency alongside BM25 relevance?
Prompt 3
Write a minimal Go HTTP API that wraps the wukong search engine so users can add documents and search via REST endpoints.
Prompt 4
Show me how to save the wukong index to disk and reload it on startup so I don't need to re-index on every server restart.
Prompt 5
How do I configure wukong's distributed mode to spread indexing and search across multiple machines?


Verify against the repo before relying on details.