datawhalechina/all-in-rag

★ 7,378 · Python · Audience: developer · Complexity: 3/5 · Setup: moderate

Mindmap

mindmap
  root((repo))
    What it does
      RAG tutorial series
      Ten chapters
    Pipeline steps
      Data loading
      Text chunking
      Vector storage
      Hybrid search
    Advanced topics
      Text2SQL
      Knowledge graphs
      System evaluation
    Tech stack
      Python
      Docker
    Audience
      Python developers
      AI learners



Things people build with this

USE CASE 1

Build a question-answering chatbot that searches through your own PDF or text documents using AI retrieval.
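A question-answering bot like this follows the same retrieve-then-generate loop the tutorial teaches. The sketch below is a minimal, stdlib-only illustration and is not the tutorial's code. Every function name in it is hypothetical. A bag-of-words vector stands in for a real embedding model, and the final language-model call is left out, so the script stops at building the prompt.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the prompt a language model would receive (the model call is omitted)."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"


docs = ("RAG retrieves relevant passages before generation. "
        "Vector databases store embeddings for similarity search. "
        "Docker packages the runtime environment.")
chunks = chunk_text(docs, size=6, overlap=2)
question = "How are embeddings stored for similarity search?"
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

For real PDFs you would swap in a document loader, a real embedding model and a vector database. The chunk, embed, retrieve and prompt steps keep the same shape.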

USE CASE 2

Add Text2SQL so users can ask plain-English questions and get answers from a relational database.
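The Text2SQL flow usually has three steps. First, show the model the database schema. Second, ask it for a query. Third, check that query before running it. The sketch below is an assumption-laden illustration using Python's built-in `sqlite3`, not the tutorial's implementation. The helper names are hypothetical. `generated_sql` is hard-coded to stand in for what a language model would return. The SELECT-only check is deliberately naive: it rejects `WITH` queries, for example, and is no substitute for a read-only database connection.

```python
import sqlite3


def schema_description(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so the model knows table and column names."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(row[0] for row in rows)


def build_text2sql_prompt(question: str, schema: str) -> str:
    """Hypothetical prompt template asking the model for a single SELECT statement."""
    return (f"Given this SQLite schema:\n{schema}\n\n"
            f"Write one SELECT statement that answers: {question}\nSQL:")


def run_readonly(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Naive guard: accept only a single SELECT statement before executing model output."""
    stmt = sql.strip().rstrip(";").strip()
    if not stmt.lower().startswith("select") or ";" in stmt:
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(stmt).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)])

prompt = build_text2sql_prompt("How much has each customer spent?", schema_description(conn))
# Hypothetical model output: in a real pipeline this string would come from an LLM given `prompt`.
generated_sql = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer;"
print(run_readonly(conn, generated_sql))
```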

USE CASE 3

Evaluate how accurately your RAG pipeline answers questions using the metrics covered in the tutorial.
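This summary does not name the exact metrics the tutorial uses. As an assumption, the sketch below uses two common retrieval metrics: hit rate@k and mean reciprocal rank (MRR). Hit rate@k asks whether any relevant chunk appears in the top k. MRR averages 1/rank of the first relevant chunk, counting 0 when none is found. The function names and document IDs are hypothetical.

```python
def hit_rate(results: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant doc ID in the top k results."""
    hits = sum(1 for ranked, gold in zip(results, relevant) if gold & set(ranked[:k]))
    return hits / len(results)


def mrr(results: list[list[str]], relevant: list[set[str]]) -> float:
    """Mean of 1/rank of the first relevant result per query (0 if none is retrieved)."""
    total = 0.0
    for ranked, gold in zip(results, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break
    return total / len(results)


# Ranked retriever output per query, and the hand-labelled relevant IDs per query.
results = [["d1", "d3", "d2"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
relevant = [{"d3"}, {"d4"}, {"d0"}]
print(hit_rate(results, relevant, k=2), mrr(results, relevant))
```

These metrics score only the retrieval step. Judging the generated answers themselves needs separate measures, such as faithfulness to the retrieved context.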

Tech stack

Python, Docker

Getting it running

Difficulty: moderate · Time to first run: 1h+

Requires Docker and basic familiarity with Linux commands. The course content is written primarily in Chinese.

No license information is mentioned in this overview.

In plain English

All-in-RAG is a structured, Chinese-language tutorial series from Datawhale that teaches developers how to build RAG applications. RAG stands for Retrieval-Augmented Generation, a technique in which an AI system looks up relevant information in a knowledge base before generating an answer. This lets you build question-answering systems that draw on your own documents rather than relying solely on what a language model learned during training.

The tutorial is organized into ten chapters covering the full pipeline. Early chapters explain the core concepts and walk through a minimal working example in four steps. Later chapters cover:

- data loading and preparation;
- splitting documents into chunks;
- turning text into vector representations that can be searched by meaning;
- storing those vectors in a database;
- combining different search strategies to improve result quality.

The series also covers converting natural-language questions into database queries (Text2SQL), evaluating how well a RAG system performs, and connecting retrieved results to a language model to produce formatted answers.

Toward the end, two complete hands-on projects bring all of these pieces together, including an optional extension that uses a knowledge graph to improve retrieval. An extra section lets community members contribute specialized topics.

The intended audience is Python developers with basic programming skills who want to understand and build production-grade RAG systems. Basic familiarity with Docker and Linux commands is listed as a prerequisite. The course is written primarily in Chinese, with an English README available, and the full content can be read online through the project documentation site.
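"Combining different search strategies" usually means hybrid search: running a keyword ranker (such as BM25) and a vector ranker, then merging their result lists. This summary does not say which fusion method the tutorial uses. The sketch below assumes reciprocal rank fusion (RRF), one common choice. Each document scores the sum of 1/(k + rank) across the lists, and k = 60 is a widely used default. The two input rankings are made-up stand-ins for real keyword and vector search output.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc IDs; higher fused score ranks first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


keyword_ranking = ["a", "b", "c"]  # e.g. output of a BM25 keyword search (hypothetical)
vector_ranking = ["c", "a", "d"]   # e.g. output of an embedding similarity search (hypothetical)
print(reciprocal_rank_fusion([keyword_ranking, vector_ranking]))
```

RRF uses only rank positions, not raw scores. It therefore sidesteps the problem that BM25 scores and cosine similarities sit on different scales. Weighted score blending is the usual alternative when you want to tune the balance explicitly.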

Copy-paste prompts

Prompt 1

I'm working through the all-in-rag tutorial. Show me how to load my own PDF files, split them into chunks, embed them, and store them in a vector database ready for search.

Prompt 2

Using the all-in-rag pipeline, how do I set up Text2SQL so users can type questions in plain English and get results from a SQL database?

Prompt 3

How do I combine keyword search and semantic vector search in the all-in-rag pipeline to improve the quality of retrieved chunks?



Verify against the repo before relying on details.