explaingit

jamwithai/production-agentic-rag-course

5,869 | Python | Audience · developer | Complexity · 4/5 | Setup · hard

TLDR

A seven-week structured course project for building a production-grade AI research assistant that fetches arXiv papers, indexes them for search, and lets you ask questions using retrieval-augmented generation with agentic self-correction.

Mindmap

mindmap
  root((repo))
    What It Does
      Fetches arXiv papers
      Indexes for search
      AI question answering
    Week-by-Week
      Week 1 Infrastructure
      Week 2 Data pipeline
      Week 3 BM25 search
      Week 4 Hybrid search
      Week 5 Chat interface
      Week 6 Monitoring
      Week 7 Agentic layer
    Tech Stack
      Python
      Docker
      PostgreSQL
      OpenSearch
      LangGraph
      Gradio
    Features
      Tagged weekly releases
      Telegram bot
      Production monitoring


Things people build with this

USE CASE 1

Follow a 7-week guided course to build a production-quality AI assistant that answers questions from real arXiv research papers

USE CASE 2

Learn to combine BM25 keyword search with semantic vector search for more accurate AI retrieval results

USE CASE 3

Add production monitoring, Redis caching, and agentic self-correction to an AI pipeline you built from scratch

USE CASE 4

Clone any single week's tagged release from GitHub to study just that stage of the build without wading through all accumulated changes
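As a rough illustration of Use Case 2, one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). This is a minimal sketch, not the course's confirmed method: the function name, the document IDs, and the constant `k=60` are assumptions chosen for the example, and the two input lists stand in for results that a BM25 query and a vector query would return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists, best match first.
bm25_hits = ["paper_a", "paper_b", "paper_c"]
vector_hits = ["paper_b", "paper_d", "paper_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['paper_b', 'paper_a', 'paper_d', 'paper_c']
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and vector similarities live on different scales. Weighted score blending is another option; check the repo to see which approach the course actually takes.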

Tech stack

Python, Docker, PostgreSQL, OpenSearch, LangGraph, Gradio, Langfuse, Redis

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires Docker Desktop, Python 3.12, at least 8GB of RAM, and 20GB of free disk space.

In plain English

This is a seven-week course project that walks you through building a production-grade AI research assistant. The system it teaches you to build is called the arXiv Paper Curator: it automatically fetches academic papers from arXiv (a large free archive of scientific research), stores them, indexes them for search, and then lets you ask questions about them using AI that pulls from actual paper content rather than generating guesses.

The course is built around a specific learning philosophy: build the way professional software teams do, rather than jumping straight to AI features. That means mastering keyword search foundations first, then layering vector-based semantic understanding on top. This is why Week 3 covers traditional BM25 keyword search before Week 4 introduces hybrid retrieval combining keyword and semantic signals.

The week-by-week progression covers infrastructure setup with Docker, PostgreSQL, and OpenSearch in Week 1, an automated data pipeline for pulling papers from arXiv in Week 2, BM25 keyword search in Week 3, chunking strategies and hybrid search in Week 4, a complete AI pipeline with a chat interface built using Gradio in Week 5, production monitoring with Langfuse and Redis caching in Week 6, and agentic capabilities with LangGraph in Week 7.

The agentic layer means the system can grade its own retrieved results, rewrite queries when answers fall short, and detect when a question is outside its scope. A Telegram bot is also added in Week 7 for mobile access.

Each week has a companion blog post and a tagged code release on GitHub, so you can clone just one week's version without wading through all accumulated changes. Running the full system requires Docker Desktop, Python 3.12, at least 8 GB of RAM, and 20 GB of free disk space. Most configuration is handled through a single environment file, and the defaults work without modification for most users.
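The agentic behavior described above (scope check, grading retrieved results, rewriting the query on failure) can be sketched as a small control loop. This is plain Python rather than LangGraph, and it is not the course's code: every function name, the `max_attempts` parameter, and the toy retriever, grader, and rewriter are assumptions made for illustration. In a real system each of those callables would wrap a search backend or an LLM call.

```python
from typing import Callable, List


def answer_with_self_correction(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    in_scope: Callable[[str], bool],
    max_attempts: int = 2,
) -> dict:
    """Retrieve, grade, and retry with a rewritten query if grading fails."""
    # Guardrail: refuse questions outside the assistant's domain.
    if not in_scope(question):
        return {"status": "out_of_scope", "query": question, "docs": []}

    query = question
    for attempt in range(1, max_attempts + 1):
        # Grade each retrieved document against the original question.
        docs = [d for d in retrieve(query) if is_relevant(question, d)]
        if docs:
            return {"status": "answered", "query": query, "docs": docs}
        # Nothing passed grading: rewrite the query and try again.
        if attempt < max_attempts:
            query = rewrite(query)
    return {"status": "no_relevant_docs", "query": query, "docs": []}


# Toy stand-ins (hypothetical) for the search backend and LLM calls.
TOY_DOCS = [
    "transformers use self-attention",
    "bm25 ranks documents by term frequency",
]


def toy_retrieve(query: str) -> List[str]:
    words = set(query.lower().split())
    return [d for d in TOY_DOCS if words & set(d.split())]


def toy_grade(question: str, doc: str) -> bool:
    return "self-attention" in doc


def toy_rewrite(query: str) -> str:
    return query.replace("xformers", "transformers")


def toy_in_scope(question: str) -> bool:
    return "recipe" not in question.lower()


result = answer_with_self_correction(
    "how do xformers work", toy_retrieve, toy_grade, toy_rewrite, toy_in_scope
)
print(result["status"], "|", result["query"])
```

Here the first retrieval finds nothing because "xformers" matches no document, so the loop rewrites the query once and succeeds on the second attempt. Capping the attempts keeps a persistently failing question from looping forever.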

Copy-paste prompts

Prompt 1
I'm starting Week 1 of this course. Give me the Docker Compose setup for PostgreSQL and OpenSearch so I can run the local infrastructure.
Prompt 2
Explain how the BM25 keyword search from Week 3 of this course differs from the hybrid semantic search introduced in Week 4, and when each one is better.
Prompt 3
Walk me through Week 5 of the production-agentic-rag-course: how do I build a Gradio chat interface that retrieves answers from stored arXiv papers?
Prompt 4
Show me how the LangGraph agentic layer in Week 7 allows the system to grade its own retrieval results, rewrite failing queries, and detect out-of-scope questions.
Prompt 5
How do I clone just the Week 4 tagged release from this repository so I only see the code changes up to that point in the course?

Verify against the repo before relying on details.