Follow a 7-week guided course to build a production-quality AI assistant that answers questions from real arXiv research papers
Learn to combine BM25 keyword search with semantic vector search for more accurate AI retrieval results
Add production monitoring, Redis caching, and agentic self-correction to an AI pipeline you built from scratch
Clone any single week's tagged release from GitHub to study just that stage of the build without wading through all accumulated changes
Requires Docker Desktop, Python 3.12, at least 8GB of RAM, and 20GB of free disk space
This is a seven-week course project that walks you through building a production-grade AI research assistant. The system you build is called the arXiv Paper Curator: it automatically fetches academic papers from arXiv (a large free archive of scientific research), stores them, indexes them for search, and then lets you ask questions about them using AI that draws on actual paper content rather than generating guesses.

The course is built around a specific learning philosophy: build the way professional software teams do, rather than jumping straight to AI features. That means mastering keyword search foundations first, then layering vector-based semantic understanding on top, which is why Week 3 covers traditional BM25 keyword search before Week 4 introduces hybrid retrieval combining keyword and semantic signals.

The week-by-week progression:

Week 1: infrastructure setup with Docker, PostgreSQL, and OpenSearch
Week 2: an automated data pipeline for pulling papers from arXiv
Week 3: BM25 keyword search
Week 4: chunking strategies and hybrid search
Week 5: a complete AI pipeline with a chat interface built using Gradio
Week 6: production monitoring with Langfuse and Redis caching
Week 7: agentic capabilities with LangGraph, plus a Telegram bot for mobile access

The agentic layer means the system can grade its own retrieved results, rewrite queries when answers fall short, and detect when a question is outside its scope.

Each week has a companion blog post and a tagged code release on GitHub, so you can clone just one week's version without wading through all accumulated changes. Running the full system requires Docker Desktop, Python 3.12, at least 8GB of RAM, and 20GB of free disk space. Most configuration is handled through a single environment file, and the defaults work without modification for most users.
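To make the Week 4 idea of hybrid retrieval concrete, here is a minimal sketch of one common way to merge a keyword ranking with a semantic ranking: reciprocal rank fusion (RRF). This is an illustration under assumptions, not the course's code. The summary above does not say which fusion method the project uses, and the function name, the constant k=60, and the paper IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1. k=60 is a conventional default, not a
    value taken from the course.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists; the IDs are placeholders, not real search output.
bm25_hits = ["paper_a", "paper_b", "paper_c"]      # keyword (BM25) ranking
vector_hits = ["paper_c", "paper_a", "paper_d"]    # semantic (vector) ranking

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Papers that rank well in both lists (here paper_a and paper_c) rise above papers that only one retriever found. RRF works on ranks rather than raw scores, so BM25 scores and vector similarities never have to be put on a common scale.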
Verify against the repo before relying on details.