Ask questions about a single PDF inside a Jupyter notebook
Walk through a complete RAG pipeline end to end
Compare different chunk sizes and overlaps for retrieval quality
Run the example against the Attention Is All You Need paper
Setup is mostly pip installs inside the notebook, but you need a Google Gemini API key from Google AI Studio before the answer-generation step will work.
DocuChat RAG is a chatbot project that answers questions about the contents of a PDF and grounds each answer in that document. It is built as a Jupyter notebook that demonstrates a full retrieval-augmented generation (RAG) pipeline: the program first finds the passages of the PDF most relevant to the question, then asks a language model to write an answer using only those passages. The author notes that it is meant to show a complete RAG pipeline for document question answering, not to be a polished product.

The notebook stitches together several open-source libraries. LangChain handles the plumbing, LangChain's PyPDFLoader reads the PDF, the HuggingFace sentence-embedding model all-mpnet-base-v2 turns text into numerical vectors, and ChromaDB stores those vectors so the system can quickly look up similar passages. Google's Gemini 2.5 Flash is the language model that writes the final answer.

To use it you need Python 3.8 or higher, a Jupyter or Google Colab environment, a PDF to query, and a Google Gemini API key from Google AI Studio. The notebook contains a pip install cell for each dependency and reads the API key from Colab secrets or an environment variable. The example in the README uses the Attention Is All You Need paper as the source PDF.

The main function is docu_chat(user_query), which returns a dictionary containing the answer, the retrieved chunks, and the combined context that was sent to the model. Settings such as chunk size, chunk overlap, the embedding model, and the number of retrieved chunks can be changed in the notebook. The project is MIT licensed and has zero stars at the time of writing.
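To make the chunk size and chunk overlap settings concrete, here is a minimal character-based sliding-window chunker. It is a simplified stand-in, not the notebook's code: the notebook's actual splitter is not described here (LangChain's text splitters, for example, also try to break on separators such as paragraph boundaries), and the function name `chunk_text` and its default values are hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; the defaults are assumptions,
    not the notebook's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end so the tail is not repeated
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when the overlap is large enough. Smaller chunks give finer-grained pieces to retrieve but more embeddings to compute and store, which is the kind of trade-off a chunk-size comparison probes.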
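ChromaDB's role in the pipeline is nearest-neighbor lookup: given the embedding vector of the question, return the stored chunks whose vectors are most similar. The sketch below shows that idea with a brute-force cosine-similarity scan in pure Python. It is an illustration only: ChromaDB itself uses an approximate nearest-neighbor index and a configurable distance metric rather than this scan, the toy 3-dimensional vectors stand in for all-mpnet-base-v2's 768-dimensional embeddings, and the names `cosine_similarity` and `top_k_chunks` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: List[Sequence[float]],
    chunks: List[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query and return the k best, best first."""
    scored = [(chunk, cosine_similarity(query_vec, vec)) for chunk, vec in zip(chunks, chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


chunks = ["attention weights", "positional encoding", "training schedule"]
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
# Prints the "attention weights" chunk first, then "training schedule"
for text, score in top_k_chunks([0.9, 0.1, 0.0], chunk_vecs, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

The `k` parameter plays the part of the "number of retrieved chunks" setting: raising it gives the model more context at the cost of a longer prompt and more chance of including irrelevant passages.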
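The description of docu_chat(user_query) maps onto a retrieve-then-generate loop. Below is a minimal sketch of that shape with the retriever and the language model passed in as plain functions, so it runs without LangChain, ChromaDB, or a Gemini key; in the notebook those roles are filled by ChromaDB retrieval and Gemini 2.5 Flash. The function name `docu_chat_sketch`, the dictionary keys (`answer`, `retrieved_chunks`, `context`), the prompt wording, and the `retrieve`/`generate` parameters are all assumptions for illustration, so check the notebook for the real names.

```python
from typing import Callable, Dict, List, Union


def docu_chat_sketch(
    user_query: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 4,
) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical retrieve-then-generate flow modeled on docu_chat."""
    # 1. Find the k passages most relevant to the question.
    chunks = retrieve(user_query, k)
    # 2. Join them into one context block for the prompt.
    context = "\n\n".join(chunks)
    # 3. Ask the model to answer from that context only.
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"
    )
    answer = generate(prompt)
    return {"answer": answer, "retrieved_chunks": chunks, "context": context}


docs = [
    "The Transformer relies entirely on attention mechanisms.",
    "Positional encodings give the model information about token order.",
]
result = docu_chat_sketch(
    "What does the Transformer rely on?",
    retrieve=lambda query, k: docs[:k],  # stub: a real retriever ranks by similarity
    generate=lambda prompt: "Attention mechanisms.",  # stub standing in for an LLM call
    k=1,
)
print(result["answer"])  # Attention mechanisms.
```

Returning the retrieved chunks and the assembled context alongside the answer, as the description says docu_chat does, makes it easy to check whether a bad answer came from poor retrieval or from the model itself.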
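The notebook reads the Gemini key from Colab secrets or an environment variable. A minimal sketch of that fallback follows, assuming the key is stored under the name `GOOGLE_API_KEY` (the notebook's actual secret name may differ, and `load_gemini_key` is a hypothetical helper). It tries Colab's `google.colab.userdata.get` first and falls back to `os.environ` when the code is not running in Colab or the secret is missing.

```python
import os
from typing import Optional


def load_gemini_key(name: str = "GOOGLE_API_KEY") -> Optional[str]:
    """Return the API key from Colab secrets, else from the environment.

    The secret / environment variable name is an assumption; adjust it
    to match your setup.
    """
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not in Colab, or the secret is missing / notebook access was not granted
        pass
    return os.environ.get(name)
```

Outside Colab this reduces to an environment-variable lookup and returns `None` when the variable is unset, so a caller can fail early with a clear message before the answer step.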
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.