rahulgit24/research-paper-rag-system

Analysis updated 2026-05-18

★ 1 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((Research Paper RAG))
    Ingestion
      PDF upload
      Semantic chunking
      Deduplication by hash
    Retrieval pipeline
      Vector search top 30
      BM25 keyword ranking
      Cross-encoder top 5
    Multi-user
      Shared vectors
      Per-user access list
      Safe deletion
    Infrastructure
      Qdrant vector store
      PostgreSQL metadata
      Groq LLM

What do people build with it?

USE CASE 1

Build a research assistant that lets a team upload papers and ask cross-paper questions in plain English.

USE CASE 2

Create a study tool that answers questions about a set of academic PDFs a student has uploaded.

USE CASE 3

Add a document Q&A feature to a research platform where multiple users share a library of papers.

What is it built with?

Python, FastAPI, Qdrant, PostgreSQL, LangChain, LlamaIndex, Groq, sentence-transformers

How does it compare?

	rahulgit24/research-paper-rag-system	a-bissell/unleash-lite	abhiinnovates/whatsapp-hr-assistant
Stars	1	1	1
Language	Python	Python	Python
Setup difficulty	hard	hard	hard
Complexity	4/5	4/5	3/5
Audience	researcher	researcher	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires running Qdrant, PostgreSQL, and a Groq API key all configured before the API works.
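Because three separate pieces have to be in place before the API responds, a small preflight check can save time. The following is a minimal sketch, not part of the repository: the setting names `DATABASE_URL` and `QDRANT_URL` are hypothetical, `GROQ_API_KEY` is the name the Groq Python client conventionally reads but may not be what this project uses, and the ports are the stock defaults for Qdrant (6333) and PostgreSQL (5432), which a given deployment may override.

```python
import os
import socket

# Hypothetical setting names; check the repository's own configuration for the real ones.
REQUIRED_SETTINGS = ("GROQ_API_KEY", "DATABASE_URL", "QDRANT_URL")


def missing_settings(env):
    """Return the required setting names that are absent or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name in missing_settings(os.environ):
        print(f"missing setting: {name}")
    # Assumed default ports: 6333 for Qdrant's HTTP API, 5432 for PostgreSQL.
    for label, port in (("Qdrant", 6333), ("PostgreSQL", 5432)):
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{label} on localhost:{port}: {state}")
```

Running the script before starting the API shows which of the three prerequisites is still missing. An open port only shows that something is listening there; it does not confirm that the credentials are valid.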

In plain English

This is a backend API for asking questions about research papers using AI. You upload PDFs, and the system stores them so that you can ask natural-language questions and get back answers drawn from the actual text of those papers. The project is described as production-ready, meaning it is built to handle the real problems that come up when more than one person uses a system like this.

The interesting part is the multi-step process used to find the most relevant text before answering. A basic approach would convert the question into a list of numbers (an embedding) and find paper sections whose embeddings are similar, a technique called vector search. This system adds two more filtering steps on top of that. First it uses keyword matching to complement the vector results, then it uses a more expensive ranking model to compare each candidate passage against the question and pick the best few. As a result, only the most relevant passages reach the AI, which should produce better answers than vector search alone. All of this runs on a regular CPU, with no specialized graphics hardware needed.

The system also handles multi-user sharing. When two people upload the same PDF, the document is analyzed and embedded only once. Both users get access to the same underlying data, but if one of them later deletes the document, the other keeps access. Deletion removes only that user's access; the shared data is fully deleted only when no one needs it anymore.

Follow-up questions in a conversation are handled with a query rewriting step. If you ask "what does it say about the training data?" after a question about a specific paper, the system rewrites your vague follow-up into a self-contained question before searching, so references like "it" or "this" resolve correctly.

The API requires PostgreSQL for metadata and document tracking, Qdrant as the vector database, and a Groq API key for the language model that generates answers.
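The three retrieval steps described above can be illustrated with a toy, dependency-free sketch. This is not the repository's code. The bag-of-words `toy_embed` stands in for a real embedding model, `toy_pair_score` stands in for a cross-encoder, and the sketch assumes BM25 re-scores the vector candidates (the project may combine the two result lists differently). The cut-offs of 30 and 5 come from the mindmap; `bm25_k=10` is an invented intermediate value.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text):
    # Stand-in for an embedding model: a sparse bag-of-words vector.
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Score each passage against the query with the standard BM25 formula."""
    docs = [tokenize(p) for p in passages]
    if not docs:
        return []
    n = len(docs)
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    doc_freq = Counter(t for d in docs for t in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(score)
    return scores


def toy_pair_score(query, passage):
    # Stand-in for a cross-encoder: share of query tokens found in the passage.
    q = set(tokenize(query))
    return len(q & set(tokenize(passage))) / len(q) if q else 0.0


def retrieve(query, passages, vector_k=30, bm25_k=10, final_k=5, pair_score=toy_pair_score):
    # Stage 1: cheap vector similarity narrows the corpus to vector_k candidates.
    q_vec = toy_embed(query)
    by_vector = sorted(passages, key=lambda p: cosine(q_vec, toy_embed(p)), reverse=True)[:vector_k]
    # Stage 2: BM25 keyword scores reorder those candidates (assumed combination).
    ranked = sorted(zip(by_vector, bm25_scores(query, by_vector)), key=lambda x: x[1], reverse=True)
    by_keyword = [p for p, _ in ranked][:bm25_k]
    # Stage 3: the expensive pairwise scorer only sees the short list.
    return sorted(by_keyword, key=lambda p: pair_score(query, p), reverse=True)[:final_k]
```

The point of the ordering is cost: each stage is more expensive per passage than the one before it, so it runs on fewer passages. In a real deployment the `pair_score` argument would wrap a cross-encoder model rather than the token-overlap placeholder used here.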
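The sharing and deletion behaviour described above amounts to deduplication by content hash plus a per-document access list. The sketch below is an in-memory illustration under stated assumptions, not the project's implementation. The class `PaperLibrary` and its methods are hypothetical, SHA-256 is an assumed choice of hash, and plain dictionaries stand in for the Qdrant collection and the PostgreSQL access table.

```python
import hashlib


class PaperLibrary:
    """Hypothetical in-memory model of shared vectors with per-user access."""

    def __init__(self):
        self.vectors = {}      # content hash -> embedded chunks (stand-in for Qdrant)
        self.access = {}       # content hash -> set of user ids (stand-in for PostgreSQL)
        self.embed_calls = 0   # counts how often the expensive step actually ran

    def _embed(self, pdf_bytes):
        # Placeholder for parsing, chunking, and embedding the PDF.
        self.embed_calls += 1
        return [f"chunk-of-{len(pdf_bytes)}-bytes"]

    def upload(self, user_id, pdf_bytes):
        digest = hashlib.sha256(pdf_bytes).hexdigest()  # assumed hash function
        if digest not in self.vectors:
            # First upload of this content: do the expensive work once.
            self.vectors[digest] = self._embed(pdf_bytes)
        self.access.setdefault(digest, set()).add(user_id)
        return digest

    def can_read(self, user_id, digest):
        return user_id in self.access.get(digest, set())

    def delete(self, user_id, digest):
        users = self.access.get(digest, set())
        users.discard(user_id)
        if not users:
            # Last reference gone: now it is safe to drop the shared data.
            self.access.pop(digest, None)
            self.vectors.pop(digest, None)
```

For example, if `alice` and `bob` upload identical bytes, `embed_calls` stays at 1; after `alice` deletes the document, `bob` can still read it, and the vectors disappear only once `bob` deletes it too. A database-backed version would need the access check and the final delete to run in one transaction so that a concurrent upload cannot race with the cleanup.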

Copy-paste prompts

Prompt 1

Using this Research Paper RAG System, set up the API locally, upload a PDF of a machine learning paper, and write a Python script that asks three follow-up questions about its methodology.

Prompt 2

How does the 3-stage reranker in this system work? Walk me through the pipeline from the initial vector search to the final cross-encoder step.

Prompt 3

Explain the deduplication strategy in this RAG system. How does it handle two users uploading the same PDF, and what happens when one of them deletes it?

Prompt 4

Write a Python script that calls this RAG API to upload a batch of PDF files from a local folder and then runs a list of predefined questions against the collection.

Frequently asked questions

What is research-paper-rag-system?

A Python API that lets you upload research papers and ask questions about them, using a three-step filtering pipeline to find the most relevant passages before generating an answer.

What language is research-paper-rag-system written in?

Mainly Python. The stack also includes FastAPI, Qdrant, and PostgreSQL.

How hard is research-paper-rag-system to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is research-paper-rag-system for?

Mainly researchers.

Verify against the repo before relying on details.