explaingit

infiniflow/ragflow

🔥 Hot | 80,754 | Python | Audience · developer | Complexity · 4/5 | Active | License | Setup · moderate

TLDR

Open-source RAG engine that grounds LLM answers in your documents with intelligent chunking, citations, and agentic workflows for building production AI assistants.

Mindmap

mindmap
  root((RAGFlow))
    What it does
      Document understanding
      Intelligent chunking
      Grounded citations
      LLM integration
    Key features
      Multi-format support
      Agentic workflows
      Re-ranking search
      Data connectors
    Use cases
      Q&A systems
      Document assistants
      Knowledge bases
      Enterprise search
    Tech stack
      Python backend
      Docker deployment
      LLM APIs
      Embedding models
    Data sources
      Confluence
      S3 storage
      Notion
      Google Drive

Things people build with this

USE CASE 1

Build a question-answering chatbot that answers questions only from your company's internal documents.

USE CASE 2

Create a customer support assistant that retrieves relevant help articles and product docs to answer user questions accurately.

USE CASE 3

Set up an enterprise search system that lets employees find information across Confluence, Google Drive, and S3 with cited sources.

USE CASE 4

Deploy a document-grounded AI agent that can fetch data from multiple sources and reason over it to complete tasks.

Tech stack

Python | Docker | LLM APIs | Embedding models

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires LLM API keys (OpenAI, etc.) and embedding model setup to see functional RAG results.
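
Before the first run, it can help to compare the host against the minimums the README states (at least 4 CPU cores, 16 GB RAM, and 50 GB disk). The following is a minimal sketch using only the Python standard library. The names `check_host` and `detect_host` are hypothetical helpers written for this page, not part of RAGFlow, and reading "50 GB disk" as free space in the current directory is an assumption.

```python
import os
import shutil

# Minimums as quoted from the README: 4 CPU cores, 16 GB RAM, 50 GB disk.
MIN_CORES, MIN_RAM_GB, MIN_DISK_GB = 4, 16, 50


def check_host(cores, ram_gb, free_disk_gb):
    """Return a list of problems; an empty list means the host meets the minimums.

    ram_gb may be None when the amount of memory could not be detected.
    """
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU cores: {cores} (need {MIN_CORES})")
    if ram_gb is None:
        problems.append("RAM: could not be detected, check manually")
    elif ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB (need {MIN_RAM_GB})")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"Free disk: {free_disk_gb:.1f} GB (need {MIN_DISK_GB})")
    return problems


def detect_host(path="."):
    """Best-effort detection using only the standard library."""
    cores = os.cpu_count() or 0
    free_disk_gb = shutil.disk_usage(path).free / 1024**3
    try:
        # Available on Linux and most Unix systems; missing on Windows.
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gb = None
    return cores, ram_gb, free_disk_gb


if __name__ == "__main__":
    issues = check_host(*detect_host())
    print("Host looks OK" if not issues else "\n".join(issues))
```

A machine sold as 16 GB usually reports slightly less usable memory, so treat a near miss on RAM as a reason to check manually, not as a hard failure.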

Licensed under Apache-2.0: use freely for any purpose, including commercial use, as long as you include the original copyright notice and license text.

In plain English

RAGFlow is an open-source Retrieval-Augmented Generation (RAG) engine. In RAG, a large language model is given relevant pieces of your own documents at the moment of answering, so its responses are grounded in your data rather than in its training data alone. RAGFlow combines this with what it calls Agent capabilities, giving developers a context layer they can put between their data and an LLM to build production AI systems.

The engine is designed around what the README calls "Quality in, quality out". It performs deep document understanding to extract knowledge from unstructured data in many formats: Word documents, slides, Excel files, plain text, images, scanned copies, structured data, web pages, and more. Long documents are split using template-based chunking, described as intelligent and explainable. Answers come with grounded citations: chunked text can be visualized for human intervention, and key references are viewable so users can trace where each answer came from, framed as a way to reduce hallucinations.

Around this core, RAGFlow offers an automated RAG workflow with configurable LLMs and embedding models, multiple recall paired with fused re-ranking, and APIs for integration. The README lists support for agentic workflows, MCP, and data synchronization from Confluence, S3, Notion, Discord, and Google Drive.

Someone would use RAGFlow to build a question-answering or assistant product on top of their own documents. It is self-hostable via Docker, with stated requirements of at least 4 CPU cores, 16 GB RAM, and 50 GB disk. The project is written in Python and licensed under Apache-2.0.
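
To make the retrieve-then-answer idea concrete, here is a toy sketch of the general pattern: chunk a document, run two recall paths, fuse their rankings, and build a prompt with numbered sources. It does not use RAGFlow's API and is not its implementation. Every function name is hypothetical, the paragraph splitter stands in for template-based chunking, the term-count cosine stands in for a real embedding model, and reciprocal rank fusion is one common fusion method chosen for illustration (this page does not say which method RAGFlow uses).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_by_paragraph(doc_id, text):
    """Split on blank lines and keep a source id so answers can cite chunks."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [{"id": f"{doc_id}#{i}", "text": p} for i, p in enumerate(paragraphs)]


def keyword_scores(query, chunks):
    """Recall path 1: number of distinct query terms found in each chunk."""
    terms = set(tokenize(query))
    return {c["id"]: len(terms & set(tokenize(c["text"]))) for c in chunks}


def cosine_scores(query, chunks):
    """Recall path 2: cosine similarity of term-count vectors (toy embedding)."""
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scores = {}
    for c in chunks:
        d = Counter(tokenize(c["text"]))
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        dot = sum(q[t] * d[t] for t in q)
        scores[c["id"]] = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
    return scores


def rrf_fuse(score_maps, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every recall path."""
    fused = Counter()
    for scores in score_maps:
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, chunk_id in enumerate(ranked, start=1):
            fused[chunk_id] += 1.0 / (k + rank)
    return fused


def retrieve(query, chunks, top_n=2):
    """Return the top_n chunks after fusing both recall paths."""
    fused = rrf_fuse([keyword_scores(query, chunks), cosine_scores(query, chunks)])
    by_id = {c["id"]: c for c in chunks}
    return [by_id[cid] for cid, _ in fused.most_common(top_n)]


def build_prompt(query, hits):
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(f"[{i}] ({h['id']}) {h['text']}" for i, h in enumerate(hits, 1))
    return f"Answer using only the sources below and cite them.\n{context}\n\nQuestion: {query}"


# Made-up sample document, for illustration only.
handbook = (
    "Employees accrue 20 vacation days per year.\n\n"
    "Expense reports are due by the 5th of each month."
)
question = "How many vacation days do employees get?"
chunks = chunk_by_paragraph("handbook.txt", handbook)
hits = retrieve(question, chunks, top_n=1)
prompt = build_prompt(question, hits)
print(prompt)
```

The prompt string is what would be sent to an LLM. Because each source keeps its chunk id, an answer that cites `[1]` can be traced back to the paragraph it came from, which is the role grounded citations play in the description above.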

Copy-paste prompts

Prompt 1
How do I set up RAGFlow with Docker to index my company's Word documents and PDFs for a Q&A chatbot?
Prompt 2
Show me how to configure RAGFlow to use OpenAI embeddings and connect it to my Confluence workspace for document syncing.
Prompt 3
How can I use RAGFlow's agentic workflows to build an assistant that retrieves documents from S3 and answers questions with citations?
Prompt 4
What's the best way to chunk long documents in RAGFlow to improve answer quality and reduce hallucinations?
Prompt 5
How do I integrate RAGFlow's API into my Python app to add document-grounded search to my product?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.