explaingit

cinnamon/kotaemon

25,383 | Python | Audience · developer | Complexity · 3/5 | Maintained | License | Setup · moderate

TLDR

Self-hosted chat interface that lets you ask questions about your own documents using AI, with answers backed by citations from the source material.

Mindmap

mindmap
  root((kotaemon))
    What it does
      Chat with documents
      AI-powered search
      Citation tracking
    How it works
      Hybrid retrieval
      Keyword search
      Vector search
      PDF viewer
    Features
      Multi-user login
      Document sharing
      Image support
      Table extraction
    Tech stack
      Python
      Gradio
      Docker
    AI providers
      OpenAI
      Azure
      Groq
      Ollama local
    Use cases
      Research analysis
      Legal review
      Report extraction

Things people build with this

USE CASE 1

Upload a collection of research papers and ask questions to extract key findings without reading each one manually.

USE CASE 2

Build a legal document review system where lawyers can query contracts and regulations with cited answers.

USE CASE 3

Create an internal knowledge base where team members ask questions about company reports and get instant answers with source references.

USE CASE 4

Extract information from tables and images in PDFs by asking natural-language questions instead of entering the data manually.

Tech stack

Python | Gradio | Docker | RAG | Vector search

Getting it running

Difficulty · moderate | Time to first run · 30 min

Easiest with Docker. Also expect to initialize the vector database, and you will likely need an API key for a cloud LLM service (local models via Ollama need none).

Use freely for any purpose, including commercial use. Keep the license notice and state any changes you make; the license also includes a patent grant.

In plain English

kotaemon is an open-source, self-hosted chat interface for having conversations with your own documents using AI. Instead of searching manually through large collections of PDFs, reports, or other files, you upload your documents and ask questions in plain language, and the AI finds the relevant passages and answers from them.

The technique behind it is called RAG, which stands for Retrieval-Augmented Generation. Rather than relying only on what the model learned during training, the system first searches your uploaded documents for relevant sections, then passes that retrieved content to the model to generate an answer grounded in, and cited to, those sections. kotaemon uses a hybrid retrieval approach, combining traditional keyword search with semantic (meaning-based) vector search, to improve the quality of what it finds. Answers come with citations, and you can see exactly which passages were used, highlighted directly in a built-in PDF viewer.

The tool supports multiple AI providers, including OpenAI, Azure, Groq, and locally run models via Ollama, and it handles images, tables, and complex multi-step questions. It has a multi-user login system, supports private and shared document collections, and is written in Python on top of Gradio (a Python framework for building web UIs). Docker offers the easiest setup.

kotaemon is aimed at researchers, lawyers, analysts, and other knowledge workers who need to extract information quickly from large document collections.
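
To make the retrieve-then-generate flow and the hybrid retrieval idea concrete, here is a minimal, self-contained Python sketch. It is not kotaemon's code or API. The chunk IDs and the functions `keyword_scores`, `embed`, `hybrid_search`, and `build_prompt` are hypothetical. An IDF-weighted exact-term match stands in for full-text keyword search, character-trigram vectors stand in for a real embedding model, and reciprocal rank fusion (RRF, with the commonly used k = 60) is one common way to merge the two rankings. That merging choice is an assumption here, not necessarily what kotaemon does.

```python
import math
import re
from collections import Counter

# Hypothetical toy data standing in for text chunks extracted from uploaded PDFs.
CHUNKS = {
    "report.pdf#p3": "Revenue grew 12 percent in 2023, driven by cloud services.",
    "report.pdf#p7": "Operating costs rose because of new data center leases.",
    "memo.pdf#p1": "The board approved a hiring freeze for the sales team.",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_scores(query: str, chunks: dict[str, str]) -> dict[str, float]:
    """Exact-term matching weighted by inverse document frequency (a BM25-like stand-in)."""
    token_sets = {cid: set(tokenize(text)) for cid, text in chunks.items()}
    doc_freq = Counter(tok for toks in token_sets.values() for tok in toks)
    n = len(chunks)
    query_terms = set(tokenize(query))
    return {
        cid: sum(math.log(1 + n / doc_freq[t]) for t in query_terms & toks)
        for cid, toks in token_sets.items()
    }


def embed(text: str) -> Counter:
    """Hypothetical embedding: character trigrams, so 'revenues' still resembles 'revenue'.
    A real deployment would call an embedding model that captures meaning, not spelling."""
    grams = Counter()
    for tok in tokenize(text):
        padded = f" {tok} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ranked(scores: dict[str, float]) -> list[str]:
    # Highest score first; chunks with no match at all are dropped from this ranking.
    return [cid for cid, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0]


def hybrid_search(query: str, chunks: dict[str, str], top_k: int = 2, k: int = 60) -> list[str]:
    """Merge the keyword and vector rankings with reciprocal rank fusion (RRF)."""
    q_vec = embed(query)
    vector = ranked({cid: cosine(q_vec, embed(text)) for cid, text in chunks.items()})
    keyword = ranked(keyword_scores(query, chunks))
    fused = Counter()
    for ranking in (keyword, vector):
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] += 1 / (k + rank)
    return [cid for cid, _ in fused.most_common(top_k)]


def build_prompt(query: str, chunk_ids: list[str], chunks: dict[str, str]) -> str:
    """Number each retrieved passage so the LLM can cite it as [n]."""
    sources = "\n".join(f"[{i}] ({cid}) {chunks[cid]}" for i, cid in enumerate(chunk_ids, start=1))
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{sources}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    question = "How much did revenues grow in 2023?"
    top = hybrid_search(question, CHUNKS)
    print(build_prompt(question, top, CHUNKS))
```

The query says "revenues" and "grow" while the source says "revenue" and "grew". Exact keyword matching only catches "in" and "2023", while the trigram vectors still rate the passage as similar, which is the kind of gap that combining both signals is meant to close. In a full system, the prompt would be sent to whichever LLM provider is configured, and the [n] markers in the answer could be mapped back to chunk IDs for citation display.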

Copy-paste prompts

Prompt 1
How do I set up kotaemon with Docker to chat with my own PDF documents using OpenAI?
Prompt 2
Show me how to configure kotaemon to use Ollama for local AI models instead of cloud providers.
Prompt 3
How can I enable multi-user access and document sharing in kotaemon so my team can collaborate on document analysis?
Prompt 4
What's the difference between keyword search and vector search in kotaemon's hybrid retrieval, and how do I tune it for better results?
Prompt 5
How do I extract tables and images from PDFs using kotaemon's chat interface?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.