explaingit

promtengineer/localgpt

22,201 | Python | Audience · developer | Complexity · 4/5 | Maintained | License | Setup · hard

TLDR

A private document Q&A system that runs on your computer. Upload PDFs or text files, ask questions in plain English, and get answers sourced from your documents. No data leaves your machine.

Mindmap

mindmap
  root((LocalGPT))
    What it does
      Document Q&A
      Private, local-only
      Source attribution
    How it works
      RAG retrieval
      Semantic search
      Local AI models
    Interfaces
      Web browser UI
      REST API
      Chat history
    Tech stack
      Python, Node.js
      Ollama models
      GPU acceleration
    Use cases
      Research analysis
      Contract review
      Knowledge bases

Things people build with this

USE CASE 1

Upload research papers and ask questions to extract key findings without manually reading everything.

USE CASE 2

Review contracts or legal documents by asking specific questions and getting answers with exact source locations.

USE CASE 3

Build a searchable knowledge base from internal company documents that stays completely private.

USE CASE 4

Analyze PDFs and reports locally without uploading sensitive data to cloud services.

Tech stack

Python · Node.js · Ollama · CUDA · Docker

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires setting up Ollama with CUDA/GPU support and Docker, and coordinating multiple services (backend, frontend, LLM inference).

Use freely for any purpose including commercial, as long as you keep the copyright notice.

In plain English

LocalGPT is a private, on-premise platform for chatting with your own documents: you upload PDFs and other files, then ask questions about them in plain English, and none of the data is ever sent to an outside server. The README pitches it as a fully private Document Intelligence platform where you can ask questions, summarise material, and surface insights from your files using modern AI, without anything leaving your machine.
Under the hood it is a more elaborate take on RAG (retrieval-augmented generation), the standard technique of finding relevant snippets of your documents and feeding them to a language model. LocalGPT layers extra components on top of that. A hybrid search engine blends semantic similarity, keyword matching, and a technique called Late Chunking that is aimed at long-context precision. A smart router decides per query whether to use RAG or let the language model answer directly. Contextual enrichment and sentence-level Context Pruning narrow retrieved material down to the most relevant pieces, and an independent verification pass re-checks the final answer for accuracy. The README also mentions query decomposition (breaking complex questions into sub-questions), a TTL-based semantic cache that matches by similarity, session-aware chat history, and source attribution so every answer references the documents it came from.
You would reach for this if you have sensitive documents (internal company files, legal material, personal notes) and want a chat interface that behaves like commercial assistants but runs entirely on your own hardware. The README highlights utmost privacy, reusing language models you've already downloaded, an API for building your own RAG applications, and execution support for CUDA GPUs, plain CPU, Intel Gaudi (HPU), and Apple MPS. The architecture is described as modular and lightweight, with a pure-Python RAG core.
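To make the hybrid search idea concrete, here is a minimal sketch of blending a semantic score with a keyword score. It is not LocalGPT's code: the function names, the `alpha` weight, and the toy keyword overlap (a stand-in for a real BM25 scorer) are all assumptions for illustration, and the embeddings are passed in as plain lists rather than produced by a model.

```python
import math

def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize more carefully.
    return text.lower().split()

def keyword_score(query, doc):
    # Toy keyword match: fraction of query tokens present in the document.
    # Hypothetical stand-in for a BM25-style scorer.
    q = set(tokenize(query))
    d = set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    # alpha weights the semantic score; (1 - alpha) weights the keyword score.
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, doc)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

# Hypothetical documents with hand-made 2-D "embeddings".
docs = ["termination clause notice period", "payment schedule invoice"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0]]
ranked = hybrid_rank("termination notice", [0.9, 0.1], docs, doc_vecs)
```

With `alpha=1.0` the ranking is purely semantic and with `alpha=0.0` purely keyword-based. How LocalGPT actually combines its signals should be checked against the repository.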
The project is written in Python (3.8 or higher, tested on 3.11.5) with a Node.js web interface. It uses Ollama for language-model inference and Hugging Face for embeddings and reranking. Listed dependencies include torch and transformers; lancedb as the vector database; rank_bm25 and fuzzywuzzy for search; sentence_transformers and rerankers for embeddings and reranking; and docling for document processing. The README provides both a Docker deployment path and a direct development setup, and notes that installation has so far been tested only on macOS. The full README is longer than what was provided.
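The TTL-based semantic cache mentioned earlier can be pictured with a small sketch like the one below. It is an illustration under assumptions, not the project's implementation: the `SemanticCache` class, its `threshold` and `ttl_seconds` parameters, and the injected `clock` are hypothetical, and query embeddings are supplied as plain lists.

```python
import math
import time

class SemanticCache:
    """Hypothetical cache: returns a stored answer when a new query's
    embedding is similar enough to a cached one and the entry is still fresh."""

    def __init__(self, ttl_seconds=300.0, threshold=0.9, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.entries = []   # list of (embedding, answer, expires_at)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def put(self, embedding, answer):
        self.entries.append((embedding, answer, self.clock() + self.ttl))

    def get(self, embedding):
        now = self.clock()
        # Drop expired entries before matching.
        self.entries = [e for e in self.entries if e[2] > now]
        best_answer, best_sim = None, self.threshold
        for cached_emb, answer, _ in self.entries:
            sim = self._cosine(embedding, cached_emb)
            if sim >= best_sim:
                best_answer, best_sim = answer, sim
        return best_answer

# Usage with a fake clock so the TTL behavior is visible.
fake_now = [0.0]
cache = SemanticCache(ttl_seconds=10, threshold=0.9, clock=lambda: fake_now[0])
cache.put([1.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05])    # near-duplicate query
miss = cache.get([0.0, 1.0])     # unrelated query
fake_now[0] = 11.0
expired = cache.get([1.0, 0.0])  # same query, but past the TTL
```

Matching by similarity rather than exact text lets rephrased questions reuse an answer, while the TTL bounds how long a stale answer can be served.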

Copy-paste prompts

Prompt 1
How do I set up LocalGPT on my Mac with Apple Silicon to chat with my PDF files privately?
Prompt 2
Show me how to use the LocalGPT REST API to integrate document Q&A into my Python application.
Prompt 3
What's the difference between LocalGPT's semantic search and keyword matching, and when would I use each?
Prompt 4
How do I deploy LocalGPT using Docker so it runs on my home server with GPU acceleration?
Prompt 5
Can you explain how LocalGPT's router decides whether to search documents or answer directly from the model?

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.