llmware-ai/llmware

Analysis updated 2026-06-24 · repo last pushed 2026-05-17

★ 14,860 · Python · Audience · developer · Complexity · 4/5 · Maintained · Setup · moderate

Mindmap

mindmap
  root((llmware))
    Inputs
      PDF Word Excel
      HTML Images Audio
      User queries
    Outputs
      Embeddings
      Search hits
      Grounded answers
    Use Cases
      Private RAG over docs
      Run small LLMs locally
      Extract facts from PDFs
    Tech Stack
      Python
      GGUF
      ONNX Runtime
      OpenVINO
      PyTorch

What do people build with it?

USE CASE 1

Build a private RAG pipeline over a folder of internal PDFs and Word files running entirely on a laptop.
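A pipeline like this parses each file and splits its text into chunks before embedding. llmware's parsers handle chunking themselves; the sketch below is only a generic overlapping word-window chunker to show the idea, and `chunk_words` is a hypothetical helper, not part of llmware.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words.

    Each window repeats the last `overlap` words of the previous one, so a
    sentence cut at a boundary still appears whole in at least one chunk.
    Hypothetical helper for illustration; not llmware's chunking algorithm.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(12))
    for chunk in chunk_words(sample, chunk_size=5, overlap=2):
        print(chunk)
```

Larger overlaps improve recall at chunk boundaries at the cost of storing and embedding more redundant text.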

USE CASE 2

Extract structured facts from contracts using the BLING and DRAGON small fine-tuned models.

USE CASE 3

Swap between local GGUF, ONNX, and OpenVINO inference engines without changing application code.
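Swapping engines without touching application code rests on a shared interface: every backend is loaded by name and exposes the same call. In llmware the entry point is `ModelCatalog().load_model`; the toy below only imitates that pattern with hypothetical stand-in classes (`EchoGGUF`, `EchoONNX`) and a hypothetical registry, and performs no real inference.

```python
from typing import Callable, Dict, Protocol


class TextModel(Protocol):
    """Interface every backend must satisfy (hypothetical, for illustration)."""

    def inference(self, prompt: str) -> str: ...


class EchoGGUF:
    """Stand-in for a GGUF-backed model; just tags and echoes the prompt."""

    def inference(self, prompt: str) -> str:
        return f"[gguf] {prompt}"


class EchoONNX:
    """Stand-in for an ONNX Runtime-backed model; just tags and echoes the prompt."""

    def inference(self, prompt: str) -> str:
        return f"[onnx] {prompt}"


# Hypothetical catalog: model name -> factory that builds the backend.
_REGISTRY: Dict[str, Callable[[], TextModel]] = {
    "demo-gguf": EchoGGUF,
    "demo-onnx": EchoONNX,
}


def load_model(name: str) -> TextModel:
    """Look up a backend by name and instantiate it."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown model: {name}") from None
    return factory()


def answer(model_name: str, question: str) -> str:
    # Application code depends only on the shared interface, not on the engine.
    return load_model(model_name).inference(question)


if __name__ == "__main__":
    print(answer("demo-gguf", "hello"))
    print(answer("demo-onnx", "hello"))
```

Switching engines then means changing only the model name string passed to `answer()`.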

USE CASE 4

Combine semantic, hybrid, and filtered queries over a Library of parsed documents to ground LLM answers.
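Hybrid and filtered retrieval can be illustrated in a few lines. This is a toy sketch, not llmware's scoring: the hypothetical `hybrid_query` first drops chunks whose metadata fails an equality filter, then ranks the rest by a weighted blend (`alpha`) of cosine similarity over embeddings and plain keyword overlap. The two-dimensional embeddings stand in for real embedding vectors.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query words that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_query(query, query_embedding, chunks, alpha=0.5, filters=None, top_k=3):
    """Filter by exact metadata match, then rank by alpha*semantic + (1-alpha)*keyword."""
    filters = filters or {}
    scored = []
    for c in chunks:
        if any(c.metadata.get(k) != v for k, v in filters.items()):
            continue  # metadata filter rejects this chunk
        semantic = cosine(query_embedding, c.embedding)
        lexical = keyword_score(query, c.text)
        scored.append((alpha * semantic + (1 - alpha) * lexical, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [c for _, c in scored[:top_k]]


if __name__ == "__main__":
    docs = [
        Chunk("termination for convenience", [1.0, 0.0], {"year": 2024}),
        Chunk("payment terms net 30", [0.0, 1.0], {"year": 2024}),
        Chunk("termination upon breach", [1.0, 0.0], {"year": 2022}),
    ]
    for hit in hybrid_query("termination", [1.0, 0.0], docs, filters={"year": 2024}):
        print(hit.text)
```

Production systems usually replace the keyword overlap with BM25 and often fuse rankings (for example with reciprocal rank fusion) instead of adding raw scores, because the two scores live on different scales.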

What is it built with?

Python · GGUF · ONNX · OpenVINO · PyTorch

How does it compare?

	llmware-ai/llmware	trustedsec/social-engineer-toolkit	waditu/tushare
Stars	14,860	14,859	14,878
Language	Python	Python	Python
Last pushed	2026-05-17	—	—
Maintenance	Maintained	—	—
Setup difficulty	moderate	moderate	easy
Complexity	4/5	4/5	2/5
Audience	developer	ops devops	data

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate · Time to first run · 30 min

Local inference engines (GGUF, ONNX, OpenVINO) each have their own runtime install, and embedding stores need a vector DB configured.
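Before a first run it can help to see which of those runtimes are already importable. The sketch below is a generic Python helper, not part of llmware; the package names in `ENGINE_PACKAGES` (`onnxruntime`, `openvino`, `torch`) are assumptions about which pip packages back each engine, and the mapping is deliberately incomplete.

```python
import importlib.util

# Assumed, partial mapping from engine label to the Python package that usually
# provides it. Check llmware's own install docs for the exact requirements.
ENGINE_PACKAGES = {
    "ONNX Runtime": "onnxruntime",
    "OpenVINO": "openvino",
    "PyTorch": "torch",
}


def available_engines(packages: dict[str, str] = ENGINE_PACKAGES) -> dict[str, bool]:
    """Return {label: True/False} depending on whether each top-level package is importable.

    find_spec only locates the package; it does not import it, so this is cheap.
    """
    return {label: importlib.util.find_spec(pkg) is not None for label, pkg in packages.items()}


if __name__ == "__main__":
    for label, ok in available_engines().items():
        print(f"{label:<14} {'installed' if ok else 'missing'}")
```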

In plain English

llmware is a Python framework for building applications that combine company documents with language models, which the project calls knowledge-based LLM applications. It is aimed at enterprise teams that want to run models locally or in a private environment, including on ordinary laptops and AI PCs, instead of sending all their data to a cloud API. The README highlights support for Windows, Mac, and Linux, and for several local inference engines: GGUF, OpenVINO, ONNX Runtime, ONNX Runtime with Qualcomm NPU acceleration, Windows Local Foundry, and PyTorch.

The project has two main parts. The first is a model catalog of more than 300 models prepackaged in compact, quantized formats, including the project's own small fine-tuned families BLING, DRAGON, SLIM, and Industry-BERT, which are built for tasks such as extracting facts from documents. The catalog also includes hosted cloud models from OpenAI, Anthropic, and Google, all accessed through the same Python interface.

The second part is a retrieval-augmented generation (RAG) pipeline: tools to parse PDF, Word, Excel, PowerPoint, HTML, image, and audio files, chunk the text, store it in a Library (the project's name for a knowledge base), install vector embeddings on that Library, and then run text, semantic, hybrid, and filtered queries against it. The README shows short Python snippets for each step: ModelCatalog().load_model, Library().add_files, install_new_embedding, Query().semantic_query, and Prompt().prompt_with_source.

In the project's words, the pitch is that AI should be sustainable, accurate, and cost-effective, using the smallest model that does the job. The full README covers more than this summary.
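The README's steps can be strung together roughly as follows. This is a hedged sketch, not copied from the README: it assumes the import paths `llmware.library`, `llmware.retrieval`, and `llmware.prompts`, the method `Prompt.add_source_query_results`, the embedding model name `mini-lm-sbert`, the vector DB name `chromadb`, and the model name `bling-answer-tool`; check each against the installed version. The function name `ask_folder` is hypothetical. The llmware imports sit inside the function so the file loads even where llmware is not installed; actually calling it will likely download model weights and write a library to disk.

```python
def ask_folder(folder: str, question: str, library_name: str = "demo_library") -> list:
    """Parse a folder, embed it, retrieve relevant chunks, and ask a small local model.

    Hypothetical wrapper around llmware's documented steps. Model, embedding,
    and vector-DB names are assumptions; substitute entries from your catalog.
    """
    # Imported lazily so this module can be loaded without llmware installed.
    from llmware.library import Library
    from llmware.retrieval import Query
    from llmware.prompts import Prompt

    # 1. Create a Library (knowledge base); add_files parses and chunks every file.
    lib = Library().create_new_library(library_name)
    lib.add_files(folder)

    # 2. Install vector embeddings on the Library (assumed model and vector DB names).
    lib.install_new_embedding(embedding_model_name="mini-lm-sbert", vector_db="chromadb")

    # 3. Retrieve the most relevant chunks with a semantic query.
    results = Query(lib).semantic_query(question, result_count=10)

    # 4. Load a small local model, attach the retrieved chunks as sources, and ask.
    prompter = Prompt().load_model("bling-answer-tool")  # assumed catalog name
    prompter.add_source_query_results(results)
    return prompter.prompt_with_source(question, prompt_name="default_with_context")
```

Because the answer is generated only from the attached sources, a small model can stay grounded without having seen the documents during training.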

Copy-paste prompts

Prompt 1

Build an llmware Library that ingests a folder of PDFs, installs embeddings, and answers questions with Prompt.prompt_with_source.

Prompt 2

Compare BLING and DRAGON small models on a fact-extraction task over 50 PDF invoices using llmware.

Prompt 3

Run an llmware semantic_query plus a metadata filter to find contract clauses that mention termination, limited to contracts dated after 2023.

Prompt 4

Set up llmware to run a quantized ONNX model on a Windows AI PC using ONNX Runtime with Qualcomm NPU acceleration.

Prompt 5

Migrate an existing OpenAI RAG script to llmware so it uses a local quantized model with the same Python interface.

Frequently asked questions

What is llmware?

llmware is a Python framework for building enterprise RAG pipelines with local small models, document parsing, vector search, and a catalog of 300+ packaged LLMs.

What language is llmware written in?

Mainly Python. The stack also includes the GGUF model format and the ONNX Runtime, OpenVINO, and PyTorch inference engines.

Is llmware actively maintained?

Maintained: there has been a commit in the last 6 months (last push 2026-05-17).

How hard is llmware to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is llmware for?

Mainly developers.

Verify against the repo before relying on details.