Analysis updated 2026-06-24 · repo last pushed 2026-05-17
Build a private RAG pipeline over a folder of internal PDFs and Word files running entirely on a laptop.
Extract structured facts from contracts using the BLING and DRAGON small fine-tuned models.
Swap between local GGUF, ONNX, and OpenVINO inference engines without changing application code.
Combine semantic, hybrid, and filtered queries over a Library of parsed documents to ground LLM answers.
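To make the query modes in the list above concrete, here is a minimal standard-library sketch. It is not llmware's API: `Chunk`, `text_score`, `semantic_score`, and `hybrid_query` are hypothetical names, and the bag-of-words cosine only stands in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str   # source file name, used for filtering
    text: str  # chunk text


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def text_score(query, chunk):
    """Keyword score: fraction of query terms present in the chunk."""
    q = set(tokens(query))
    if not q:
        return 0.0
    return len(q & set(tokens(chunk.text))) / len(q)


def semantic_score(query, chunk):
    """Stand-in for embedding similarity: cosine over term counts."""
    a, b = Counter(tokens(query)), Counter(tokens(chunk.text))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_query(query, chunks, alpha=0.5, doc_filter=None, top_k=3):
    """Blend both scores; optionally restrict the search to one document."""
    pool = [c for c in chunks if doc_filter is None or c.doc == doc_filter]
    scored = [
        (alpha * semantic_score(query, c) + (1 - alpha) * text_score(query, c), c)
        for c in pool
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(s, 3), c) for s, c in scored[:top_k] if s > 0]


# Made-up example chunks, purely for illustration.
library = [
    Chunk("contract_a.pdf", "The base salary is 120,000 USD per year."),
    Chunk("contract_a.pdf", "Either party may terminate with 30 days notice."),
    Chunk("contract_b.docx", "Base salary shall be reviewed annually."),
]

hits = hybrid_query("base salary", library)
filtered = hybrid_query("base salary", library, doc_filter="contract_b.docx")
```

Setting `alpha` to 1.0 or 0.0 reduces the blend to a purely "semantic" or purely keyword query, and `doc_filter` shows the idea behind a filtered query in its simplest form.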
| | llmware-ai/llmware | trustedsec/social-engineer-toolkit | waditu/tushare |
|---|---|---|---|
| Stars | 14,860 | 14,859 | 14,878 |
| Language | Python | Python | Python |
| Last pushed | 2026-05-17 | — | — |
| Maintenance | Maintained | — | — |
| Setup difficulty | moderate | moderate | easy |
| Complexity | 4/5 | 4/5 | 2/5 |
| Audience | developer | ops devops | data |
Figures from each repo's GitHub metadata at analysis time.
Each local inference engine (GGUF, ONNX, OpenVINO) needs its own runtime installed, and embedding stores need a configured vector DB.
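Because each engine brings its own runtime, a quick pre-flight check can show which ones are present before a first run. The sketch below is only a hint: the import names are assumptions about the commonly used Python packages, and llmware may bundle or load some backends differently (for example, shipping its own GGUF binaries). `RUNTIME_MODULES` and `available_runtimes` are hypothetical names.

```python
from importlib.util import find_spec

# Assumed import names for each runtime's Python package.
RUNTIME_MODULES = {
    "GGUF (llama-cpp-python)": "llama_cpp",
    "ONNX Runtime": "onnxruntime",
    "OpenVINO": "openvino",
    "PyTorch": "torch",
}


def available_runtimes(modules=RUNTIME_MODULES):
    """Return {label: bool} without importing the heavy packages."""
    return {label: find_spec(name) is not None for label, name in modules.items()}


for label, ok in available_runtimes().items():
    print(f"{label:28} {'installed' if ok else 'missing'}")
```

The same helper can be pointed at vector DB client packages by passing a different mapping.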
llmware is a Python framework for building applications that combine company documents with language models, which the project calls knowledge-based LLM applications. It targets enterprise teams that want to run locally or privately, including on ordinary laptops and AI PCs, instead of sending all their data to a cloud API. The README highlights support for Windows, Mac, and Linux, and for several local inference engines: GGUF, OpenVINO, ONNX Runtime, ONNX Runtime with Qualcomm NPU acceleration, Windows Local Foundry, and PyTorch.
The project has two main parts. The first is a model catalog with more than 300 models prepackaged in compact, quantized formats, including the project's own small fine-tuned families (BLING, DRAGON, SLIM, and Industry-BERT), which are built for tasks like extracting facts from documents. The catalog also includes hosted cloud models from OpenAI, Anthropic, and Google, all accessed through the same Python interface.
The second part is a retrieval-augmented-generation pipeline: tools to parse files such as PDF, Word, Excel, PowerPoint, HTML, images, and audio; chunk the text; create a Library (the project's name for a knowledge base); install vector embeddings on it; and then run text, semantic, hybrid, and filtered queries. The README shows short Python snippets for each step: ModelCatalog().load_model, Library().add_files, install_new_embedding, Query().semantic_query, and Prompt().prompt_with_source.
The pitch, in the project's words, is that AI should be sustainable, accurate, and cost effective, using the smallest model that does the job. The full README is longer than what was shown.
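The calls named above can be strung together roughly as follows. This is a hedged sketch written from memory of the project's published examples, not a verified recipe. `answer_from_folder` is a hypothetical wrapper, and `create_new_library`, `add_source_query_results`, the keyword arguments, and the default model, embedding, and vector DB names are assumptions that may differ between llmware versions. Check the README before relying on them.

```python
def answer_from_folder(folder, question,
                       library_name="demo_lib",
                       embedding_model="mini-lm-sbert",
                       vector_db="chromadb",
                       llm_name="bling-answer-tool"):
    """Parse a folder, embed it, retrieve context, and ask a local model."""
    # Imported lazily so the sketch loads even when llmware is not installed.
    from llmware.library import Library
    from llmware.retrieval import Query
    from llmware.prompts import Prompt

    # Assumed: a Library is created by name, then files are parsed and chunked into it.
    lib = Library().create_new_library(library_name)
    lib.add_files(folder)

    # Assumed keyword names for the embedding step.
    lib.install_new_embedding(embedding_model_name=embedding_model,
                              vector_db=vector_db)

    # Retrieve the chunks most similar to the question.
    results = Query(lib).semantic_query(question, result_count=5)

    # Ground the model's answer in the retrieved chunks.
    prompter = Prompt().load_model(llm_name)
    prompter.add_source_query_results(results)
    return prompter.prompt_with_source(question)
```

A call would look like `answer_from_folder("/path/to/contracts", "What is the base salary?")`, where the folder path and question are placeholders.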
llmware is a Python framework for building enterprise RAG pipelines with local small models, document parsing, vector search, and a catalog of 300+ packaged LLMs.
Mainly Python. The stack also includes GGUF and ONNX.
Maintained: a commit within the last 6 months (last push 2026-05-17).
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly for developers.
Verify against the repo before relying on details.