bartamin/fine-tuned-rag

Analysis updated 2026-06-24

★ 12 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((Fine-tuned-RAG))
    Inputs
      Legal corpus chunks
      Synthetic questions
      Raw embeddings
    Outputs
      Transformed query vector
      Retrieval scores
      Judge ratings
    Use Cases
      Domain RAG tuning
      Legal QA retrieval
      Embedding adapter research
    Tech Stack
      Python
      PyTorch
      Optuna
      Claude Haiku


What do people build with it?

USE CASE 1

Train a query-side adapter that improves Hit Rate on a domain-specific RAG corpus

USE CASE 2

Generate synthetic question/chunk pairs from a corpus using Claude Haiku

USE CASE 3

Run an LLM-judge eval that scores RAG answers on six 1-5 metrics

USE CASE 4

Sweep retrieval hyperparameters with Optuna keyed on validation Hit Rate at 5
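
Hit Rate at k, the metric the sweep in use case 4 is keyed on, can be illustrated with a minimal sketch. It assumes each question has exactly one gold chunk (the chunk it was generated from). The function name `hit_rate_at_k` and the chunk ids are hypothetical and not taken from the repo.

```python
from typing import Sequence


def hit_rate_at_k(
    ranked_ids: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int,
) -> float:
    """Fraction of queries whose gold chunk appears in the top-k results.

    ranked_ids[i] is the retriever's ranking for query i, best match first.
    gold_ids[i] is the single chunk assumed to answer query i.
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("one gold id is required per query")
    if not gold_ids:
        return 0.0
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k]
    )
    return hits / len(gold_ids)


# Made-up rankings for three queries, purely for illustration.
ranked = [["c7", "c2", "c9"], ["c1", "c4", "c3"], ["c5", "c6", "c8"]]
gold = ["c2", "c3", "c0"]

print(hit_rate_at_k(ranked, gold, k=2))
print(hit_rate_at_k(ranked, gold, k=3))
```

A larger k can only keep or raise the score, which is why a Hit Rate at 5 and a Hit Rate at 20 measured on the same retriever are not directly comparable.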

What is it built with?

Python · PyTorch · Optuna · Claude · Sentence Transformers

How does it compare?

	bartamin/fine-tuned-rag	aim-uofa/reasonmatch	arpecop/kokobook
Stars	12	12	12
Language	Python	Python	Python
Setup difficulty	hard	hard	hard
Complexity	4/5	5/5	3/5
Audience	researcher	researcher	general

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Needs a Claude API key for synthetic-question generation, a GPU-friendly PyTorch setup, and time for Optuna sweeps, which take a while to converge.

In plain English

Fine-tuned RAG is a small research project that tries to improve the retrieval step inside a Retrieval Augmented Generation pipeline. In a normal RAG system, when you ask a question the system turns your question into a numeric vector, compares it against the vectors of every document chunk in a database, and returns the closest matches by cosine similarity. The author argues that this default approach treats every dimension of the vector as equally important, which adds noise on a focused corpus like legal text where only some dimensions actually help.

The project tests this idea on a public legal dataset called isaacus/legal-rag-bench. The author builds a small extra neural network that sits between the query and the database. It takes the raw query embedding and produces a transformed embedding where useful dimensions get amplified and noisy ones get dampened. The database side stays unchanged, so the system can still use any standard embedding model and any standard vector store.

Training data comes from the corpus itself. For each chunk in the training set, Claude Haiku is asked to write up to five realistic legal questions that the chunk could answer, producing about 14,000 question and chunk pairs from around 4,800 chunks. The small network is then trained with Multiple Negatives Ranking loss, which pushes each transformed question vector close to its correct chunk and away from all other chunks in the batch. A masking trick avoids penalising the model when several questions point at the same chunk. Optuna runs Bayesian hyperparameter search and keeps the checkpoint with the best validation Hit Rate at 5.

The README reports that on 100 held-out questions, the fine-tuned retriever beats the raw embedding baseline on Hit Rate at 20. It also runs both pipelines through the same LLM to write answers, then uses an LLM judge to score those answers on six metrics from 1 to 5. The fine-tuned version wins on every metric, with the biggest gains in completeness and faithfulness.

The code is split into three pipeline scripts: pipeline_generate.py to build the synthetic question and chunk caches, pipeline_train.py to train the network on those caches, and pipeline_eval.py to compare retrieval quality and run the LLM judge. Shared logic lives under src in model.py, train.py, retrieval.py, and generation.py, with all hyperparameters and paths centralised in config.py.
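
The Multiple Negatives Ranking loss and its masking trick, described above, can be sketched without any dependencies. This is an illustration under stated assumptions and not the repo's code, which uses PyTorch. The function `masked_mnr_loss`, the `scale` value of 20.0, and the toy vectors are all hypothetical. For each question, every other chunk in the batch acts as a negative, except chunks that share the question's own chunk id, which are left out of the softmax.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def masked_mnr_loss(
    queries: Sequence[Sequence[float]],
    chunks: Sequence[Sequence[float]],
    chunk_ids: Sequence[str],
    scale: float = 20.0,  # assumed temperature, not taken from the repo
) -> float:
    """Mean cross-entropy of each query against the in-batch chunks.

    Row i pairs queries[i] with its positive chunks[i]. Any other row j with
    the same chunk id is a duplicate positive, so it is skipped rather than
    counted as a negative.
    """
    n = len(queries)
    total = 0.0
    for i in range(n):
        positive = scale * cosine(queries[i], chunks[i])
        logits = [positive]
        for j in range(n):
            if j == i or chunk_ids[j] == chunk_ids[i]:
                continue  # skip self (already added) and duplicate positives
            logits.append(scale * cosine(queries[i], chunks[j]))
        # Numerically stable log-sum-exp over the kept logits.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(s - top) for s in logits))
        total += log_denominator - positive
    return total / n


# Toy batch: the first two questions were generated from the same chunk "a".
queries = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
chunks = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

masked = masked_mnr_loss(queries, chunks, ["a", "a", "b"])
unmasked = masked_mnr_loss(queries, chunks, ["a", "a2", "b"])
print(masked, unmasked)
```

Giving the duplicate chunk a distinct id in the second call simulates the unmasked loss. There the first question is penalised for being close to a chunk that is actually correct for it, which is the situation the masking is meant to avoid.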

Copy-paste prompts

Prompt 1

Run pipeline_generate.py from Fine-tuned-RAG to build synthetic question caches on my own corpus instead of the legal one.

Prompt 2

Walk me through the model.py adapter network in Fine-tuned-RAG and explain how it amplifies useful embedding dimensions.

Prompt 3

Adapt the Multiple Negatives Ranking loss in Fine-tuned-RAG to handle a corpus where many chunks share the same parent document.

Prompt 4

Run pipeline_eval.py on a new embedding model and compare its Hit Rate at 20 against the baseline reported in the README.

Prompt 5

Replace Claude Haiku in Fine-tuned-RAG with a local LLM for the question-generation step and document the prompt changes.

Frequently asked questions

What is fine-tuned-rag?

A research project that fine-tunes a small query-side network on top of frozen embeddings to boost retrieval on legal RAG. It is trained on synthetic questions written by Claude Haiku, using Multiple Negatives Ranking loss.

What language is fine-tuned-rag written in?

Mainly Python. The stack also includes PyTorch and Optuna.

How hard is fine-tuned-rag to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is fine-tuned-rag for?

Mainly researchers.

Verify against the repo before relying on details.