
bartamin/fine-tuned-rag

12 | Python | Audience · researcher | Complexity · 4/5 | Active | Setup · hard

TLDR

Research project that fine-tunes a small query-side network on top of frozen embeddings to boost retrieval on legal RAG, trained on synthetic Claude Haiku questions with Multiple Negatives Ranking loss.

Mindmap

mindmap
  root((Fine-tuned-RAG))
    Inputs
      Legal corpus chunks
      Synthetic questions
      Raw embeddings
    Outputs
      Transformed query vector
      Retrieval scores
      Judge ratings
    Use Cases
      Domain RAG tuning
      Legal QA retrieval
      Embedding adapter research
    Tech Stack
      Python
      PyTorch
      Optuna
      Claude Haiku

Things people build with this

USE CASE 1

Train a query-side adapter that improves Hit Rate on a domain-specific RAG corpus

USE CASE 2

Generate synthetic question/chunk pairs from a corpus using Claude Haiku

USE CASE 3

Run an LLM-judge eval that scores RAG answers on six 1-5 metrics

USE CASE 4

Sweep retrieval hyperparameters with Optuna keyed on validation Hit Rate at 5
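
As a rough illustration of the metric that the sweep in Use Case 4 is keyed on, the sketch below computes Hit Rate at k using plain-Python cosine similarity. The function names and the toy vectors are invented for illustration and are not taken from the repo, whose own implementation may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

def hit_rate_at_k(query_vecs, gold_chunk_ids, chunk_vecs, k):
    """Fraction of queries whose gold chunk appears in the top-k results."""
    hits = sum(
        gold in top_k(query, chunk_vecs, k)
        for query, gold in zip(query_vecs, gold_chunk_ids)
    )
    return hits / len(query_vecs)

# Toy data (hypothetical): three chunk vectors, two queries, known gold chunks.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.2, 0.8]]
gold = [0, 2]

print(hit_rate_at_k(queries, gold, chunks, k=1))  # 0.5
print(hit_rate_at_k(queries, gold, chunks, k=2))  # 1.0
```

The second query ranks its gold chunk second, so it counts as a miss at k=1 and a hit at k=2. A sweep keyed on Hit Rate at 5 would compare this same quantity, with k=5, across trials.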

Tech stack

Python · PyTorch · Optuna · Claude · Sentence Transformers

Getting it running

Difficulty · hard | Time to first run · 1 day+

Needs a Claude API key for synthetic-question generation and a GPU-friendly PyTorch setup. The Optuna sweeps also take a while to converge.

In plain English

Fine-tuned RAG is a small research project that tries to improve the retrieval step inside a Retrieval Augmented Generation pipeline. In a normal RAG system, when you ask a question the system turns your question into a numeric vector, compares it against the vectors of every document chunk in a database, and returns the closest matches by cosine similarity. The author argues that this default approach treats every dimension of the vector as equally important, which adds noise on a focused corpus like legal text where only some dimensions actually help.

The project tests this idea on a public legal dataset called isaacus/legal-rag-bench. The author builds a small extra neural network that sits between the query and the database. It takes the raw query embedding and produces a transformed embedding where useful dimensions get amplified and noisy ones get dampened. The database side stays unchanged, so the system can still use any standard embedding model and any standard vector store.

Training data comes from the corpus itself. For each chunk in the training set, Claude Haiku is asked to write up to five realistic legal questions that the chunk could answer, producing about 14,000 question and chunk pairs from around 4,800 chunks. The small network is then trained with Multiple Negatives Ranking loss, which pushes each transformed question vector close to its correct chunk and away from all other chunks in the batch. A masking trick avoids penalising the model when several questions point at the same chunk. Optuna runs Bayesian hyperparameter search and keeps the checkpoint with the best validation Hit Rate at 5.

The README reports that on 100 held-out questions, the fine-tuned retriever beats the raw embedding baseline on Hit Rate at 20. It also runs both pipelines through the same LLM to write answers, then uses an LLM judge to score those answers on six metrics from 1 to 5. The fine-tuned version wins on every metric, with the biggest gains in completeness and faithfulness.

The code is split into three pipeline scripts: pipeline_generate.py to build the synthetic question and chunk caches, pipeline_train.py to train the network on those caches, and pipeline_eval.py to compare retrieval quality and run the LLM judge. Shared logic lives under src in model.py, train.py, retrieval.py, and generation.py, with all hyperparameters and paths centralised in config.py.
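
The Multiple Negatives Ranking loss and the masking trick described in this section can be sketched in plain Python. This is a forward-only illustration under assumptions: the function name, the scale of 20 (the Sentence Transformers default for this loss), and the exact masking rule are not confirmed by the source. The repo trains with PyTorch, where the same logic would be expressed with tensors and autograd.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def masked_mnr_loss(query_vecs, chunk_vecs, chunk_ids, scale=20.0):
    """Multiple Negatives Ranking loss with same-chunk negatives masked out.

    Row i treats chunk i as the positive and every other chunk in the batch
    as a negative, except chunks that share chunk_ids[i], which are skipped.
    (Assumed masking rule; the repo's version may differ.)
    """
    n = len(query_vecs)
    total = 0.0
    for i in range(n):
        logits = []
        for j in range(n):
            if j != i and chunk_ids[j] == chunk_ids[i]:
                continue  # same chunk as the positive: not a real negative
            logits.append((j, scale * cosine(query_vecs[i], chunk_vecs[j])))
        # Numerically stable log-sum-exp over the columns that were kept.
        max_logit = max(score for _, score in logits)
        log_denom = max_logit + math.log(
            sum(math.exp(score - max_logit) for _, score in logits)
        )
        positive = next(score for j, score in logits if j == i)
        total += log_denom - positive  # cross-entropy for row i
    return total / n

# Toy batch (hypothetical): the first two questions point at the same chunk.
queries = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
chunks = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

masked = masked_mnr_loss(queries, chunks, chunk_ids=[7, 7, 9])
unmasked = masked_mnr_loss(queries, chunks, chunk_ids=[0, 1, 2])
print(masked < unmasked)  # True
```

Without the mask, each of the first two rows sees an identical copy of its own positive among the negatives, so its loss cannot fall below roughly log 2 however good the query vector is. Skipping those columns removes that floor.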

Copy-paste prompts

Prompt 1
Run pipeline_generate.py from Fine-tuned-RAG to build synthetic question caches on my own corpus instead of the legal one.
Prompt 2
Walk me through the model.py adapter network in Fine-tuned-RAG and explain how it amplifies useful embedding dimensions.
Prompt 3
Adapt the Multiple Negatives Ranking loss in Fine-tuned-RAG to handle a corpus where many chunks share the same parent document.
Prompt 4
Run pipeline_eval.py on a new embedding model and compare its Hit Rate at 20 against the baseline reported in the README.
Prompt 5
Replace Claude Haiku in Fine-tuned-RAG with a local LLM for the question-generation step and document the prompt changes.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.