explaingit

juliusbrussee/cavegemma

16PythonAudience · researcherComplexity · 4/5ActiveLicenseSetup · hard

TLDR

QLoRA fine-tune of Gemma 4 31B that answers in a stripped-down caveman style; ships as a 62 GB merged bf16 model and a 534 MB LoRA adapter on Hugging Face.

Mindmap

mindmap
  root((cavegemma))
    Inputs
      Gemma 4 31B base
      Seed prompts
      Synthesize scripts
    Outputs
      Merged bf16 model
      LoRA adapter
      Eval metrics
    Use Cases
      Run terse code assistant
      Reuse training pipeline
      Study QLoRA recipe
    Tech Stack
      Python
      Transformers
      Unsloth
      TRL
      QLoRA

Things people build with this

USE CASE 1

Load the merged bf16 cavegemma model in transformers and chat with apply_chat_template

USE CASE 2

Stack the LoRA adapter on top of google/gemma-4-31B-it for a smaller download

USE CASE 3

Reproduce the QLoRA training run with the included Unsloth and TRL config.toml

USE CASE 4

Reuse the data, training, and eval folders as a template for a different style fine-tune

Tech stack

PythonTransformersUnslothTRLQLoRAPyTorch

Getting it running

Difficulty · hard Time to first run · 1day+

Inference needs a GPU with enough VRAM for Gemma 4 31B; full reproduction needs a 96 GB RTX PRO 6000 class GPU.

MIT license on the code, distributed under the Gemma terms for the model weights.

In plain English

cavegemma is a fine-tuned version of Google's Gemma 4 31B language model trained to answer in a stripped-down 'caveman' style. The tagline is 'why use many token when few do trick.' Where the base model writes long paragraphs with full sentences and articles, this version drops filler and articles and keeps only the essentials. The style rules come from a separate repo by the same author called caveman, and the goal is to bake those rules into the weights so no system prompt or skill file is needed at inference time. The README ships two artifacts on Hugging Face. The first is a merged bf16 model at JBrussee/gemma-4-31B-caveman, around 62.5 GB, loadable directly with transformers. The second is a LoRA adapter at JBrussee/gemma-4-31B-caveman-lora, around 534 MB, which stacks on top of google/gemma-4-31B-it. Two short Python snippets show how to load each version, then call apply_chat_template to send a message. Training used QLoRA with NF4 quantization, double quantization, and bf16 compute on top of google/gemma-4-31B-it. The LoRA was rank 16 with alpha 32 and zero dropout, targeting all linear layers. The dataset was 1750 training pairs plus 193 evaluation pairs across debug, review, refactor, dialogue, and Q&A. The schedule was three epochs at learning rate 2e-4 cosine, effective batch size 16, and completion-only loss. Hardware was a RunPod RTX PRO 6000 Blackwell with 96 GB at about $1.89 an hour, wall time around 50 minutes, and total cost roughly four to five dollars. Evaluation is reported on the 193-pair holdout split by category. Code preservation is 96 to 100 percent for exact code fence matches, article density drops from an English baseline near 8 percent down to 0.5 to 2 percent, and semantic similarity to gold answers sits at 91 to 98 percent. The README notes a weak spot in compression: the model reaches a ratio between 0.6 and 0.9 while the gold pairs sit at 0.3 to 0.5, and the author plans to tighten the filter on the next training run. The repo splits the work across data, training, and eval folders. Data holds seed prompts, per-source HuggingFace loaders, a corpus builder, a synthesize script that uses Claude Code or Codex CLI to rewrite samples, a filter that checks code fence integrity, and a split script. Training holds an Unsloth and TRL script plus a config.toml. Eval holds metric and LLM-judge scripts. License is MIT, distributed under the Gemma terms.

Copy-paste prompts

Prompt 1
Load JBrussee/gemma-4-31B-caveman-lora on top of google/gemma-4-31B-it and run the README chat snippet
Prompt 2
Reproduce the QLoRA training in cavegemma with the same NF4 config and a smaller 200-pair dataset
Prompt 3
Run the eval scripts in cavegemma against my own holdout set and report code preservation and article density
Prompt 4
Tighten the compression filter in cavegemma so generations sit closer to the 0.3 to 0.5 gold ratio
Prompt 5
Adapt the synthesize script in cavegemma to use a local Claude Code CLI instead of Codex
Open on GitHub → Explain another repo

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.