juliusbrussee/cavegemma

Analysis updated 2026-06-24

★ 13PythonAudience · researcherComplexity · 4/5LicenseSetup · hard

Mindmap

mindmap
  root((cavegemma))
    Inputs
      Gemma 4 31B base
      Seed prompts
      Synthesize scripts
    Outputs
      Merged bf16 model
      LoRA adapter
      Eval metrics
    Use Cases
      Run terse code assistant
      Reuse training pipeline
      Study QLoRA recipe
    Tech Stack
      Python
      Transformers
      Unsloth
      TRL
      QLoRA

mindmap root((cavegemma)) Inputs Gemma 4 31B base Seed prompts Synthesize scripts Outputs Merged bf16 model LoRA adapter Eval metrics Use Cases Run terse code assistant Reuse training pipeline Study QLoRA recipe Tech Stack Python Transformers Unsloth TRL QLoRA

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Load the merged bf16 cavegemma model in transformers and chat with apply_chat_template

USE CASE 2

Stack the LoRA adapter on top of google/gemma-4-31B-it for a smaller download

USE CASE 3

Reproduce the QLoRA training run with the included Unsloth and TRL config.toml

USE CASE 4

Reuse the data, training, and eval folders as a template for a different style fine-tune

What is it built with?

PythonTransformersUnslothTRLQLoRAPyTorch

How does it compare?

	juliusbrussee/cavegemma	1lystore/awaek	actashui/sjtu-ppt-template-skill
Stars	13	13	13
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	4/5	2/5	2/5
Audience	researcher	vibe coder	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1day+

Inference needs a GPU with enough VRAM for Gemma 4 31B, full reproduction needs a 96 GB RTX PRO 6000 class GPU.

MIT license on the code, distributed under the Gemma terms for the model weights.

In plain English

cavegemma is a fine-tuned version of Google's Gemma 4 31B language model trained to answer in a stripped-down 'caveman' style. The tagline is 'why use many token when few do trick.' Where the base model writes long paragraphs with full sentences and articles, this version drops filler and articles and keeps only the essentials. The style rules come from a separate repo by the same author called caveman, and the goal is to bake those rules into the weights so no system prompt or skill file is needed at inference time. The README ships two artifacts on Hugging Face. The first is a merged bf16 model at JBrussee/gemma-4-31B-caveman, around 62.5 GB, loadable directly with transformers. The second is a LoRA adapter at JBrussee/gemma-4-31B-caveman-lora, around 534 MB, which stacks on top of google/gemma-4-31B-it. Two short Python snippets show how to load each version, then call apply_chat_template to send a message. Training used QLoRA with NF4 quantization, double quantization, and bf16 compute on top of google/gemma-4-31B-it. The LoRA was rank 16 with alpha 32 and zero dropout, targeting all linear layers. The dataset was 1750 training pairs plus 193 evaluation pairs across debug, review, refactor, dialogue, and Q&A. The schedule was three epochs at learning rate 2e-4 cosine, effective batch size 16, and completion-only loss. Hardware was a RunPod RTX PRO 6000 Blackwell with 96 GB at about $1.89 an hour, wall time around 50 minutes, and total cost roughly four to five dollars. Evaluation is reported on the 193-pair holdout split by category. Code preservation is 96 to 100 percent for exact code fence matches, article density drops from an English baseline near 8 percent down to 0.5 to 2 percent, and semantic similarity to gold answers sits at 91 to 98 percent. The README notes a weak spot in compression: the model reaches a ratio between 0.6 and 0.9 while the gold pairs sit at 0.3 to 0.5, and the author plans to tighten the filter on the next training run. The repo splits the work across data, training, and eval folders. Data holds seed prompts, per-source HuggingFace loaders, a corpus builder, a synthesize script that uses Claude Code or Codex CLI to rewrite samples, a filter that checks code fence integrity, and a split script. Training holds an Unsloth and TRL script plus a config.toml. Eval holds metric and LLM-judge scripts. License is MIT, distributed under the Gemma terms.

Copy-paste prompts

Prompt 1

Load JBrussee/gemma-4-31B-caveman-lora on top of google/gemma-4-31B-it and run the README chat snippet

Prompt 2

Reproduce the QLoRA training in cavegemma with the same NF4 config and a smaller 200-pair dataset

Prompt 3

Run the eval scripts in cavegemma against my own holdout set and report code preservation and article density

Prompt 4

Tighten the compression filter in cavegemma so generations sit closer to the 0.3 to 0.5 gold ratio

Prompt 5

Adapt the synthesize script in cavegemma to use a local Claude Code CLI instead of Codex

Frequently asked questions

What is cavegemma?

QLoRA fine-tune of Gemma 4 31B that answers in a stripped-down caveman style, ships as a 62 GB merged bf16 model and a 534 MB LoRA adapter on Hugging Face.

What language is cavegemma written in?

Mainly Python. The stack also includes Python, Transformers, Unsloth.

What license does cavegemma use?

MIT license on the code, distributed under the Gemma terms for the model weights.

How hard is cavegemma to set up?

Setup difficulty is rated hard, with roughly 1day+ to a first successful run.

Who is cavegemma for?

Mainly researcher.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Verify against the repo before relying on details.