Load the merged bf16 cavegemma model in transformers and chat with apply_chat_template
Stack the LoRA adapter on top of google/gemma-4-31B-it for a smaller download
Reproduce the QLoRA training run with the included Unsloth and TRL config.toml
Reuse the data, training, and eval folders as a template for a different style fine-tune
Inference needs a GPU with enough VRAM for Gemma 4 31B; full reproduction needs a 96 GB RTX PRO 6000 class GPU.
cavegemma is a fine-tuned version of Google's Gemma 4 31B language model trained to answer in a stripped-down 'caveman' style. The tagline is 'why use many token when few do trick.' Where the base model writes long paragraphs with full sentences and articles, this version drops filler and articles and keeps only the essentials. The style rules come from a separate repo by the same author called caveman, and the goal is to bake those rules into the weights so no system prompt or skill file is needed at inference time. The README ships two artifacts on Hugging Face. The first is a merged bf16 model at JBrussee/gemma-4-31B-caveman, around 62.5 GB, loadable directly with transformers. The second is a LoRA adapter at JBrussee/gemma-4-31B-caveman-lora, around 534 MB, which stacks on top of google/gemma-4-31B-it. Two short Python snippets show how to load each version, then call apply_chat_template to send a message. Training used QLoRA with NF4 quantization, double quantization, and bf16 compute on top of google/gemma-4-31B-it. The LoRA was rank 16 with alpha 32 and zero dropout, targeting all linear layers. The dataset was 1750 training pairs plus 193 evaluation pairs across debug, review, refactor, dialogue, and Q&A. The schedule was three epochs at learning rate 2e-4 cosine, effective batch size 16, and completion-only loss. Hardware was a RunPod RTX PRO 6000 Blackwell with 96 GB at about $1.89 an hour, wall time around 50 minutes, and total cost roughly four to five dollars. Evaluation is reported on the 193-pair holdout split by category. Code preservation is 96 to 100 percent for exact code fence matches, article density drops from an English baseline near 8 percent down to 0.5 to 2 percent, and semantic similarity to gold answers sits at 91 to 98 percent. The README notes a weak spot in compression: the model reaches a ratio between 0.6 and 0.9 while the gold pairs sit at 0.3 to 0.5, and the author plans to tighten the filter on the next training run. The repo splits the work across data, training, and eval folders. Data holds seed prompts, per-source HuggingFace loaders, a corpus builder, a synthesize script that uses Claude Code or Codex CLI to rewrite samples, a filter that checks code fence integrity, and a split script. Training holds an Unsloth and TRL script plus a config.toml. Eval holds metric and LLM-judge scripts. License is MIT, distributed under the Gemma terms.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.