Research how language models respond to safety constraints and what happens when they're removed.
Create custom versions of open-source models with different safety behaviors for specific use cases.
Benchmark and test model capabilities before and after safety alignment modifications.
Requires GPU/CUDA, large model downloads, and a PyTorch setup; bitsandbytes is optional and is used for quantization, not for the parameter optimization itself.
Heretic is a command-line tool that automatically removes the built-in refusal behavior, which the README calls censorship or safety alignment, from large language models. A language model's safety alignment is the training that makes it decline certain prompts. Heretic edits the model so that it stops refusing while trying to preserve the rest of its capabilities. It does this with directional ablation, also known as abliteration, which identifies directions in the model's internal activations that correspond to refusal behavior and removes them. Heretic pairs that technique with an automatic parameter optimizer built on Optuna's TPE (Tree-structured Parzen Estimator) search, so it can find good settings without manual tuning. The search jointly minimizes two numbers: how often the model refuses harmful prompts, and the KL divergence from the original model on harmless prompts (a measure of how far the modified model's output distribution has moved). The result is a decensored version that stays close to the original. Someone would use Heretic to publish or experiment with an uncensored variant of an open-weights model without doing the interpretability work themselves. The README notes that the community has already published over 3000 models produced with it. The tool can also save the result, upload it to Hugging Face, let you chat with it, or run standard benchmarks. It is written in Python and needs a Python 3.10+ environment with PyTorch 2.2+; bitsandbytes quantization is optional and reduces VRAM use. It supports most dense transformer models, as well as several mixture-of-experts and multimodal architectures, but not pure state-space models. The full README is longer than what was provided.
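The two numbers that the search minimizes can be illustrated with a small, self-contained sketch. This is not Heretic's code: the marker list, the function names, the toy logits, and the argument order of the KL divergence are all assumptions made for illustration, and a real evaluation would work on full model outputs instead of a four-token vocabulary.

```python
import math

# Hypothetical refusal markers, chosen for illustration only.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "sorry")


def refusal_rate(responses):
    """Fraction of responses containing any refusal marker (case-insensitive)."""
    if not responses:
        return 0.0
    refused = sum(
        any(marker in text.lower() for marker in REFUSAL_MARKERS)
        for text in responses
    )
    return refused / len(responses)


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token logits over a 4-token vocabulary (made-up numbers).
original = softmax([2.0, 1.0, 0.5, -1.0])
modified = softmax([1.8, 1.2, 0.4, -0.9])

drift = kl_divergence(original, modified)
rate = refusal_rate(["Sorry, I can't help with that.", "Here is a summary."])
print(f"KL divergence: {drift:.4f} nats, refusal rate: {rate:.2f}")
```

A KL divergence of zero means the two distributions are identical, so smaller values indicate that the modified model's outputs have stayed closer to the original's.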
Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.