explaingit

p-e-w/heretic

Analysis updated 2026-05-18

Stars · 20,576 | Language · Python | Audience · developer | Complexity · 3/5 | Setup · hard

TLDR

Python tool that automatically removes safety restrictions from language models using directional ablation and parameter optimization, without manual retraining.

Mindmap

mindmap
  root((repo))
    What it does
      Removes safety alignment
      Optimizes model parameters
      Supports quantization
    How it works
      Directional ablation
      KL divergence minimization
      Optuna optimizer
    Supported models
      Dense transformers
      Multimodal models
      MoE architectures
    Use cases
      Research on model behavior
      Custom model variants
      Benchmark testing
    Getting started
      pip install heretic-llm
      Point at HF model ID
      Run optimization

What do people build with it?

USE CASE 1

Research how language models respond to safety constraints and what happens when they're removed.

USE CASE 2

Create custom versions of open-source models with different safety behaviors for specific use cases.

USE CASE 3

Benchmark and test model capabilities before and after safety alignment modifications.

What is it built with?

Python · PyTorch · Transformers · Optuna · bitsandbytes · Hugging Face

How does it compare?

Repo              p-e-w/heretic  othmanadi/planning-with-files  netbox-community/netbox
Stars             20,576         20,504                         20,438
Language          Python         Python                         Python
Setup difficulty  hard           easy                           hard
Complexity        3/5            2/5                            4/5
Audience          developer      vibe coder                     ops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1h+

Requires GPU/CUDA, large model downloads, and complex PyTorch/bitsandbytes setup for parameter optimization.
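
A quick preflight check can catch the most common setup problems before any large model download starts. The sketch below is not part of Heretic. It assumes the Python 3.10 and PyTorch 2.2 minimums that this page cites from the README, and the helper names `parse_major_minor` and `check_environment` are hypothetical. It treats a missing CUDA device as a warning only, since the exact hardware requirements should be confirmed against the repo.

```python
import importlib.util
import sys

# Minimum versions as cited on this page (assumption: verify against the README).
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 2)


def parse_major_minor(version_text):
    """Turn a version string such as '2.2.0+cu121' into a tuple like (2, 2)."""
    core = str(version_text).split("+")[0]  # drop local build tags like '+cu121'
    numbers = []
    for part in core.split(".")[:2]:
        digits = "".join(ch for ch in part if ch.isdigit())
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


def check_environment():
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d or newer is required" % MIN_PYTHON)
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems
    import torch  # imported lazily so the check still runs without PyTorch

    if parse_major_minor(torch.__version__) < MIN_TORCH:
        problems.append("PyTorch %d.%d or newer is required" % MIN_TORCH)
    if not torch.cuda.is_available():
        problems.append("PyTorch cannot see a CUDA device")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per problem and nothing at all when the environment looks usable.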

Use it freely, but if you run it as a network service, you must release your changes to users. Strongest copyleft for SaaS.

In plain English

Heretic is a command-line tool that removes the built-in refusals (what the README calls censorship or safety alignment) from large language models, the kind that power chatbots. Most modern open-weight models have been trained to refuse certain requests. Heretic alters the model's internal weights so those refusals stop happening, without the expensive process of further training the model on new data.

The underlying technique is called directional ablation, also known as abliteration, and is based on published research by Arditi et al. and Lai. The novel part is that Heretic finds the right abliteration parameters automatically, using a TPE-based (Tree-structured Parzen Estimator) hyperparameter optimizer powered by Optuna. The optimizer minimizes two things at once: how often the model refuses a set of harmful prompts, and the KL divergence from the original model on harmless prompts. KL divergence is a statistical measure of how much one probability distribution differs from another. The goal is a model that stops refusing but keeps as much of its original intelligence as possible. According to the README's benchmark table, Heretic's Gemma-3 12B result matches manual abliterations on refusal suppression while showing much lower KL divergence.

To use it, prepare a Python 3.10-or-newer environment with PyTorch 2.2 or newer, run pip install heretic-llm, and point the heretic command at a model name. The whole process is unsupervised, so you do not need to understand transformer internals. Heretic benchmarks your hardware at startup to pick a good batch size, and on an RTX 3090 the README says decensoring an 8-billion-parameter model takes about 45 minutes. Memory use can be cut with bitsandbytes 4-bit quantization. When it finishes, you can save the model, upload it to Hugging Face, chat with it, or run benchmarks. An optional research extra adds interpretability features.

Heretic supports most dense transformer models, several mixture-of-experts variants, and some hybrid architectures.
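
To make the KL divergence objective mentioned above more concrete, here is a minimal pure-Python sketch of the measure itself. It is not Heretic's code. The logits are made-up toy values, the function names are hypothetical, and the direction of the divergence and the token positions Heretic actually compares are not stated on this page.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats: how far Q has drifted from the reference P."""
    return sum(
        p_i * math.log(p_i / max(q_i, eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0  # terms with zero reference probability contribute nothing
    )


# Toy next-token scores over a three-word vocabulary (illustrative only).
original = softmax([2.0, 1.0, 0.1])
unchanged = softmax([2.0, 1.0, 0.1])
shifted = softmax([0.1, 1.0, 2.0])

print(kl_divergence(original, unchanged))  # identical distributions give zero
print(kl_divergence(original, shifted))    # a changed distribution gives a positive value
```

A value near zero on harmless prompts means the modified model still predicts almost the same next tokens as the original, which is why a lower KL divergence is read as better preserved capability.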

Copy-paste prompts

Prompt 1
I want to use Heretic to remove safety restrictions from a Llama 2 model. Walk me through the installation and basic command to get started.
Prompt 2
How do I use Heretic to optimize a multimodal model and save the result to Hugging Face Hub?
Prompt 3
Show me how to run Heretic with quantization enabled to reduce VRAM usage on a smaller GPU.
Prompt 4
What does directional ablation do in Heretic, and how does it preserve the model's original capabilities?

Frequently asked questions

What is heretic?

Python tool that automatically removes safety restrictions from language models using directional ablation and parameter optimization, without manual retraining.

What language is heretic written in?

Mainly Python. The stack also includes PyTorch, Transformers, Optuna, and bitsandbytes.

What license does heretic use?

Use it freely, but if you run it as a network service, you must release your changes to users. Strongest copyleft for SaaS.

How hard is heretic to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is heretic for?

Mainly developers.

Verify against the repo before relying on details.