Gemma 4 abliterated checkpoints in 12B and 26B drop on HuggingFace

A HuggingFace user has released uncensored Gemma 4 checkpoints in 12B and 26B parameter counts, tagged as abliterated and available in safetensors and GGUF formats.

By Alex Sokoloff · June 12, 2026

Uncensored Gemma 4 weights landed on HuggingFace this week. User llmfan46 uploaded five separate checkpoints spanning 12B and 26B parameter sizes, all tagged "heretic," "uncensored," "decensored," and "abliterated." The releases mark the latest community effort to strip safety tuning from Google's multimodal instruction models.

The 26B variant is listed as an image-text-to-text pipeline in unquantized safetensors form. The 12B family spans four repos: unquantized safetensors, standard GGUF, and two NVFP4 variants (one GGUF, one safetensors). All five repos were uploaded on June 11–12, 2026. The GGUF releases are formatted for llama.cpp and similar CPU/GPU inference engines, which puts the models within reach of users without datacenter hardware. The NVFP4 variants target NVIDIA's 4-bit floating-point format, a newer quantization option that trades some accuracy for faster inference on recent GPU architectures.
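
To make the accuracy trade-off of 4-bit floating point concrete, here is a minimal sketch of block-scaled FP4 rounding in NumPy. It assumes the E2M1 value grid and a 16-element block, and it keeps the per-block scales in full precision, which is a simplification of how NVFP4 stores its scales. The function name `fake_quantize_fp4` and the random test weights are illustrative and are not taken from any of the released repos.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(weights, block_size=16):
    """Round weights to the FP4 grid with one scale per block, then dequantize.

    Assumption: scales stay in float64 here; a real NVFP4 kernel stores them
    in a low-precision format, which adds further rounding error.
    """
    w = np.asarray(weights, dtype=np.float64)
    assert w.size % block_size == 0, "size must be a multiple of block_size"
    blocks = w.reshape(-1, block_size)

    # One scale per block: map the block's largest magnitude onto 6.0,
    # the largest value on the grid.
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = blocks / scale

    # Snap each scaled magnitude to the nearest grid point, restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    quantized = np.sign(scaled) * E2M1_GRID[idx]

    # Dequantize so the result can be compared with the original weights.
    return (quantized * scale).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64))  # stand-in for a small weight matrix
w_hat = fake_quantize_fp4(w)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The printed error is the accuracy given up in exchange for storing each weight in four bits. The largest weight in each block is reproduced exactly, while smaller weights are rounded to one of only eight magnitudes.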

Abliteration, the practice of surgically removing refusal behavior from instruction-tuned models, has become standard practice in the open-weight community. The method typically involves identifying and suppressing the activation patterns associated with safety responses, so that the model answers prompts that would otherwise trigger refusals. Google's Gemma family has been a frequent target for abliteration work since the Gemma 2 release, largely because the base architecture is competitive with commercial alternatives while remaining fully open-weight. The unquantized safetensors checkpoints preserve full precision for users who want to fine-tune or merge the weights.
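
The identify-and-suppress idea can be illustrated on synthetic data. The sketch below assumes the commonly described difference-of-means formulation: take the mean activation over prompts that were refused, subtract the mean over prompts that were answered, and project that direction out. The article does not say which procedure llmfan46 used. The function names, the planted direction, and the random "activations" are all hypothetical, and no real model is involved.

```python
import numpy as np

def refusal_direction(acts_refused, acts_answered):
    """Unit vector along the difference of the two sets' mean activations."""
    diff = acts_refused.mean(axis=0) - acts_answered.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(activations, direction):
    """Remove each activation's component along the unit vector `direction`."""
    return activations - np.outer(activations @ direction, direction)

rng = np.random.default_rng(0)
hidden = 32

# Synthetic stand-ins for hidden states: the "refused" set is shifted along
# one planted axis so there is a direction to recover.
planted = np.zeros(hidden)
planted[0] = 1.0
acts_answered = rng.normal(size=(200, hidden))
acts_refused = rng.normal(size=(200, hidden)) + 4.0 * planted

d = refusal_direction(acts_refused, acts_answered)
cleaned = ablate(acts_refused, d)

print("mean projection before:", float((acts_refused @ d).mean()))
print("max |projection| after:", float(np.abs(cleaned @ d).max()))
```

After the projection, the cleaned vectors have essentially no component along the estimated direction, while everything orthogonal to it is left untouched.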
