Gemma 4 31B uncensored quantized to 3-bit GGUF for local inference

A 3-bit IQ3_XS quantization of an uncensored Gemma 4 31B fine-tune landed on HuggingFace under Apache 2.0 license, compressing the model to roughly 20 percent of its original size.

ByAlex Sokoloff·May 28, 2026

Gemma 4 31B uncensored quantized to 3-bit GGUF for local inference

A 3-bit IQ3_XS quantization of an uncensored Gemma 4 31B fine-tune is now available on HuggingFace under Apache 2.0 license. The GGUF checkpoint, published by worstplayer, compresses llmfan46's uncensored base—an instruction-tuned variant of Google's Gemma 4 with safety filters removed—to roughly 20 percent of its original FP16 size. IQ3_XS is an extreme low-bitwidth quantization scheme that trades precision for memory footprint, making 30B-class models runnable on mid-tier consumer GPUs or high-RAM laptops without layer offloading.

The quantized weights preserve the conversational format and Apache 2.0 licensing, allowing commercial use and redistribution. The checkpoint includes imatrix metadata for improved quantization quality and is compatible with standard GGUF inference engines like llama.cpp and KoboldCpp. For users running inference on 24GB consumer cards or Apple Silicon Macs with unified memory, ultra-compressed checkpoints like this one are often the only practical path to running 30B-parameter models locally.

Uncensored variants of Google's Gemma family have circulated since the original Gemma 2 release, with community fine-tuners targeting use cases where safety refusals block legitimate workflows—technical documentation, creative writing, and research into model behavior. The heretic label typically signals ablation of refusal mechanisms rather than explicit NSFW training, though the practical effect is the same: the model will respond to prompts that trigger safety blocks in the original. The Apache 2.0 license on both the base and quantized versions removes most legal friction for commercial deployment.

ZenCreator

Gemma 4 31B uncensored quantized to 3-bit GGUF for local inference

More in Releases

Apple accuses OpenAI of soliciting hardware prototypes in job interviews

Lightweight proxy models cut LLM post-training costs while enabling cross-model signal reuse

Colibri runs 744B GLM-5.2 on 25GB RAM by streaming experts from disk

Anthropic extends Fable 5 preview a second week, bumps rate limits 50%

Soofi S 30B activates 3B parameters per token, tops European AI baselines