Gemma-4 Ortenzya 31B uncensored GGUF quantization lands on HuggingFace
mradermacher released quantized GGUF weights for a 31-billion-parameter uncensored Gemma-4 fine-tune optimized for creative writing, available now on HuggingFace.
mradermacher released quantized GGUF weights for Gemma-4 Ortenzya The Creative Wordsmith 31B, an uncensored fine-tune of Google's Gemma-4 architecture optimized for creative writing tasks. The model landed on HuggingFace on May 18 with the "heretic" and "decensored" tags, signaling removal of safety guardrails present in the base Gemma-4 release.
The 31-billion-parameter checkpoint ships as GGUF, the file format that llama.cpp and compatible inference engines use to run quantized weights efficiently on consumer hardware. The model card lists Unsloth and Transformers as supported pipelines, and the "i1" suffix indicates an imatrix-quantized variant, which preserves more accuracy at low bit depths than naive quantization.
Quantization and hardware requirements
GGUF quantization trades precision for memory footprint: a 31B model whose weights would occupy about 62GB at 16-bit bfloat16 precision (31 billion parameters at 2 bytes each) can run in 16–24GB of VRAM at Q4 or Q5 quantization levels, putting it within reach of the RTX 4090 and similar 24GB single-GPU setups. The "heretic" tag denotes models that have had alignment training reversed or bypassed, a common pattern in the uncensored-model community, where practitioners value output flexibility over corporate safety policies.
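The memory figures above follow from simple arithmetic, sketched below. The bits-per-weight values for Q4 and Q5 are rough assumptions chosen for illustration, not numbers taken from the model card, and the helper name `weight_gb` is hypothetical. The estimate covers weights only; the KV cache and runtime buffers need additional memory.

```python
# Rough weights-only memory estimate for a quantized model.
# Bits-per-weight values for Q4/Q5 are illustrative assumptions:
# real GGUF quant types mix block formats and vary by variant.

N_PARAMS = 31e9  # 31 billion parameters, as stated in the article

APPROX_BITS_PER_WEIGHT = {
    "BF16": 16.0,       # 2 bytes per parameter
    "Q5 (assumed)": 5.5,
    "Q4 (assumed)": 4.5,
}


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in APPROX_BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_gb(N_PARAMS, bpw):.1f} GB")
```

Under these assumptions the Q4 and Q5 estimates land inside the 16–24GB range quoted above, with the Q5 figure leaving little headroom on a 24GB card once context memory is included.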
The "Creative Wordsmith" label in the model's name suggests a creative-writing-focused fine-tune, likely trained on fiction, dialogue, or narrative datasets, including material that the base Gemma-4 instruction tuning would otherwise refuse to produce. Fresh quantized releases typically see slow initial adoption as they circulate through local-LLM communities.
