Gemma 4 26B uncensored weights quantized to GGUF for local inference
JMingo published quantized GGUF weights for a 26-billion-parameter Gemma 4 variant stripped of safety tuning, based on llmfan46's uncensored fine-tune.
JMingo released GGUF-quantized weights for gemma-4-26B-A4B-it-ultra-uncensored-heretic-UD on HuggingFace this week. The checkpoint is a quantized build of llmfan46's Gemma 4 26B fine-tune, which removes safety guardrails from Google's instruction-tuned base. The Apache 2.0 license permits commercial use, subject to its attribution and notice conditions.
GGUF quantization makes the 26-billion-parameter model runnable on consumer hardware. A Q4_K_M quant typically needs about 16 GB of VRAM, putting it within reach of a single 24 GB card such as the RTX 4090. Q5_K_M builds require roughly 20 GB, and Q8_0 builds push past 24 GB. GGUF is the native file format of llama.cpp and is also supported by Ollama, LM Studio, and other local inference tools.
Uncensored Gemma fine-tunes have proliferated on HuggingFace over the past two years, with dozens of variants targeting different use cases: creative writing, roleplay, technical Q&A, and general instruction-following without safety refusals. The "heretic" naming convention typically signals aggressive ablation of refusal behavior.
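The VRAM figures above can be sanity-checked with simple arithmetic: weight size is roughly parameter count times bits per weight, divided by eight. The sketch below is illustrative only. The bits-per-weight values for Q4_K_M and Q5_K_M are approximate assumptions (K-quants mix block types, so the real average varies by model), the 2 GB overhead for KV cache and buffers is a placeholder, and the function names are hypothetical. None of the numbers are measurements of this checkpoint.

```python
# Rough estimate of quantized weight size: params * bits_per_weight / 8.
# Bits-per-weight values are approximate assumptions, not measurements
# of this particular checkpoint.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # assumed average; K-quants mix several block types
    "Q5_K_M": 5.70,  # assumed average
    "Q8_0": 8.50,    # 32 int8 weights plus one fp16 scale per block
}


def estimate_weights_gb(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


def fits_in_vram(n_params: float, quant: str, vram_gb: float,
                 overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed runtime overhead fit in vram_gb.

    overhead_gb is a placeholder for KV cache and scratch buffers;
    the real figure depends on context length and backend.
    """
    return estimate_weights_gb(n_params, quant) + overhead_gb <= vram_gb


if __name__ == "__main__":
    n_params = 26e9  # total parameter count from the model name
    for quant in APPROX_BITS_PER_WEIGHT:
        size = estimate_weights_gb(n_params, quant)
        ok = fits_in_vram(n_params, quant, vram_gb=24)
        print(f"{quant}: ~{size:.1f} GB weights, fits in 24 GB: {ok}")
```

Under these assumptions the estimates land close to the quoted figures: just under 16 GB for Q4_K_M, about 18.5 GB for Q5_K_M, and well over 24 GB for Q8_0. Actual usage also grows with context length, so the published file sizes on the model page are the better guide.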





