Five quantized releases land on HuggingFace: uncensored Gemma-4 coder claims 91 percent fewer refusals
A fresh batch of GGUF and NVFP4 quantizations hit HuggingFace this week, including an abliterated GLM-5.2, a Gemma-4 coder variant whose model card claims a 91 percent drop in refusals, and Nvidia's FP4-compressed Mistral Medium 3.5 128B.
The open-weight community shipped five quantized model variants to HuggingFace on July 2, spanning abliterated weights, coding-focused fine-tunes, and Nvidia's FP4 compression format, a sign of how quickly derivative releases now follow upstream model drops.
Qwythos-9B-Claude-Mythos-5-1M-GGUF from empero-ai ships GGUF quantizations of the Qwythos base model, a 9-billion-parameter variant trained on the Claude-Mythos dataset with a one-million-token context window. GGUF remains the dominant quantization format for local inference, supported natively by llama.cpp and its derivatives.
Nvidia contributed two FP4-compressed releases: Qwen3.6-27B-NVFP4 and Mistral-Medium-3.5-128B-NVFP4. The NVFP4 format uses 4-bit floating-point quantization to shrink model footprints while preserving more dynamic range than integer quantization at the same bit width. The Mistral Medium 3.5 128B release is particularly notable: at 128 billion parameters, the weights alone would require roughly 256 GB of VRAM in FP16 (2 bytes per parameter), but the FP4 version drops that to approximately 64 GB (0.5 bytes per parameter, before scaling-factor and KV-cache overhead), putting it within reach of a multi-GPU workstation or a single high-memory datacenter card.
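The footprint figures above follow from simple arithmetic, sketched below in Python. The function name `weight_footprint_gb` is illustrative, and the 4.5 bits-per-weight figure is an assumption standing in for per-block scaling-factor overhead rather than a number taken from the model card; the estimate covers weights only, not activations or KV cache.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


PARAMS = 128e9  # 128 billion parameters, as stated for Mistral Medium 3.5 128B

fp16_gb = weight_footprint_gb(PARAMS, 16)   # 2 bytes per parameter
fp4_gb = weight_footprint_gb(PARAMS, 4)     # 0.5 bytes per parameter
# Assumption: about 0.5 extra bits per weight for block scaling factors.
fp4_with_scales_gb = weight_footprint_gb(PARAMS, 4.5)

print(f"FP16 weights:              {fp16_gb:.0f} GB")
print(f"FP4 weights (nominal):     {fp4_gb:.0f} GB")
print(f"FP4 weights (with scales): {fp4_with_scales_gb:.0f} GB")
```

The nominal 4-bit figure matches the article's 64 GB; any real deployment needs headroom on top of that for scales, activations, and context.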
Gemma-4-12B-coder-fable5-composer2.5-v1-uncensored-heretic-GGUF from llmfan46 applies Heretic v1.4.0 and MPOA to a Gemma-4 coding variant. The model card claims a 91 percent reduction in refusals compared to the base Gemma-4 12B release. Heretic is a community-developed abliteration tool that suppresses refusal behavior by identifying a refusal direction in the model's internal activations and projecting it out of the relevant weight matrices; MPOA (Magnitude-Preserving Orthogonal Ablation) is a variant of that ablation that preserves the magnitude of the modified weights, with the aim of limiting collateral damage to model quality.
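The core idea behind abliteration, removing one direction from what a weight matrix can write, can be sketched in a few lines of plain Python. This is a generic illustration of directional ablation, not Heretic's actual code: the helper names, the toy matrix `W`, and the direction `r` are all hypothetical, and in practice the direction is estimated from model activations rather than chosen by hand.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matvec(W, x):
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [dot(row, x) for row in W]


def ablate_direction(W, r):
    """Return W' = W - r_hat (r_hat^T W), so W' x has no component along r."""
    r_hat = normalize(r)
    rows, cols = len(W), len(W[0])
    # proj[j] is the component of column j of W along r_hat.
    proj = [sum(r_hat[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - r_hat[i] * proj[j] for j in range(cols)]
            for i in range(rows)]


# Toy example: a 3x2 weight matrix and a hand-picked "refusal" direction.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [1.0, 1.0, 0.0]
x = [0.5, -2.0]

W_ablated = ablate_direction(W, r)
before = dot(normalize(r), matvec(W, x))
residual = dot(normalize(r), matvec(W_ablated, x))

print(f"component along r before ablation: {before:.4f}")
print(f"component along r after ablation:  {residual:.4f}")
```

After the edit, no input can make this layer produce output along `r`, which is the sense in which a behavior tied to that direction is suppressed. Plain projection like this also shrinks the weights it touches, which is the side effect that magnitude-preserving variants try to avoid.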
Huihui-GLM-5.2-abliterated-GGUF from huihui-ai applies abliteration to GLM-5.2, suppressing the base model's refusal behavior. GLM-5.2 is a bilingual Chinese-English model from Zhipu AI; the abliterated version edits the weights so that the model no longer declines requests its alignment training would otherwise have led it to refuse. No layers are removed and the architecture is unchanged.





