Gemma-4-E2B-it abliterated GGUF quantization lands on HuggingFace
Dzluck published quantized GGUF files of an abliterated Gemma-4-E2B-it checkpoint, making the uncensored variant easier to run locally on consumer hardware.
Dzluck published quantized GGUF weights of an abliterated Gemma-4-E2B-it checkpoint on HuggingFace this week. The upload converts trevorjs/gemma-4-e2b-it-uncensored into the GGUF format, which llama.cpp and other inference engines can load directly on consumer GPUs and CPUs. Abliteration is a weight-editing technique that suppresses a model's refusal behavior, and the model card tags the checkpoint as uncensored with English as the primary language.
GGUF quantization typically cuts memory footprint by roughly 50–75 percent compared to unquantized 16-bit safetensors weights, depending on the quantization level, letting practitioners run larger models on mid-range hardware. The base trevorjs checkpoint is a fine-tune of Google's Gemma-4-E2B-it, though neither the original release notes nor the trevorjs card specifies parameter count or context length. The Dzluck repo includes multiple quantization levels, though the card does not list file sizes or name the quants that shipped. Q4_K_M, Q5_K_M, Q6_K, and Q8_0 are the common GGUF variants.
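The size arithmetic behind that range can be sketched in a few lines of Python. This is a rough estimate, not a measurement of the Dzluck files: the Q8_0 and Q6_K figures follow from their block layouts, the Q4_K_M and Q5_K_M figures are approximate averages because those presets mix tensor types, and the one-billion parameter count is a placeholder, since the cards do not state the real one.

```python
# Approximate bits per weight for common GGUF quantization types.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale = 34 bytes per block
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
    "Q5_K_M": 5.7,   # approximate: the preset mixes tensor types
    "Q4_K_M": 4.85,  # approximate: the preset mixes tensor types
}


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Estimated weight size in gigabytes (10**9 bytes), ignoring metadata."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def reduction_vs_f16(quant: str) -> float:
    """Fraction of memory saved relative to 16-bit weights."""
    return 1.0 - BITS_PER_WEIGHT[quant] / BITS_PER_WEIGHT["F16"]


if __name__ == "__main__":
    n_params = 1e9  # placeholder: size per billion parameters, not this model's count
    for quant in BITS_PER_WEIGHT:
        size = estimate_size_gb(n_params, quant)
        saved = reduction_vs_f16(quant) * 100
        print(f"{quant:7s} {size:6.3f} GB per 1B params, {saved:5.1f}% smaller than F16")
```

Under these assumptions Q8_0 lands just under a 50 percent saving and Q4_K_M near 70 percent, which is broadly in line with the range quoted above for a 16-bit baseline.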
The upload shows zero downloads and zero likes as of publication, suggesting it went live within the past few hours. Abliterated Gemma checkpoints have circulated since mid-2024, but GGUF conversions of newer Gemma-4 variants remain sparse. Practitioners looking to test the weights will want to confirm context length and whether the E2B-it instruction format requires a custom chat template before running inference. Those details would ideally appear in a future revision of the model card.
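Until the card is updated, both details can usually be read from the GGUF file itself, because the format stores key-value metadata ahead of the tensor data. The following is a minimal sketch using only the Python standard library. It assumes a little-endian GGUF file of version 2 or later, and it assumes the converter wrote the conventional keys: one ending in `.context_length` and `tokenizer.chat_template`. Whether these particular files carry them is not confirmed by the source.

```python
import struct
from typing import Any, BinaryIO, Dict

# GGUF metadata value type IDs mapped to struct formats for fixed-width scalars.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file in GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # strings are a uint64 length followed by UTF-8 bytes
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(path: str) -> Dict[str, Any]:
    """Return the metadata key-value pairs from a GGUF file header."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file (bad magic)")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("GGUF v1 uses 32-bit counts; not handled here")
        _read(f, "<Q")  # tensor count, unused in this sketch
        kv_count = _read(f, "<Q")
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            meta[key] = _read_value(f, _read(f, "<I"))
        return meta


def summarize(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the two details the article says to confirm."""
    context = {k: v for k, v in meta.items() if k.endswith(".context_length")}
    return {"context_length": context,
            "has_chat_template": "tokenizer.chat_template" in meta}


# Example (hypothetical file name):
# print(summarize(read_gguf_metadata("model-Q4_K_M.gguf")))
```

If `has_chat_template` comes back false, the template would have to be supplied by the inference engine or passed in manually. The sketch reads only the header section, but a large tokenizer vocabulary stored in the metadata can still make the pure-Python loop take a few seconds.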




