Gemma-4-E4B uncensored GGUF quantizations land on HuggingFace
Two GGUF quantizations of felldude's gemma-4-e4b-uncensored model appeared on HuggingFace on June 23, 2026, packaged by mradermacher for local inference. Both carry Apache 2.0 licenses and target English-language use.
GGUF is the file format that llama.cpp and its derivatives use to run large language models on consumer hardware. Quantization trades precision for smaller files and faster inference, letting users run models that would otherwise require server-grade GPUs. The technique compresses floating-point weights into lower-bit representations (8-bit, 4-bit, or even 2-bit) without retraining the model. The i1 variant uses importance-matrix quantization, which preserves accuracy on high-impact weights while compressing less-critical parameters more aggressively. The result is a checkpoint that fits in consumer RAM and runs fast enough for local chat, code completion, and text generation.
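To make the precision trade-off concrete, here is a minimal Python sketch of symmetric block quantization: a block of weights shares one scale factor, and each weight is rounded to a small signed integer. It is a simplified illustration, not the actual GGUF layout or llama.cpp's quantization code, and it does not model the importance matrix. The function names and the sample weights are made up for the example.

```python
# Simplified symmetric block quantization.
# Illustration only: NOT the real GGUF/llama.cpp format, and no importance matrix.

def quantize_block(weights, bits):
    """Map a block of floats to signed integer codes plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

def max_error(weights, bits):
    """Largest absolute difference between original and restored weights."""
    codes, scale = quantize_block(weights, bits)
    restored = dequantize_block(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))

# Hypothetical sample weights, chosen only for demonstration.
block = [0.12, -0.53, 0.97, -0.08, 0.33, -0.71, 0.05, 0.64]

for bits in (8, 4, 2):
    print(f"{bits}-bit max error: {max_error(block, bits):.4f}")
```

For this sample block, the reconstruction error grows as the bit width shrinks, which is the accuracy cost that importance-aware schemes try to steer toward the weights that matter least.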
The mradermacher namespace on HuggingFace has become a de facto clearinghouse for community quantizations, hosting hundreds of GGUF conversions and typically uploading new variants within hours of the original checkpoint's release. This mirrors the broader open-weight ecosystem, where a single base model often spawns dozens of derivative formats, fine-tunes, and merge experiments. The uncensored label signals that safety tuning has been removed or was never applied, a common practice in the local-inference community, where users want full control over model behavior.




