Gemma-4-E4B abliterated checkpoint adds uncensored multimodal inference
Delytalker released an abliterated Gemma 4 checkpoint with vision and audio support, shipping in GGUF format for local multimodal inference without safety filters.
Delytalker has released Gemma-4-E4B-Uncensored-HauhauCS-Aggressive, an abliterated multimodal checkpoint that removes the safety alignment from Google's Gemma 4 architecture. The model handles image-text-to-text pipelines with audio support, enabling cross-modal inference without refusal behavior. Weights landed on HuggingFace in GGUF format, the quantization-friendly container that runs in llama.cpp and Ollama.
The "HauhauCS-Aggressive" suffix refers to a specific abliteration recipe—a technique that surgically removes refusal behavior from instruction-tuned models. Gemma 4's base architecture already supports vision natively; this release extends that capability to uncensored use cases by removing the safety alignment Google baked into official checkpoints.
GGUF quantization and local deployment
GGUF packaging means the weights are pre-quantized or ready for on-the-fly quantization, a key advantage for practitioners running inference on consumer GPUs or Apple Silicon. The model card does not yet list download counts, benchmark numbers, context length, or parameter count—details that would clarify how this checkpoint compares to other Gemma 4 abliterations already circulating.
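As a concrete illustration of what GGUF packaging buys: once a quantized file is downloaded from the repo, it can be loaded directly with llama-cpp-python (or imported into Ollama). The filename, quantization level, and context size below are assumptions, since the model card does not list them; check the repo for the actual GGUF files.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Filename is a placeholder; the repo's actual quantization variants are not documented.
model_path = hf_hub_download(
    repo_id="Delytalker/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive",
    filename="gemma-4-e4b-uncensored-q4_k_m.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=8192,        # assumed context window; the card does not state the supported length
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Describe the GGUF format in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```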
The multimodal tags (vision, audio, image-text-to-text) suggest the model can ingest images and potentially audio inputs alongside text prompts, though the card does not specify whether audio support is native or requires a separate encoder. Practitioners interested in unrestricted vision-language workflows now have another Gemma 4 option to test against Llama 3.2 Vision abliterations and open-weight Qwen2-VL fine-tunes.
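If the vision path works the way other GGUF vision models do in Ollama (a text model paired with a multimodal projector), an image prompt would look roughly like the sketch below. The model tag is hypothetical and assumes the GGUF has already been imported into Ollama via a Modelfile; audio input is not shown because the card does not describe how it is wired up.

```python
import ollama

# "gemma4-e4b-uncensored" is a placeholder tag, assuming the downloaded GGUF (and its
# projector, if shipped separately) was imported with `ollama create` beforehand.
response = ollama.chat(
    model="gemma4-e4b-uncensored",
    messages=[
        {
            "role": "user",
            "content": "What is shown in this image?",
            "images": ["./example.jpg"],  # local image path; the client base64-encodes it
        }
    ],
)
print(response["message"]["content"])
```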
