SuperGemma4 26B abliterated weights land in GGUF for local inference
A new quantized variant of Gemma4 26B with safety guardrails removed landed on HuggingFace this week, targeting local inference on Apple Silicon and llama.cpp runtimes.
Practitioners running uncensored language models on consumer hardware have another option: SuperGemma4 26B, an abliterated variant of Google's Gemma4 architecture, appeared on HuggingFace in GGUF format on May 12.
The model card lists tags for llama.cpp compatibility, Apple Silicon optimization, and Korean language support. GGUF quantization means the 26-billion-parameter weights can run locally without an enterprise GPU cluster, the usual setup for practitioners who want unrestricted output free of API rate limits and content filters.
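How far quantization shrinks a 26B model is simple arithmetic. Here is a rough sketch in Python, using approximate effective bits-per-weight figures for common GGUF quant levels; the exact numbers vary by llama.cpp version and tensor layout:

```python
# Back-of-envelope GGUF weight sizes for a 26B-parameter model.
# Bits-per-weight values are rough approximations, not exact
# llama.cpp figures; KV cache and runtime overhead are excluded.
PARAMS = 26e9

quants = {
    "F16": 16.0,     # unquantized half precision
    "Q8_0": 8.5,     # ~8.5 effective bits per weight (approx.)
    "Q4_K_M": 4.8,   # popular quality/size tradeoff (approx.)
}

for name, bits_per_weight in quants.items():
    gib = PARAMS * bits_per_weight / 8 / 2**30
    print(f"{name:>7}: ~{gib:.1f} GiB")
```

That puts a 4-bit quant around 15 GiB of weights, which is why a model this size is viable on a 32 GB M-series Mac while the unquantized F16 release, at roughly 48 GiB, is not.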
Gemma4 is Google's latest iteration of the Gemma family, released earlier this year as open weights under a permissive license. Abliterated variants ablate the refusal behavior baked in by safety tuning, so prompts the base release blocks go through unfiltered. The "v2" suffix in the repo name suggests a revised quantization or merge, though the model card doesn't say what changed from any earlier v1.

For users already running llama.cpp or Ollama on M-series Macs, adding another 26B option to the rotation is a matter of pulling the GGUF file and pointing the runtime at it.
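A minimal sketch of that flow via the llama-cpp-python bindings (rather than the llama.cpp CLI) looks like this. The repo id and filename are placeholders, not the actual SuperGemma4 listing, so substitute the real ones from the model card:

```python
# Sketch: pull a GGUF from HuggingFace and load it locally.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="someuser/SuperGemma4-26B-abliterated-v2-GGUF",  # hypothetical
    filename="supergemma4-26b-q4_k_m.gguf",                  # hypothetical
)

llm = Llama(
    model_path=model_path,
    n_ctx=4096,        # context window; raise it if RAM allows
    n_gpu_layers=-1,   # offload all layers to Metal on Apple Silicon
)

out = llm("Explain GGUF quantization in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```

With n_gpu_layers=-1, llama.cpp offloads every layer to the GPU on Metal-enabled Apple Silicon builds; lowering it trades speed for a tighter memory footprint.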
