Gemma-4 26B abliterated GGUF quantization lands on HuggingFace
A GGUF-quantized build of the abliterated Gemma-4 26B A4B instruction model appeared on HuggingFace this week, offering uncensored local inference for practitioners running llama.cpp or compatible runtimes.
The build is published in the repository zhoushan/gemma-4-26B-A4B-it-uncensored-GGUF and is quantized from trevorjs/gemma-4-26b-a4b-it-uncensored, an abliterated variant of Google's Gemma-4 26B A4B instruction-tuned model with its refusal behavior suppressed. GGUF is the single-file model format used by llama.cpp and compatible inference engines. It stores the weights, usually quantized, together with the metadata a runtime needs, so users can run the model locally on consumer hardware without cloud API restrictions.
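For readers who want to confirm that a downloaded file really is GGUF before handing it to a runtime, the fixed-size header can be checked with a few lines of Python. This is a minimal sketch, assuming a little-endian file at GGUF version 2 or later, where the 4-byte magic is followed by a uint32 version and two uint64 counts. The function name read_gguf_header is illustrative, and the demo header is synthetic rather than taken from the repository above.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(stream):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF stream.

    Assumes little-endian byte order and GGUF version 2 or later.
    """
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    header = stream.read(20)
    if len(header) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header)
    return version, tensor_count, kv_count


# Synthetic header for demonstration only; the counts are made up.
fake_file = io.BytesIO(GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5))
print(read_gguf_header(fake_file))
```

To inspect a real file, pass an object opened with open(path, "rb") instead of the in-memory buffer.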
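Abliteration does not remove layers from the model. As the technique is usually described, it identifies the direction in the model's internal activations most associated with refusals and edits the weights so the model can no longer write along that direction, which suppresses refusal behavior on prompts that would otherwise trigger it. The A4B suffix follows the naming convention used for mixture-of-experts releases, where it would denote roughly 4 billion parameters active per token out of 26 billion total. The model card lists English as the primary language.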
Hardware and deployment
GGUF quantization compresses the 26 billion parameters into files small enough to fit in GPU VRAM or system RAM on a typical workstation. Users with 24GB VRAM cards can usually run Q4 or Q5 variants at acceptable speed. A Q8 build comes to roughly 28 GB of weights and a 16-bit build to roughly 52 GB, so Q8 calls for a 48GB-class card or partial offloading to CPU, and 16-bit weights exceed even 48GB without offloading or multiple GPUs. Practitioners interested in uncensored local inference can pull the weights directly from the HuggingFace model page and load them into llama.cpp, KoboldCpp, or any GGUF-compatible runtime.
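The sizing guidance above comes down to simple arithmetic: parameter count times bits per weight, divided by eight. The sketch below works that out for a 26-billion-parameter model. The bits-per-weight figures for Q4_K_M and Q5_K_M are assumed averages rather than values from the model card, and the 2 GB allowance for KV cache and runtime buffers is a placeholder that grows with context length.

```python
# Approximate bits per weight. Q8_0 is 8.5 by construction (32 int8 values
# plus one 16-bit scale per block); the Q4/Q5 figures are assumed averages.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5, "F16": 16.0}

N_PARAMS = 26e9  # nominal parameter count taken from the model name


def weights_gb(n_params, bits_per_weight):
    """Size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params, quant, vram_gb, overhead_gb=2.0):
    """True if weights plus an assumed fixed overhead fit in vram_gb."""
    needed = weights_gb(n_params, BITS_PER_WEIGHT[quant]) + overhead_gb
    return needed <= vram_gb


for quant, bpw in BITS_PER_WEIGHT.items():
    size = weights_gb(N_PARAMS, bpw)
    print(f"{quant:7s} ~{size:5.1f} GB  "
          f"24GB: {fits(N_PARAMS, quant, 24)}  48GB: {fits(N_PARAMS, quant, 48)}")
```

Actual file sizes depend on the exact parameter count and on which tensors a given quantization leaves at higher precision, so treat the output as a first estimate and check the file sizes listed on the model page.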




