

Gemma 4 26B uncensored 4-bit quantization targets local inference

LeaderboardModel1 released an uncensored, quantized variant of Gemma 4 26B using AutoRound W4A16 compression, enabling unrestricted inference on consumer GPUs.

By Alex Sokoloff · June 27, 2026


LeaderboardModel1 released an uncensored, quantized checkpoint of Gemma 4 26B on HuggingFace this week, packaged as a 4-bit AutoRound W4A16 RTN variant. The model targets the Low-Bit Open LLM Leaderboard and is designed for memory-efficient deployment without safety tuning.

The "heretic" label in the model name denotes removal of alignment guardrails, a common pattern in community fine-tunes that strip refusal behavior from instruction-tuned models. AutoRound W4A16 compression reduces weight precision to 4 bits while keeping activations at 16-bit, a tradeoff that preserves more inference quality than quantizing both weights and activations to 4 bits, at the cost of slightly higher VRAM usage during forward passes.
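To make the W4A16 idea concrete, here is a minimal sketch of round-to-nearest (RTN) weight quantization in plain Python. It is a generic illustration, not AutoRound's implementation: the function names, the asymmetric min/max scheme, and the tiny group size are assumptions chosen for readability. Weights are mapped to 4-bit integer codes per group, then dequantized back to floats, which is the form the 16-bit activations would multiply against.

```python
def quantize_rtn_4bit(weights, group_size=4):
    """Quantize a flat list of floats to 4-bit codes, one (scale, offset) per group.

    Hypothetical helper for illustration; real libraries pack codes into
    integers and typically use much larger groups.
    """
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # 4 bits give 16 levels (codes 0..15); fall back to 1.0 for a constant group.
        scale = (hi - lo) / 15 or 1.0
        # Round to nearest level and clamp to the valid 4-bit range.
        codes = [min(15, max(0, round((w - lo) / scale))) for w in group]
        groups.append((codes, scale, lo))
    return groups


def dequantize(groups):
    """Rebuild approximate float weights from the 4-bit codes."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(code * scale + lo for code in codes)
    return restored


weights = [0.12, -0.53, 0.98, 0.04, -1.2, 0.33, 0.71, -0.09]
groups = quantize_rtn_4bit(weights)
restored = dequantize(groups)
for original, approx in zip(weights, restored):
    print(f"{original:+.3f} -> {approx:+.3f}")
```

Each restored weight lands within half a quantization step of the original, which is the error that plain rounding leaves behind.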

Local deployment requirements

A 26-billion-parameter model at 4-bit precision requires roughly 13 GB of VRAM for weights alone, making it runnable on a single RTX 4090 or two consumer cards in a split configuration. The unquantized parent checkpoint, stored at 16-bit precision, would need closer to 52 GB, putting it out of reach for most local setups. The model ships in SafeTensors format and supports the HuggingFace text-generation pipeline, enabling drop-in use with standard inference libraries like vLLM or Ollama.
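The memory figures above follow from simple arithmetic: parameter count times bits per weight, divided by eight to get bytes. The sketch below reproduces them, assuming decimal gigabytes (1 GB = 10^9 bytes) and counting weights only; the function name is hypothetical, and quantization scales, the KV cache, and activation memory are deliberately left out, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 26e9  # parameter count taken from the model's "26B" label

print(f"4-bit weights:  {weight_memory_gb(PARAMS, 4):.0f} GB")
print(f"16-bit weights: {weight_memory_gb(PARAMS, 16):.0f} GB")
```

Against the 24 GB of an RTX 4090, the 4-bit figure leaves headroom for the KV cache and activations, while the 16-bit figure exceeds the card's capacity on weights alone.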

As of publication, the checkpoint shows zero downloads and zero likes, indicating a very recent upload. No benchmark scores, sample outputs, or ablation studies appear on the model card yet.
