Cloud19 quantizes G4-MeroMero-26B to FP8 for consumer GPU inference

Cloud19 released an FP8-quantized version of the G4-MeroMero-26B uncensored model on HuggingFace, optimized for vLLM inference with dynamic quantization to cut memory footprint.

By Alex Sokoloff · June 24, 2026

Cloud19 released G4-MeroMero-26B-FP8-Dynamic-Uncensored on HuggingFace on June 21, an FP8-quantized variant of the uncensored Gemma4-based mixture-of-experts model. The quantization uses llm-compressor and targets vLLM deployment, compressing the 26-billion-parameter architecture to fit consumer GPUs while preserving the base model's uncensored instruction-tuning.

FP8 dynamic quantization stores weights as 8-bit floating point, half the width of BF16. Unlike INT8, which spends its 8 bits on evenly spaced integer levels, FP8 keeps an exponent and so covers a wider dynamic range, which typically helps it retain accuracy when tensors contain outliers. In the FP8 dynamic scheme that llm-compressor produces for vLLM, weight scales are computed once at quantization time, while activation scales are computed per token at runtime rather than fixed from a calibration set, which can help preserve output quality when prompt distributions shift. The model ships in SafeTensors format, a common serialization for open-weight releases that stores raw tensors instead of pickled Python objects, so loading a checkpoint does not execute arbitrary code. Its vLLM tag signals compatibility with the widely used inference server, whose paged attention and continuous batching suit high-throughput deployments where many users hit the same model concurrently.
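To make the "dynamic" part concrete, here is a minimal pure-Python sketch of runtime scaling for one token's activations, assuming the FP8 E4M3 format (largest finite value 448, three mantissa bits). The function names `round_to_e4m3`, `quantize_dynamic`, and `dequantize` are hypothetical and are not part of llm-compressor or vLLM; real deployments do this inside GPU kernels, and the release's exact scheme is not confirmed beyond its name.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the E4M3 format


def round_to_e4m3(x: float) -> float:
    """Simulate rounding x to the nearest FP8 E4M3 value (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)  # clamp instead of overflowing
    # frexp returns mag = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(mag)
    # Below the smallest normal (2**-6) the spacing stays fixed at 2**-9.
    e = max(e, -5)
    # 1 implicit + 3 stored mantissa bits -> 16 steps per power of two.
    step = 2.0 ** (e - 4)
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_dynamic(token: list[float]) -> tuple[list[float], float]:
    """Pick a scale from this token's own max, then round to FP8."""
    amax = max(abs(v) for v in token)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in token], scale


def dequantize(q: list[float], scale: float) -> list[float]:
    """Map simulated FP8 values back to the original range."""
    return [v * scale for v in q]


# Two tokens with very different ranges each get their own scale,
# which is the point of computing scales at runtime.
small_token = [0.02, -1.5, 3.0, 0.75]
large_token = [30.0, -12.0, 4.0, 0.5]
for token in (small_token, large_token):
    q, scale = quantize_dynamic(token)
    print(scale, dequantize(q, scale))
```

Because each token's scale comes from its own maximum, an unusually large activation in one prompt does not squeeze the resolution available to every other token, which is the failure mode a fixed, calibration-time scale can run into.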

Storing weights in FP8 roughly halves their memory footprint compared to BF16: about 26 GB instead of about 52 GB for 26 billion parameters, before the KV cache and activations are counted. That puts the model within reach of a single 32 GB consumer card, while a 24 GB RTX 4090 would likely need CPU offloading or a second GPU. The uncensored base runs locally without API-level content filtering, making it accessible for research, creative writing, and applications where safety alignment would otherwise block legitimate use cases.
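The memory claim comes down to bytes per parameter, which the sketch below works through. It is a back-of-envelope estimate only: it assumes every parameter is stored at the stated width and ignores the KV cache, activations, and any layers a quantizer leaves in higher precision. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 26e9  # 26 billion parameters, per the model name

bf16_gb = weight_memory_gb(PARAMS, 2)  # BF16: 2 bytes per parameter
fp8_gb = weight_memory_gb(PARAMS, 1)   # FP8: 1 byte per parameter

print(f"BF16 weights: {bf16_gb:.0f} GB")
print(f"FP8 weights:  {fp8_gb:.0f} GB")

# Headroom left for KV cache and activations on two common card sizes.
for vram_gb in (24, 32):
    print(f"{vram_gb} GB card: {vram_gb - fp8_gb:+.0f} GB after FP8 weights")
```

In a mixture-of-experts model only some experts are active per token, but all expert weights still have to be resident, so the full parameter count is the right input here.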
