Qwen 3.6 40B abliterated weights land in Q8 GGUF on HuggingFace
A new uncensored variant of Alibaba's Qwen 3.6 40B multimodal model has been released in Q8 GGUF quantization, designed for local inference without safety filtering.
Qwen 3.6 40B Uncensored, a multi-stage fine-tuned, abliterated variant of Alibaba's Qwen 3.6 40B, landed on HuggingFace on May 14. The weights are quantized to Q8 in GGUF format and support image-text-to-text pipelines, inheriting the base model's vision capabilities while stripping content guardrails through post-training. The fine-tuning was done with Unsloth, a memory-efficient training toolkit popular in the open-weight community.
Q8 quantization typically requires 42–45 GB of VRAM for inference, putting the model beyond any single consumer GPU but within reach of workstation cards or multi-GPU consumer setups. The GGUF container format runs natively in llama.cpp derivatives such as Ollama, LM Studio, and other local inference tools that support vision models. The model card lists "abliterated," "uncensored," and "all use cases" among its tags, signaling the absence of content restrictions.
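The 42–45 GB figure is consistent with Q8_0's storage layout in llama.cpp, which packs weights in blocks of 32 int8 values plus one fp16 scale (34 bytes per 32 weights, about 8.5 bits each). A back-of-envelope sketch, counting weights only and ignoring KV cache and activation overhead:

```python
def q8_0_weight_bytes(n_params: int) -> int:
    # llama.cpp's Q8_0 block: 32 int8 quants + one fp16 scale = 34 bytes
    # per 32 weights (~8.5 bits/weight).
    blocks = n_params // 32
    return blocks * 34

params = 40_000_000_000          # nominal 40B parameters
gb = q8_0_weight_bytes(params) / 1e9
print(f"~{gb:.1f} GB for weights alone")
```

That yields roughly 42.5 GB before runtime overhead, which is where the extra few gigabytes in the 42–45 GB range come from.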
Abliterated models have become a recurring category on HuggingFace for Llama and Qwen base weights, typically produced through fine-tuning on refusal-suppression datasets or direct weight editing to reduce safety-tuned behavior. Qwen's native multimodal architecture makes it a natural candidate for such variants. The checkpoint currently shows zero downloads and zero likes on the hub.
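The weight-editing flavor of abliteration typically orthogonalizes weight matrices against an extracted "refusal direction" so no layer can write to it. A minimal conceptual sketch with NumPy (the matrix `W` and direction `r` here are random placeholders; real abliteration extracts `r` from activation differences between refused and complied prompts):

```python
import numpy as np

def abliterate(W: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Remove each column of W's component along r, so the layer's
    output can no longer carry the refusal direction."""
    r = r / np.linalg.norm(r)        # unit refusal direction
    return W - np.outer(r, r) @ W    # project columns off r

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # toy weight matrix
r = rng.standard_normal(8)           # toy refusal direction
W_abl = abliterate(W, r)

# After editing, the output has no component along r (up to float error):
print(np.abs(r @ W_abl).max())
```

Applied across attention and MLP output projections, this single projection is what distinguishes direct weight editing from fine-tuning on refusal-suppression data.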
