Qwen 3.6 40B abliterated checkpoint lands on HuggingFace in Q8 GGUF
A multi-stage fine-tuned, abliterated Qwen 3.6 40B checkpoint in Q8 GGUF format is now available for local inference without safety filters.
Practitioners running uncensored local inference now have another option: dthomasset's Qwen 3.6 40B uncensored Q8 GGUF, a multi-stage fine-tuned checkpoint that strips the base model's refusal behavior and ships in quantized GGUF format for CPU and consumer GPU setups.
The model card lists the pipeline as image-text-to-text, indicating multimodal capability—text and vision inputs—though the card does not specify context length or hardware requirements for the 40-billion-parameter Q8 quantization. The tags include "abliterated," "heretic," and "unsloth," suggesting the checkpoint was processed with Unsloth's fine-tuning stack and had its refusal behavior removed through abliteration or a similar technique. The "multi-stage tuned" tag implies at least two rounds of fine-tuning, though the card does not detail the datasets or stages.
Qwen 3.6 is Alibaba's latest open-weight series, released in late 2024 with Apache 2.0 licensing. The 40B parameter tier sits between the 14B and 72B sizes in that family. Quantizing to Q8 (8 bits per weight) roughly halves the memory footprint compared to FP16, bringing a 40B model's weights into the 40–50 GB range—too large for a single 24 GB consumer card, but feasible on dual RTX 4090s, a single 48 GB RTX A6000, or CPU setups with enough RAM via GGUF offloading. Since appearing on HuggingFace this week, the checkpoint has accumulated 734 downloads.
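A quick sanity check of those footprint numbers, as a sketch assuming GGUF's Q8_0 format at roughly 8.5 effective bits per weight (8-bit quants plus per-block scales) and counting weights only, not KV cache or activations:

```python
# Back-of-the-envelope weight-memory estimate for a 40B-parameter model.
# Assumes Q8_0 at ~8.5 effective bits/weight; excludes KV cache and activations.
PARAMS = 40e9

def weight_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Gigabytes needed to store `params` weights at the given bit width."""
    return params * bits_per_weight / 8 / 1e9

fp16 = weight_gb(16)    # 80.0 GB
q8 = weight_gb(8.5)     # 42.5 GB
saving = 1 - q8 / fp16  # ~47% reduction vs FP16
print(f"FP16: {fp16:.1f} GB, Q8_0: {q8:.1f} GB, saving: {saving:.0%}")
```

The ~42.5 GB result is why the checkpoint lands just above single-consumer-card VRAM but comfortably inside a 48 GB workstation card.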
The model card does not include benchmark scores, sample outputs, or a detailed abliteration methodology. For practitioners who want an uncensored multimodal model in the 40B class and already run GGUF-compatible inference engines like llama.cpp or KoboldCpp, this checkpoint is ready to download and test.
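For llama.cpp users, loading the file through the llama-cpp-python bindings looks roughly like the sketch below. The filename is hypothetical (check the repo's actual file list), and note that vision input for GGUF multimodal models typically requires the model's separate projector file in addition to the main checkpoint:

```python
from pathlib import Path

# Hypothetical filename -- check the HuggingFace repo for the real one.
MODEL = Path("qwen-3.6-40b-uncensored.Q8_0.gguf")

if MODEL.exists():
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=str(MODEL),
        n_ctx=4096,       # context window; the model card does not state the max
        n_gpu_layers=-1,  # offload all layers to GPU if VRAM allows
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize GGUF quantization."}]
    )
    print(out["choices"][0]["message"]["content"])
else:
    print(f"Model file not found; download {MODEL.name} first.")
```

Text-only chat is shown here; wiring up image input depends on the chat handler and projector setup for this particular checkpoint, which the card does not document.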
