Qwen 3.6 abliterated fine-tune strips safety for unrestricted multimodal generation
DavidAU released Qwen3.6-12B-IQ-Ultra-Heretic-Uncensored-Thinking-V2-Hightop, a 12-billion-parameter multimodal fine-tune with safety layers removed, targeting creative writing and unrestricted image-text workflows.
An abliterated fine-tune of Alibaba's Qwen 3.6 has landed on HuggingFace, stripping safety tuning to enable unrestricted multimodal generation. DavidAU's Qwen3.6-12B-IQ-Ultra-Heretic-Uncensored-Thinking-V2-Hightop is a 12-billion-parameter checkpoint built on the Qwen 3.6 architecture, with alignment filters removed and a focus on open-ended generation for creative writing and image-text workflows.
The model card tags it for image-text-to-text pipelines, suggesting it handles both vision and language inputs in a single forward pass. The "Heretic" and "abliterated" labels signal that refusal behaviors have been stripped — a common pattern in the uncensored fine-tune ecosystem, where practitioners remove alignment constraints to unlock unrestricted prompting. The "Thinking-V2" suffix hints at chain-of-thought or reasoning-focused prompt templates, though the card doesn't specify training data or methodology.
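For readers who want a concrete starting point, the sketch below shows how a Transformers image-text-to-text checkpoint is typically loaded and prompted. The repo id matches the model card, but the processor class and chat-template structure are assumptions carried over from other Qwen-family multimodal releases, not documented usage from DavidAU.

```python
# Minimal loading sketch, assuming the checkpoint follows the standard
# Qwen-style multimodal layout implied by the image-text-to-text tag.
# The processor class and chat-template shape are not confirmed by the
# model card; treat this as illustrative, not documented usage.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "DavidAU/Qwen3.6-12B-IQ-Ultra-Heretic-Uncensored-Thinking-V2-Hightop"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# One user turn containing an image plus a text prompt, the usual shape
# for Qwen-family vision-language chat templates.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this scene as the opening of a short story."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
image = Image.open("scene.jpg")

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```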
Qwen 3.6 is Alibaba's latest open-weight multimodal series, released in late 2025 with native vision support and competitive performance on MMMU and DocVQA benchmarks. The base models ship with alignment tuning that blocks certain content categories, making them unsuitable for adult creative writing, unrestricted roleplay, or workflows requiring models to engage with sensitive material without refusal. Fine-tunes like this one extend the base model's reach into those use cases by removing the safety layers entirely.
The checkpoint is available now on HuggingFace under DavidAU's namespace, with Unsloth listed as a training framework tag. Unsloth is a popular fine-tuning library that optimizes memory usage and training speed for consumer GPUs, making it easier for individual researchers to produce custom checkpoints without cloud infrastructure. Practitioners running the model locally will need roughly 24GB of VRAM for 16-bit inference, or 12–16GB with 4-bit quantization, well within reach of a single RTX 4090 or equivalent consumer card.
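For those single-card runs, 4-bit loading through bitsandbytes is the usual route. The sketch below shows one plausible configuration; the quantization settings are generic defaults rather than anything specified by the model card, and the model class is the same assumption as in the sketch above.

```python
# Sketch of 4-bit loading with bitsandbytes to fit the quoted 12-16GB budget.
# Quantization settings are illustrative defaults, not values from the model card.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

model_id = "DavidAU/Qwen3.6-12B-IQ-Ultra-Heretic-Uncensored-Thinking-V2-Hightop"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 tends to hold quality better than fp4
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 even with 4-bit weights
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```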
