Qwen3.5-4B abliterated fine-tune with reasoning lands on HuggingFace
WeilJimmer released GGUF-quantized weights for an uncensored Qwen3.5-4B model with chain-of-thought reasoning, packaged for llama.cpp inference on consumer hardware.
An uncensored fine-tune of Alibaba's Qwen3.5-4B model landed on HuggingFace this week, packaged as GGUF-quantized weights for llama.cpp and Unsloth inference. The release uses abliteration to strip the base model's safety refusals while retaining chain-of-thought reasoning, with intermediate reasoning steps visible before the final answer.
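Reasoning models in the Qwen family typically wrap their intermediate steps in `<think>` tags ahead of the answer text. A minimal sketch of separating that trace from the final answer, assuming this tag convention (the model card would confirm the exact format for this release):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a model response into (reasoning trace, final answer).

    Assumes the Qwen-style convention of emitting chain-of-thought
    inside a <think>...</think> block before the answer text.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No reasoning block found; treat the whole output as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 equals 4.</think>The answer is 4."
trace, answer = split_reasoning(raw)
```

Keeping the trace separate lets an application log or display the reasoning without feeding it back into the next conversational turn.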
At Q4 precision, GGUF quantization brings the model down to roughly 2.5–3 GB, runnable on mid-tier gaming laptops and Apple Silicon Macs without cloud APIs. Qwen3.5 includes vision-language capability, a rarity at this scale: most multimodal models this small sacrifice image understanding for text performance. The model card tags it as conversational and endpoints-compatible, though practitioners running uncensored weights typically prefer local execution to avoid content moderation at the API layer.
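The size figure follows from simple arithmetic: Q4 schemes store roughly 4–5 bits per weight once per-block scale metadata is counted, so a 4B-parameter model lands in the low single-digit gigabytes. A back-of-envelope sketch (the bit widths are typical GGUF values, not figures taken from this release):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Estimate the on-disk size of a quantized model in gigabytes.

    bits_per_weight should include per-block scale overhead; the
    common Q4_K_M scheme averages roughly 4.5 bits per weight.
    """
    return n_params * bits_per_weight / 8 / 1e9

# 4 billion parameters at ~4.5 bits per weight
q4_estimate = gguf_size_gb(4e9, 4.5)  # 2.25 GB
```

Real files tend to come in a bit above this floor because embedding and output layers are often kept at higher precision, which is consistent with the 2.5–3 GB observed here.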
Qwen3.5 sits between Alibaba's 1.8B mobile-focused variants and its 7B+ server-class checkpoints, competing directly with Meta's Llama and Mistral's open releases in the mid-size tier. Chain-of-thought reasoning in a sub-10B open model remains uncommon; the capability became standard after OpenAI's o1 release but has been slow to propagate into smaller open-weight alternatives.
