Llama 3.2 3B uncensored GGUF quantization debuts for CPU inference

Jadzblaze99 published quantized GGUF weights for an uncensored fine-tune of Meta's Llama 3.2 3B Instruct model, enabling CPU and low-VRAM deployment of a 3-billion-parameter conversational model without safety filters.

By Alex Sokoloff · May 30, 2026

A quantized GGUF release of an uncensored Llama 3.2 3B Instruct model now runs on consumer CPUs and modest GPUs, opening access to small instruction-following models without safety restrictions.

Jadzblaze99 published the GGUF files on Hugging Face this week, based on chuanli11's llama-3.2-3b-instruct-uncensored fine-tune. The GGUF format lets users run the model on a CPU or entry-level hardware with llama.cpp, Ollama, or any other GGUF-compatible inference engine. At 3 billion parameters, the model fits comfortably in 4–8 GB of RAM depending on quantization level, making it one of the smaller uncensored instruction models currently available in GGUF.
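To make the link between quantization level and memory concrete, here is a rough back-of-the-envelope sketch. It assumes the nominal 3 billion parameters quoted above and uses Q4_0, Q8_0, and F16 purely as illustrative formats; the release notes cited here do not say which quantization levels were actually published. The figures cover weights only. Real GGUF files tend to be somewhat larger because some tensors are stored at higher precision, and the inference engine needs additional memory for the context cache and runtime buffers.

```python
# Rough weight-only memory estimate for a quantized model.
# Assumption: nominal parameter count taken from the article ("3 billion");
# the true count differs slightly.
NOMINAL_PARAMS = 3.0e9

# Effective bits per weight for a few llama.cpp formats (illustrative choice).
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,   # blocks of 32 weights x 4 bits, plus one 16-bit scale
    "Q8_0": 8.5,   # blocks of 32 weights x 8 bits, plus one 16-bit scale
    "F16": 16.0,   # unquantized half precision
}


def estimate_weight_gb(n_params, bits_per_weight):
    """Return approximate weight storage in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        size = estimate_weight_gb(NOMINAL_PARAMS, bpw)
        print(f"{name}: ~{size:.2f} GB of weights")
```

Under these assumptions the weights alone range from under 2 GB at 4-bit to about 6 GB at half precision, which is broadly in line with the 4–8 GB working figure once runtime overhead is included.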

The underlying fine-tune strips the safety tuning from Meta's Llama 3.2 3B Instruct, which Meta released in September 2024 as part of its lightweight model lineup. The original model was designed for on-device and edge deployment, with a 128k-token context window and support for eight languages. The uncensored variant preserves the architecture and context length while removing content restrictions, a common workflow in the open-weight community for models intended for unrestricted local use.
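The 128k-token context window comes with its own memory cost, separate from the weights. The sketch below estimates the size of the key/value cache at a given context length. The layer count, number of KV heads, and head dimension used as defaults are assumptions based on the commonly reported Llama 3.2 3B configuration and should be checked against the model's own config file; the 16-bit cache precision is also an assumption, since inference engines can store the cache in other formats.

```python
# Rough KV-cache size estimate for a grouped-query-attention transformer.
# Assumptions (not stated in the article, verify against the model config):
#   28 layers, 8 key/value heads, head dimension 128, 16-bit cache entries.
def kv_cache_bytes(n_tokens, n_layers=28, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    """Return bytes needed to cache keys and values for n_tokens tokens."""
    # Factor of 2: one key vector and one value vector per head per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens


if __name__ == "__main__":
    for n_tokens in (8_192, 131_072):
        gib = kv_cache_bytes(n_tokens) / 2**30
        print(f"{n_tokens} tokens: ~{gib:.3f} GiB of KV cache")
```

If those assumptions hold, a full 128k-token cache at 16-bit precision would need roughly 14 GiB, so on a machine with 4–8 GB of RAM the practical context length is likely to be a small fraction of the advertised maximum.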

Users looking for a compact uncensored alternative to larger Llama 3.1 or 3.3 models now have a sub-4B option that runs on consumer hardware without a discrete GPU. The model retains the instruction-following behavior of the base Instruct model, making it suitable for conversational tasks on resource-constrained devices.
