Llama 3.2 3B uncensored GGUF quantization debuts for CPU inference
Jadzblaze99 published quantized GGUF weights for an uncensored fine-tune of Meta's Llama 3.2 3B Instruct model, enabling CPU and low-VRAM deployment of a 3-billion-parameter conversational model without safety filters.
A quantized GGUF release of an uncensored Llama 3.2 3B Instruct model now runs on consumer CPUs and modest GPUs, opening access to small instruction-following models without safety restrictions.
Jadzblaze99 published the GGUF files, which are based on chuanli11's llama-3.2-3b-instruct-uncensored fine-tune, on Hugging Face this week. The GGUF format lets users run the model on a CPU or entry-level hardware with llama.cpp, Ollama, or any other GGUF-compatible inference engine. At 3 billion parameters, the model fits comfortably in 4–8 GB of RAM depending on quantization level, making it one of the smaller uncensored instruction models currently available in GGUF.
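The 4–8 GB figure can be sanity-checked with simple arithmetic: weight size ≈ parameter count × bits per weight ÷ 8. The sketch below is a rough estimate, not a figure from the release. It assumes about 3.21 billion parameters for Llama 3.2 3B and uses approximate effective bit widths for three common llama.cpp quantization types. It also counts weights only and leaves out the KV cache and runtime overhead.

```python
# Rough GGUF weight-size estimate: parameters x bits-per-weight / 8.
# Assumptions (not stated in the release): ~3.21e9 parameters for Llama 3.2 3B,
# and approximate effective bits per weight for each llama.cpp quant type.
PARAMS = 3.21e9

APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # mixed 4/6-bit k-quant; effective width is approximate
    "Q8_0": 8.5,     # blocks of 32 int8 values plus one fp16 scale per block
    "F16": 16.0,     # unquantized half precision
}


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in gigabytes (1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def estimate_all(params: float = PARAMS) -> dict[str, float]:
    """Estimated weight size per quant type, rounded to two decimals."""
    return {
        quant: round(weight_size_gb(params, bits), 2)
        for quant, bits in APPROX_BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for quant, gb in estimate_all().items():
        print(f"{quant:7s} ~{gb:.2f} GB")
```

Under these assumptions, the weights come to roughly 1.95 GB at Q4_K_M, 3.41 GB at Q8_0, and 6.42 GB at F16. Context memory and operating-system overhead then account for the rest of the RAM budget.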
The base fine-tune removes the safety tuning from Meta's original Llama 3.2 3B Instruct release, which launched in September 2024 as part of Meta's lightweight model lineup. The original model was designed for on-device and edge deployment, with a 128k-token context window and support for eight languages. This uncensored variant preserves the architecture and context length while removing content restrictions, a common workflow in the open-weight community for models intended for unrestricted local use.
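Preserving the 128k-token window has a memory cost on small machines, because llama.cpp-style runtimes cache keys and values for every token in context. The rough sketch below assumes an fp16 cache. It also assumes the architecture values from Meta's published Llama 3.2 3B config: 28 layers, 8 key/value heads, and a head dimension of 128. The article does not state these values, so check them against the GGUF metadata.

```python
# Approximate KV-cache size for a Llama-style decoder with grouped-query attention.
# Assumed architecture values (from the published Llama 3.2 3B config, not the article):
N_LAYERS = 28
N_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for n_ctx tokens (fp16 by default)."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return n_ctx * per_token


def kv_cache_gib(n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Same estimate expressed in GiB (2**30 bytes)."""
    return kv_cache_bytes(n_ctx, bytes_per_elem) / 2**30


if __name__ == "__main__":
    for n_ctx in (4096, 32768, 131072):
        print(f"n_ctx={n_ctx:>6}: ~{kv_cache_gib(n_ctx):.2f} GiB")
```

Under these assumptions, the cache grows linearly at 112 KiB per token. A 4,096-token context needs about 0.44 GiB, while the full 131,072-token window needs about 14 GiB. This suggests that a 4–8 GB RAM budget corresponds to a modest context setting rather than the maximum.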
Users looking for a compact uncensored alternative to larger Llama 3.1 or 3.3 models now have a sub-4B option that runs on consumer hardware without a discrete GPU. The model retains the instruction-following behavior of the base Instruct model, making it suitable for conversational tasks on resource-constrained devices.