Llama 3.3 8B Heretic Uncensored 6-bit MLX weights arrive for Apple Silicon
A new 6-bit MLX quantization of Llama 3.3 8B Instruct, fine-tuned on Claude 4.5 Opus reasoning data and stripped of safety guardrails, is now available for Apple Silicon users.
Llama 3.3 8B Heretic Uncensored, a 6-bit quantized MLX checkpoint from nhe-ai, combines Meta's Llama 3.3 8B Instruct base with Claude 4.5 Opus reasoning traces and Heretic-style uncensored fine-tuning. The model card lists "thinking" and "reasoning" tags alongside "heretic," signaling both chain-of-thought capability and the removal of safety filters. The MLX format targets Apple Silicon (M-series Macs and iPads), where 6-bit quantization brings the 8-billion-parameter model to roughly 6 GB of weights, small enough to run on consumer hardware.
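The size claim is easy to sanity-check with back-of-envelope arithmetic. A minimal sketch, assuming MLX-style group quantization where each group of weights carries a small amount of fp16 metadata (the group size and metadata overhead here are illustrative assumptions, not figures from the model card):

```python
# Back-of-envelope size estimate for a group-quantized model.
# Assumption: each group of `group_size` weights stores `meta_bits` of
# extra metadata (e.g. an fp16 scale and fp16 bias), on top of the
# packed low-bit weights themselves.

def quantized_size_gb(n_params: float, bits: int,
                      group_size: int = 64, meta_bits: int = 32) -> float:
    """Approximate weight size in GB for group-quantized parameters."""
    effective_bits = bits + meta_bits / group_size  # packed bits + per-group overhead
    return n_params * effective_bits / 8 / 1e9      # bits -> bytes -> GB

# ~8 billion parameters at 6 bits per weight
print(f"{quantized_size_gb(8.0e9, 6):.1f} GB")  # ~6.5 GB
```

At 6 bits per weight, an 8B model cannot fit in 5 GB even before metadata overhead (8e9 × 6 / 8 = 6 GB), which is why the estimate lands around 6.5 GB.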
The "Heretic" lineage refers to a family of abliterated Llama fine-tunes that strip refusal behavior without retraining the entire model, a technique that has gained traction in the uncensored-model community over the past year. Pairing that approach with high-reasoning distillation from Claude 4.5 Opus — Anthropic's flagship model known for multi-step logic — is a newer experiment. Whether the reasoning gains survive quantization to 6 bits and whether the uncensored tuning preserves instruction-following remain open questions until users benchmark it against the base Llama 3.3 8B Instruct and other abliterated variants.
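Abliteration, the technique behind the "Heretic" lineage, removes refusals by estimating a "refusal direction" in activation space (typically the difference of mean activations on refused versus answered prompts) and projecting it out of the weights that write to the residual stream. A minimal sketch of the projection step, assuming the direction has already been estimated; all names are illustrative:

```python
import numpy as np

def ablate_direction(W: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove one direction from the output space of a weight matrix.

    W:         (d_out, d_in) matrix whose output feeds the residual stream.
    direction: (d_out,) estimated refusal direction.
    Returns W' = (I - r r^T) W, with r the unit refusal direction, so the
    layer's output has no component along r.
    """
    r = direction / np.linalg.norm(direction)
    return W - np.outer(r, r @ W)

# Toy demo: after ablation, the output is orthogonal to the direction.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
r = rng.standard_normal(8)
W_abl = ablate_direction(W, r)
x = rng.standard_normal(4)
print(abs((r / np.linalg.norm(r)) @ (W_abl @ x)))  # ~0
```

Because the edit is a rank-one projection applied directly to existing weights, no gradient training is involved, which is why these variants can be produced cheaply from any instruct checkpoint.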
The model went live this week on Hugging Face with no eval numbers, sample outputs, or detailed methodology section published yet. Early adopters will need to run their own benchmarks to assess whether the reasoning-plus-uncensored combination holds up in practice or whether the dual fine-tuning introduces new failure modes. The next few days should clarify whether this hybrid approach delivers on both fronts or whether future releases will need to isolate which component, reasoning distillation or uncensoring, is driving performance.
