MiniMax M2.7 abliterated and quantized to 4-bit AWQ for local inference
HuggingFace user alonsoko has released a 4-bit AWQ quantization of MiniMax M2.7 with safety filters removed, available on HuggingFace as a text-generation checkpoint.
The checkpoint, listed as MiniMax M2.7 ultra-uncensored heretic AWQ, is derived from the MiniMax M2.7 base model with its safety filters removed. Its AWQ-W4A16 quantization stores weights in 4 bits while keeping activations at 16-bit precision, which reduces the memory footprint for local inference. The model ships in safetensors format and targets the text-generation pipeline. Its tags indicate abliteration, that is, removal of the refusal behavior that blocks certain prompts.
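To make the "4-bit weights, 16-bit activations" idea concrete, here is a minimal sketch of weight-only 4-bit quantization for a single group of weights. It is not the AWQ algorithm itself, which additionally rescales salient channels using activation statistics before rounding; the function names and the sample values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch of weight-only 4-bit storage (the "W4" in W4A16).
# Assumption: plain min/max rounding per group. Real AWQ also applies
# activation-aware per-channel scaling, which is omitted here.

def quantize_group(weights):
    """Map a group of floats to 4-bit codes (0..15) plus a scale and zero point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 15 = 2**4 - 1 steps; fall back if all values are equal
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, zero):
    """Recover approximate weights; at inference this result is used in 16-bit math."""
    return [zero + c * scale for c in codes]


def pack_nibbles(codes):
    """Pack pairs of 4-bit codes into single bytes, halving storage versus int8."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even count
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


if __name__ == "__main__":
    group = [-0.8, -0.1, 0.0, 0.35, 0.7, 1.0]  # hypothetical weight values
    codes, scale, zero = quantize_group(group)
    restored = dequantize_group(codes, scale, zero)
    print("codes:", codes)
    print("max error:", max(abs(a - b) for a, b in zip(group, restored)))
    print("packed bytes:", len(pack_nibbles(codes)))
```

The rounding error per weight is bounded by half the group's scale, which is why quantizers of this kind work on small groups of weights rather than on whole tensors.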
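AWQ quantization typically keeps perplexity within 1–2 percent of full-precision baselines while cutting weight memory by roughly 75 percent relative to 16-bit storage. A 2.7-billion-parameter model at 4-bit precision typically fits in 2–3 GB of GPU memory, making it runnable on cards as modest as an RTX 3060. That figure is illustrative only: the "2.7" in M2.7 appears to be a version label rather than a parameter count, so the checkpoint's actual size should be checked on its model card. Abliteration techniques remove refusal behavior without full retraining, typically by identifying and suppressing a "refusal direction" in the model's activations, a method that gained traction in 2024 with Llama checkpoints and has since spread to smaller open-weight models. The safetensors format loads quickly in inference engines that read it directly, such as vLLM and Transformers; llama.cpp, by contrast, uses its own GGUF format. The checkpoint was posted to HuggingFace on May 17, 2026.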
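The memory figures above follow from simple arithmetic, sketched below for the 2.7-billion-parameter example. The helper name and the optional overhead parameter are assumptions made for illustration; the estimate covers weights only and ignores the KV cache, activations, and framework overhead, which is why real usage lands higher than the raw number.

```python
# Back-of-the-envelope weight-memory estimate.
# Assumption: memory = parameters * bits per weight, with an optional
# per-weight overhead for quantization metadata such as group scales.

def weight_memory_gb(num_params, bits_per_weight, overhead_bits=0.0):
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * (bits_per_weight + overhead_bits) / 8 / 1e9


params = 2.7e9  # the example size used in the text

fp16_gb = weight_memory_gb(params, 16)  # about 5.4 GB
w4_gb = weight_memory_gb(params, 4)     # about 1.35 GB
saving = 1 - w4_gb / fp16_gb            # about 0.75, i.e. the "roughly 75 percent"

if __name__ == "__main__":
    print(f"16-bit weights: {fp16_gb:.2f} GB")
    print(f"4-bit weights:  {w4_gb:.2f} GB")
    print(f"reduction:      {saving:.0%}")
```

Passing a different `num_params` gives the same estimate for any other model size, so the calculation can be rerun once the checkpoint's actual parameter count is known.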
