MagicQuant 2.0 learns tensor-level quants from Unsloth to collapse redundant GGUF options
A new quantization pipeline tests KLD scores across model tensors to surface only the best size-quality tradeoffs, pruning duplicate quant options and flagging architecture quirks automatically.
MagicQuant 2.0 is a quantization pipeline that builds hybrid GGUF models by testing which quantization schemes work best for each tensor in a given architecture. After five months of development, the tool learns from existing quant assignments in Unsloth and llama.cpp, then runs a gauntlet of KLD (Kullback-Leibler divergence) tests to find nonlinear quality wins at specific VRAM budgets. The result is a collapsed benchmark table showing only the "survivors"—quants that meaningfully outperform their size-equivalent peers—rather than dumping every Q8/Q6/Q5/Q4 variant without context.
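The core metric here is KLD between the base model's output distribution and the quantized model's. As a rough illustration of what each test in that gauntlet computes, here is a minimal pure-Python sketch; the function name and shape are illustrative assumptions, not MagicQuant's actual code, which operates over full token streams rather than single logit vectors.

```python
import math

def kld(p_logits, q_logits):
    """KL divergence D(P || Q) between the softmax distributions of two
    logit vectors: P from the full-precision model, Q from the quantized
    candidate. Lower is better; 0 means the quant reproduces the base
    model's next-token distribution exactly."""
    def softmax(logits):
        m = max(logits)                      # subtract max for stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    p = softmax(p_logits)
    q = softmax(q_logits)
    eps = 1e-12                              # guard against log(0)
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))
```

In practice this would be averaged over many tokens of held-out text, producing one scalar per candidate quant configuration that the later pruning stages compare.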
The pipeline detects architecture quirks automatically. Qwen 3.6 27B exhibits patterns that allow genuinely lower KLD at smaller file sizes when the right tensor-level quant mix is applied. Some models favor MXFP4 in narrow bit ranges where quantization noise becomes beneficial; others reject IQ4_NL entirely or show dramatic KLD drops between adjacent quant levels. MagicQuant flags these anomalies, validates them, and surfaces only configurations that justify their existence—answering "which quant matters for this model?" instead of leaving users to guess whether IQ4_XS or Q4_K_S is the better same-size trade.
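One way to picture the anomaly flagging is a scan over adjacent quant levels, looking for a KLD drop that is disproportionate to the size increase. This is a hypothetical sketch of that idea, not the project's implementation; the threshold and the `(name, size, kld)` tuple format are assumptions.

```python
def flag_anomalies(levels, drop_ratio=2.0):
    """levels: list of (name, size_gb, kld) tuples sorted by size ascending.
    Flag a level when its relative KLD improvement over the previous,
    smaller level exceeds drop_ratio times the relative size increase —
    the kind of nonlinear jump worth validating and surfacing."""
    flagged = []
    for (_, prev_size, prev_kld), (name, size, k) in zip(levels, levels[1:]):
        size_gain = (size - prev_size) / prev_size
        if size_gain <= 0 or prev_kld <= 0:
            continue  # need a genuine size step and a valid baseline
        quality_gain = (prev_kld - k) / prev_kld
        if quality_gain / size_gain >= drop_ratio:
            flagged.append(name)
    return flagged
```

For example, a level that cuts KLD by 70% for a 25% size increase would be flagged, while a level that buys 10% quality for 33% more size would not.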
The project includes dominance logic (a quant survives only if no same-size or smaller quant achieves strictly lower KLD), premium logic (a larger quant survives only if its small size increase buys a nonlinear quality jump), and collapse logic that prunes the redundant options left over. Even architectures with predictable behavior benefit, since the process still yields clean collapse spaces and optional sub-zones. The full pipeline code and per-model benchmark tables are available on GitHub.
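The dominance and premium rules together amount to a Pareto-style pruning pass over (size, KLD) pairs. The sketch below shows one plausible reading of those rules under stated assumptions (the tuple format and `premium_ratio` parameter are invented for illustration); MagicQuant's actual logic may differ.

```python
def collapse(candidates, premium_ratio=1.5):
    """candidates: list of (name, size_gb, kld) tuples.
    Dominance: a quant is dropped if a same-size or smaller quant
    already achieves strictly lower KLD.
    Premium: a surviving larger quant must buy a relative KLD reduction
    at least premium_ratio times its relative size increase, so each
    extra gigabyte has to justify itself with a nonlinear quality jump."""
    survivors = []
    best_kld = float("inf")
    for name, size, k in sorted(candidates, key=lambda c: (c[1], c[2])):
        if k >= best_kld:
            continue  # dominated by a smaller-or-equal survivor
        if survivors:
            _, prev_size, prev_kld = survivors[-1]
            size_up = (size - prev_size) / prev_size
            kld_down = (prev_kld - k) / prev_kld
            if size_up > 0 and kld_down < premium_ratio * size_up:
                continue  # quality jump too small to pay the size premium
        survivors.append((name, size, k))
        best_kld = k
    return survivors
```

Given four candidates where one is strictly worse at the same size and another offers only a marginal gain for its extra weight, the pass collapses the table down to the quants that earn their place.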
