
Unsloth Q4_K_XL edges Mudler Apex MoE on Qwen3.5 122B math benchmark | UncensoredHub

A LocalLLaMA user's head-to-head test of two Qwen3.5 122B quantizations found Unsloth's Q4_K_XL scoring slightly higher on a single GSM8K run than Mudler's newer Apex MoE IQuality variant, though the two produced similar real-world output at comparable file sizes.

May 18, 2026

For many practitioners running local models, Unsloth has been the go-to quantization publisher, thanks to same-day releases after new model drops, consistently low perplexity scores, and a library of tutorials on its site. But a recent comparison between Unsloth's standard quants and Mudler's Apex MoE technique has users running their own benchmarks to see whether the newer approach closes the gap.

One practitioner tested Qwen3.5 122B in two quantizations of roughly equal size: Unsloth's Q4_K_XL and Mudler's IQuality Apex MoE quant. Because the files were similar in size, the pairing makes a fair comparison for memory-constrained setups. In casual prompting, the user noticed no difference in output quality. A single GSM8K run gave Unsloth a slight edge, though the margin was narrow enough that real-world task performance felt indistinguishable.
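
GSM8K is usually scored by exact match on the final numeric answer: each reference solution ends with a line of the form `#### <answer>`. The sketch below is a minimal, hypothetical scorer, not the harness the tester used (the post does not name one). It assumes the last number in the model's output is its answer, a common but imperfect heuristic, and it reports a binomial standard error to show how much a single run's accuracy can wobble. All function names are illustrative.

```python
import math
import re
from typing import List, Optional

# Integers or decimals, optionally negative, allowing thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def reference_answer(solution: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'; return that answer."""
    return solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(output: str) -> Optional[str]:
    """Heuristic (assumption): treat the last number in the output as the answer."""
    matches = NUMBER.findall(output)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def is_correct(output: str, solution: str) -> bool:
    """Exact numeric match between the extracted prediction and the reference."""
    pred = predicted_answer(output)
    return pred is not None and float(pred) == float(reference_answer(solution))


def accuracy_with_se(outputs: List[str], solutions: List[str]):
    """Accuracy over one run plus its binomial standard error sqrt(p(1-p)/n)."""
    if not solutions or len(outputs) != len(solutions):
        raise ValueError("need exactly one output per problem")
    n = len(solutions)
    p = sum(is_correct(o, s) for o, s in zip(outputs, solutions)) / n
    return p, math.sqrt(p * (1 - p) / n)
```

For scale: GSM8K's test split has 1,319 problems, and at a hypothetical 90% accuracy one standard error is about 0.8 percentage points, which puts a narrow single-run margin in perspective.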

On benchmark margins

The GSM8K result is consistent with Unsloth's reputation for posting some of the lowest perplexity numbers among quantization publishers. Perplexity measures how well a model predicts held-out text, computed as the exponential of the average negative log-likelihood per token; a quant's perplexity is compared against that of the full-precision original, and lower is better. Unsloth's Q4_K_XL format has historically stayed within single-digit percentages of FP16 baselines, a key reason practitioners reach for it first when a new 70B or 122B model lands. Mudler's Apex MoE quants, which debuted in late 2025, use a mixture-of-experts-inspired sparsity pattern that aims to reduce file size without the usual quality hit, but the technique is still being refined across different model architectures.
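
Perplexity, as described above, is the exponential of the mean negative log-likelihood per token. A minimal sketch, assuming per-token natural-log probabilities have already been collected from the inference runtime (how to obtain them varies by tool and is not shown); `perplexity` and `relative_increase` are hypothetical helper names, and the second expresses a quant's gap to an FP16 baseline as a percentage.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood), using natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def relative_increase(quant_ppl: float, baseline_ppl: float) -> float:
    """Percent by which a quant's perplexity exceeds the full-precision baseline."""
    return 100.0 * (quant_ppl - baseline_ppl) / baseline_ppl
```

For example, a quant at 5.25 against a hypothetical FP16 baseline of 5.00 is a 5% increase. Such comparisons only mean something when both models are scored on the same text with the same tokenizer and context length.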

Discussions about which quantization publisher to use draw dozens of voices: most name Unsloth for speed to market and documentation, while a vocal minority vouch for Mudler's Apex MoE when VRAM is tight and a few perplexity points don't matter. The single-run Qwen3.5 122B test suggests that on math-heavy benchmarks like GSM8K, Unsloth's conventional quantization still holds a small accuracy advantage, but the gap is narrow enough that users prioritizing inference speed or a lower memory footprint may find Apex MoE a worthwhile trade-off.
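
Because both quants answer the same GSM8K problems, a paired test is a natural way to check whether a small accuracy lead is more than noise. The sketch below uses an exact McNemar test, a standard technique chosen here for illustration rather than anything the tester reported; it assumes per-problem correct/incorrect results were logged for both quants, and `mcnemar_exact` is a hypothetical helper name.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(a_correct: Sequence[bool],
                  b_correct: Sequence[bool]) -> Tuple[int, int, float]:
    """Two-sided exact McNemar test on paired per-problem results.

    Only discordant problems (one quant right, the other wrong) carry
    information; under the null hypothesis each is a fair coin flip.
    Returns (solved only by A, solved only by B, p-value).
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("results must be paired problem-by-problem")
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0
    k = min(only_a, only_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)
```

With hypothetical counts of 12 problems solved only by the Unsloth quant and 8 only by the Apex MoE quant, the test gives a two-sided p of about 0.50, so a four-problem lead would not by itself show that either quant is better at math.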
