
DynMuon cuts Muon training steps by 10–26% with dynamic spectral shaping | UncensoredHub


A new arXiv preprint proposes DynMuon, which dynamically adjusts the spectral-shaping parameter in Muon-style optimizers from positive early in training to mildly negative later, achieving lower validation loss and reaching a target loss in 10.6–26.5% fewer steps.

May 18, 2026


DynMuon is a new optimizer variant that builds on Muon, the polar-factorization method now standard for training large language models. The preprint, posted to arXiv this week by researchers including Fangzhou Wu, Rikhav Shah, Sandeep Silwal, and Qiuyi Zhang, introduces a "spectral-shaping" framework that replaces Muon's usual update matrix UV^⊤ with UΣ^p V^⊤, where UΣV^⊤ is the singular value decomposition of the (momentum-averaged) gradient matrix and p is a tunable exponent. By scheduling p from positive values early in training to mildly negative values later, DynMuon consistently outperforms standard Muon, which corresponds to a fixed p = 0, across the model sizes and architectures reported.

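Muon itself takes the (momentum-averaged) gradient matrix M = UΣV^⊤ and replaces it with its polar factor UV^⊤, which sets every singular value to 1 and so discards the singular-value information. DynMuon generalizes this by keeping the singular values Σ but raising them to a power p: p = 0 recovers Muon, p = 1 recovers the raw matrix M, positive p gives more weight to directions with large singular values, and negative p shifts weight toward directions with small ones. This lets the optimizer emphasize or de-emphasize different singular directions depending on the training stage. The paper's theory ties the optimal choice of p to local curvature, stochastic gradient noise, and label noise, three factors that shift as training progresses.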
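To make the p = 0 and p = 1 special cases concrete, here is a minimal NumPy sketch of the shaping map, assuming an explicit thin SVD of M. The function name `spectral_shape` and the `eps` cutoff (which drops near-zero singular values so that negative p stays finite) are illustrative choices, not taken from the preprint. Practical Muon implementations approximate the polar factor with a Newton–Schulz iteration rather than computing an SVD, so this is for exposition only.

```python
import numpy as np


def spectral_shape(M: np.ndarray, p: float, eps: float = 1e-12) -> np.ndarray:
    """Return U @ diag(sigma**p) @ V^T for the thin SVD M = U @ diag(sigma) @ V^T.

    p = 0 gives the polar factor U V^T that Muon uses; p = 1 returns M itself.
    Singular values <= eps (an illustrative cutoff, not from the preprint) are
    dropped so that negative p cannot blow up on (near-)null directions.
    """
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    keep = sigma > eps
    # Scaling the columns of U by sigma**p equals U @ np.diag(sigma**p).
    return (U[:, keep] * sigma[keep] ** p) @ Vt[keep, :]


rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))

polar = spectral_shape(M, 0.0)                  # Muon-style update
print(np.allclose(polar.T @ polar, np.eye(3)))  # True: orthonormal columns
print(np.allclose(spectral_shape(M, 1.0), M))   # True: p = 1 leaves M unchanged

# The shaped update has singular values sigma**p.
sigma = np.linalg.svd(M, compute_uv=False)
print(sigma ** 0.5)   # p > 0: large singular directions keep more weight than under p = 0
print(sigma ** -0.2)  # p < 0: ordering flips, small singular directions get more weight
```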

What stands out

01. Positive p early, negative p late. The preprint shows that positive p (e.g., p = 0.5) helps early in training by emphasizing high-curvature directions and accelerating signal contraction, while mildly negative p (e.g., p = −0.2) helps later by shifting update strength toward low-curvature directions that still carry useful training signal.
02. 10.6–26.5% step reduction. Across experiments, DynMuon reaches the same target validation loss as Muon in 10.6–26.5% fewer steps, depending on model size and training setting. The largest gains appear in mid-to-large-scale runs.
03. Consistent validation-loss improvement. Even with the step budget held fixed, DynMuon reaches a lower final validation loss than Muon in every reported configuration, including language-model pretraining and transformer training on vision tasks.
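The list above gives only example endpoints for p, so the sketch below assumes the simplest schedule consistent with it: a linear ramp from 0.5 to −0.2 across the training run, plugged into a toy Muon-style step with plain heavy-ball momentum and a fixed learning rate. Every name here (`p_schedule`, `dynmuon_step`) and every hyperparameter (`lr`, `beta`, the ramp shape) is hypothetical; the preprint's actual schedule and update rule may differ.

```python
import numpy as np


def p_schedule(step: int, total_steps: int,
               p_start: float = 0.5, p_end: float = -0.2) -> float:
    """Hypothetical linear ramp of the shaping exponent p over training."""
    frac = step / max(total_steps - 1, 1)
    frac = min(max(frac, 0.0), 1.0)  # clamp in case step runs past the budget
    return p_start + frac * (p_end - p_start)


def spectral_shape(M: np.ndarray, p: float, eps: float = 1e-12) -> np.ndarray:
    """U @ diag(sigma**p) @ V^T from the thin SVD of M, ignoring sigma <= eps."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    keep = sigma > eps
    return (U[:, keep] * sigma[keep] ** p) @ Vt[keep, :]


def dynmuon_step(W: np.ndarray, grad: np.ndarray, momentum: np.ndarray,
                 step: int, total_steps: int,
                 lr: float = 0.02, beta: float = 0.95):
    """One illustrative update for a 2-D weight matrix (hypothetical helper).

    Accumulates heavy-ball momentum, shapes it with the scheduled exponent p,
    and takes a step of size lr. Returns (new_W, new_momentum).
    """
    momentum = beta * momentum + grad
    p = p_schedule(step, total_steps)
    W = W - lr * spectral_shape(momentum, p)
    return W, momentum


# Toy usage on the quadratic 0.5 * ||W||_F^2, just to exercise the loop.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
mom = np.zeros_like(W)
total = 200
for t in range(total):
    grad = W  # gradient of 0.5 * ||W||_F^2
    W, mom = dynmuon_step(W, grad, mom, t, total)
print(np.linalg.norm(W))
```

A cosine or piecewise-constant ramp would match the positive-then-negative description just as well; the linear form is used here only because it adds no extra knobs.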
