Gemma3-27B hits 700k activation peaks; study maps quantization constraints across 27 open LLMs
A new arXiv preprint measures maximum activation magnitudes across 27 open-weight checkpoints and finds that Gemma3-27B-it reaches roughly 700,000 while Qwen3.5 and MoE models stay in the hundreds to low thousands, a range that directly constrains low-bit quantization choices.
A preprint released this week tackles a question most open-weight releases ignore: how large do activations actually get inside the model, and why does that matter for deployment? The researchers measured maximum activation magnitudes across 27 checkpoints from 8 open families (Qwen, Gemma, LLaMA, Mistral, and others), using a unified 5,000-sample corpus and identical hooks at every layer. Global maxima span nearly four orders of magnitude at comparable parameter counts: Gemma3-27B-it reaches roughly 700,000, while Qwen3.5 and MoE checkpoints stay in the hundreds to low thousands. That spread matters because activation magnitude is a first-order constraint for INT8 quantization, activation scaling, and stable inference, yet most model cards never report it.
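To make the measurement idea concrete, here is a minimal sketch of per-layer peak tracking with PyTorch forward hooks. It is not the authors' pipeline: the helper name `track_max_activations`, the toy `nn.Sequential` model, and the random inputs are all assumptions for illustration, standing in for a real checkpoint and the 5,000-sample corpus.

```python
import torch
import torch.nn as nn


def track_max_activations(model: nn.Module):
    """Attach forward hooks that keep a running max of |output| per submodule.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    peaks = {}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Some modules return tuples; only the first element is inspected.
            out = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(out):
                peak = out.detach().abs().max().item()
                peaks[name] = max(peaks.get(name, 0.0), peak)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    return peaks, handles


# Toy stand-in for a real checkpoint (an assumption, not a model from the study).
torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))
peaks, handles = track_max_activations(toy)

with torch.no_grad():
    for _ in range(4):  # stand-in for iterating over a calibration corpus
        toy(torch.randn(2, 8))

for h in handles:
    h.remove()  # detach hooks so later forward passes are not affected

# The submodule with the largest recorded peak, and the peak itself.
global_layer = max(peaks, key=peaks.get)
print(global_layer, peaks[global_layer])
```

Keeping a running maximum per module, instead of storing activations, keeps memory flat regardless of corpus size. On a real transformer you would likely restrict the hooks to the modules of interest, such as the residual stream, attention, and MLP outputs.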
The study challenges the assumption that bigger models automatically produce bigger activations: cross-family and cross-generation comparisons show no simple monotonic scaling with size. MoE checkpoints exhibit 14.0–23.4× lower peaks than matched-scale dense counterparts, and in 22 of 24 checkpoints the residual stream, rather than the attention or MLP blocks, carries the global maximum. The authors also ran a lightweight INT8 sanity check and confirmed that measured maxima co-vary with low-bit reconstruction error when activation scales are set from those peaks. In other words, if you quantize a model without knowing its actual activation range, you're guessing.
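The link between peak magnitude and reconstruction error can be illustrated with a small sketch of symmetric per-tensor INT8 quantization, where the scale is set to peak / 127. This is not the authors' sanity check: the function name `int8_roundtrip_error` and the synthetic Gaussian data are assumptions, and the 700,000 figure is borrowed from the reported Gemma3-27B-it maximum only to show what a peak of that size does to the scale.

```python
import random


def int8_roundtrip_error(values, peak):
    """Quantize to symmetric INT8 with scale = peak / 127, dequantize,
    and return the mean squared reconstruction error.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    scale = peak / 127.0
    total = 0.0
    for x in values:
        # Round to the nearest integer level and clamp to the INT8 range.
        q = max(-127, min(127, round(x / scale)))
        total += (x - q * scale) ** 2
    return total / len(values)


random.seed(0)
# Synthetic "ordinary" activations: unit-variance Gaussian (an assumption).
bulk = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Case 1: the scale is set from the bulk's own maximum.
small_peak = max(abs(x) for x in bulk)
err_small = int8_roundtrip_error(bulk, small_peak)

# Case 2: the scale is set from a single outlier-sized peak.
err_large = int8_roundtrip_error(bulk, 700_000.0)

print(f"peak={small_peak:.2f}  mse={err_small:.6f}")
print(f"peak=700000.00  mse={err_large:.6f}")
```

With the large peak, one quantization step spans thousands of units, so every ordinary value rounds to zero and the error approaches the variance of the data. This is the intuition behind treating the measured maximum as an input to the choice of activation scale.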
Measurement pipeline and implications
Prior work characterized outlier features on pre-2024 LLaMA-style models, and the downstream quantization stack inherited that picture without revisiting it for the post-LLaMA open-model boom. The new study argues that maximum activation magnitude is a model property tied to family, architecture, and training stage rather than a simple byproduct of size, and that it should be measured and reported alongside any open-weight release before low-bit deployment. The code and measurement pipeline are publicly available on GitHub (clx1415926/Max_act_llm), so practitioners can run the same checks on new releases. The preprint is available at arXiv:2605.15572.
