Shannon-Hartley theorem unifies LLM scaling, overfitting, and quantization collapse | UncensoredHub

New arXiv preprint models neural network capacity using the Shannon-Hartley theorem, unifying monotonic scaling with catastrophic overfitting and quantization collapse under a single information-theoretic framework.

By Alex Sokoloff · May 28, 2026

A team led by Xu Ouyang has published a preprint that reframes large language model scaling in terms of classical information theory. The paper, "LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws," treats training as signal transmission over a noisy channel: parameters play the role of bandwidth, training tokens play the role of signal power, and the Shannon-Hartley capacity theorem sets a hard limit on how much any model can learn.
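For reference, the classical Shannon-Hartley theorem gives the capacity of a band-limited channel with additive white Gaussian noise as C = B · log2(1 + S/N). The sketch below computes that textbook formula and then, purely to illustrate the article's analogy, plugs in parameters as "bandwidth" and training tokens as "signal power". The helper `toy_model_capacity` and its one-to-one unit mapping are hypothetical assumptions, not the paper's calibrated equation.

```python
import math


def shannon_hartley_capacity(bandwidth: float, signal_power: float, noise_power: float) -> float:
    """Classical Shannon-Hartley capacity C = B * log2(1 + S/N).

    With B in hertz and S, N in the same power units, C is in bits per second.
    """
    if bandwidth <= 0 or noise_power <= 0 or signal_power < 0:
        raise ValueError("need bandwidth > 0, noise_power > 0, signal_power >= 0")
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


# Hypothetical reading of the article's analogy (NOT the paper's equation):
# parameter count stands in for bandwidth, training tokens for signal power.
def toy_model_capacity(params: float, tokens: float, noise: float) -> float:
    return shannon_hartley_capacity(params, tokens, noise)


if __name__ == "__main__":
    # Textbook example: 3 kHz bandwidth at 30 dB SNR (S/N = 1000).
    print(f"{shannon_hartley_capacity(3000, 1000, 1):.0f} bit/s")
    # At fixed "params", each doubling of "tokens" adds roughly `params` bits
    # at high SNR: returns grow logarithmically, not linearly.
    for tokens in (1e3, 2e3, 4e3):
        print(tokens, round(toy_model_capacity(1.0, tokens, 1.0), 3))
```

The logarithm is the point of the analogy: at high S/N, doubling signal power adds only about B bits, so returns from more tokens flatten rather than grow in proportion.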

The paper's Shannon Scaling Law accounts for two behaviors that standard power-law scaling cannot explain: the U-shaped loss curve that appears when a model is overtrained on finite data, and the capacity collapse that follows aggressive quantization. In this framework, both are emergent consequences of cumulative noise (data noise, inter-component interference, and architectural constraints), which the model represents explicitly. The authors report predictive accuracy across multiple model families and training regimes, including scenarios where classical Chinchilla-style laws break down.
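To see how a noise term can bend a monotone curve, consider a deliberately simple toy, which is an assumption for illustration and not the paper's model: signal power grows linearly with tokens D, while noise has a fixed floor N0 plus a component that grows quadratically, N(D) = N0 + αD². Then S/N = D / (N0 + αD²), whose derivative (N0 − αD²) / (N0 + αD²)² vanishes at D* = sqrt(N0/α). Capacity therefore rises, peaks at D*, and falls, the mirror image of a U-shaped loss curve. The constants `n0` and `alpha` below are invented.

```python
import math


def toy_capacity(tokens: float, bandwidth: float = 1.0, n0: float = 1.0, alpha: float = 1e-6) -> float:
    """Hypothetical capacity with a noise term that grows faster than the signal.

    signal = tokens, noise = n0 + alpha * tokens**2 (both illustrative assumptions).
    """
    noise = n0 + alpha * tokens ** 2
    return bandwidth * math.log2(1.0 + tokens / noise)


def peak_tokens(n0: float, alpha: float) -> float:
    """Closed-form maximiser of tokens / (n0 + alpha * tokens**2): sqrt(n0 / alpha)."""
    return math.sqrt(n0 / alpha)


if __name__ == "__main__":
    n0, alpha = 1.0, 1e-6
    d_star = peak_tokens(n0, alpha)  # about 1000 with these toy constants
    for d in (100, 500, d_star, 2000, 10000):
        print(f"D={d:>8.0f}  C={toy_capacity(d, n0=n0, alpha=alpha):.3f}")
```

Because log2 is monotone, capacity peaks at the same D* as the S/N ratio, so in this toy the peak doubles as a closed-form stopping point.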

What stands out

01. Hard capacity ceiling. The Shannon bound sets a finite information ceiling for any given parameter and token budget. Past that ceiling, additional pretraining or lower-bit quantization erodes capacity instead of adding to or preserving it. Practitioners get a mathematical stopping rule instead of having to guess when overfitting begins.
02. Unified view of non-monotonic loss. Catastrophic overfitting and quantization degradation both arise from the same noise term in the capacity equation. The paper shows that a 4-bit quantized 70B model and a 13B model overtrained for 3× the optimal token count hit the same information bottleneck, only through different noise sources.
03. Resource allocation math. The framework lets teams calculate the marginal information gain from an extra trillion tokens versus an extra 10 billion parameters. When the noise floor rises faster than the signal, throwing more compute at the problem yields negative returns, a prediction that classical scaling laws miss entirely.
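The three points above can be tied together in one toy calculation. The sketch below is an assumption-laden illustration, not the paper's Shannon Scaling Law: it treats parameters (in billions) as bandwidth and tokens (in trillions) as signal power, and sums three noise sources: a fixed data-noise floor, an interference term that grows with the square of the token count, and the textbook uniform-quantization error variance Δ²/12 with step Δ = range / 2^b. The constants `n_data`, `alpha`, and `q_scale` and all function names are hypothetical. It then compares the finite-difference gain from one more trillion tokens against ten billion more parameters.

```python
import math


def quantization_noise(bits: int, value_range: float = 1.0) -> float:
    """Variance of uniform quantization error: step**2 / 12 with step = range / 2**bits."""
    step = value_range / (2 ** bits)
    return step ** 2 / 12.0


def toy_capacity(params_b: float, tokens_t: float, bits: int = 16,
                 n_data: float = 0.05, alpha: float = 0.01, q_scale: float = 1e3) -> float:
    """Hypothetical capacity: params_b ("bandwidth", billions) * log2(1 + S/N).

    S = tokens_t (trillions); N = data noise + interference growing as
    alpha * S**2 + scaled quantization noise. Every constant is illustrative.
    """
    noise = n_data + alpha * tokens_t ** 2 + q_scale * quantization_noise(bits)
    return params_b * math.log2(1.0 + tokens_t / noise)


def marginal_gains(params_b: float, tokens_t: float, bits: int = 16) -> tuple[float, float]:
    """Finite-difference gain from +1T tokens versus +10B parameters."""
    base = toy_capacity(params_b, tokens_t, bits)
    more_tokens = toy_capacity(params_b, tokens_t + 1.0, bits) - base
    more_params = toy_capacity(params_b + 10.0, tokens_t, bits) - base
    return more_tokens, more_params


if __name__ == "__main__":
    # Coarser quantization raises the noise floor and cuts capacity.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit  C = {toy_capacity(70.0, 2.0, bits):.1f}")
    # Token gains shrink and eventually turn negative; parameter gains stay positive.
    for tokens_t in (1.0, 2.0, 5.0):
        dt, dp = marginal_gains(70.0, tokens_t)
        print(f"D={tokens_t:.0f}T  +1T tokens: {dt:+.1f}  +10B params: {dp:+.1f}")
```

With these made-up constants, the value of another trillion tokens turns negative once a run nears the S/N peak at about sqrt(0.05 / 0.01) ≈ 2.2 trillion tokens. Dropping to 4 bits costs far more capacity than dropping to 8, because quantization noise variance scales as 4^(−b).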
