Null-calibration framework dismantles Platonic Representation Hypothesis
A permutation-based statistical method shows that global cross-modal convergence in large AI models vanishes when embedding dimensionality and layer depth are properly controlled, replacing the Platonic hypothesis with a local-topology alternative.

A team from EPFL has released a permutation-based null-calibration framework that removes two confounding factors from representation-similarity metrics: embedding dimensionality and layer-search depth. The method, detailed in a February 2025 arXiv preprint, converts raw similarity scores into calibrated effect sizes with statistical guarantees. When applied to popular cross-modal models, the framework indicates that the widely cited Platonic Representation Hypothesis, the claim that large models converge toward a single shared geometry across modalities, collapses under proper statistical control.
The Platonic Representation Hypothesis has been a cornerstone of recent multimodal AI theory, suggesting that vision, language, and audio encoders trained at scale all converge toward the same underlying representational structure. Authors Fabian Gröger, Shuo Wen, and Maria Brbić demonstrate that this apparent convergence is largely an artifact of measurement. Raw similarity scores tend to inflate when the compared models have wider embeddings or deeper layer stacks to search over, producing spurious agreement that disappears once those factors are calibrated out. The null-calibration approach uses permutation tests to establish a chance-level baseline, then rescales the observed similarity into an effect size that accounts for model architecture. For example, comparing embeddings from a 12-layer vision transformer with those from a 24-layer language model without calibration can yield misleadingly high similarity scores that evaporate under the new framework.
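The calibration idea described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the choice of linear CKA as the metric, the z-score form of the effect size, and the permutation counts are all assumptions made for the example. The key point it tries to show is that the null distribution is built by shuffling which samples are paired, and that any search over layer pairs is repeated inside every permutation so the search itself is calibrated out.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two (n_samples, dim) embedding matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

def _calibrate(observed, null):
    # Assumed effect-size form: z-score of the observed value against the null.
    effect = (observed - null.mean()) / (null.std() + 1e-12)
    # Standard permutation p-value with the +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (1 + len(null))
    return float(observed), float(effect), float(p_value)

def calibrated_similarity(x, y, metric=linear_cka, n_perm=200, seed=0):
    """Calibrate one similarity score by permuting the sample pairing."""
    rng = np.random.default_rng(seed)
    observed = metric(x, y)
    null = np.array([metric(x, y[rng.permutation(len(y))]) for _ in range(n_perm)])
    return _calibrate(observed, null)

def calibrated_best_layer(layers_x, layers_y, metric=linear_cka, n_perm=100, seed=0):
    """Calibrate the best score over all layer pairs.

    The max over layer pairs is recomputed for every permutation, so a
    deeper stack (more pairs to search) raises the null as well.
    """
    rng = np.random.default_rng(seed)
    n = len(layers_x[0])

    def best(perm):
        return max(metric(a, b[perm]) for a in layers_x for b in layers_y)

    observed = best(np.arange(n))
    null = np.array([best(rng.permutation(n)) for _ in range(n_perm)])
    return _calibrate(observed, null)
```

Calling `calibrated_similarity(x, y)` on two embedding matrices whose rows describe the same samples returns the raw score, the calibrated effect size, and a permutation p-value. For unrelated high-dimensional embeddings the raw score can look substantial while the effect size stays near zero, which is the inflation the article describes.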
In place of the Platonic view, the authors propose an Aristotelian Representation Hypothesis: models trained at scale do converge, but only in local topological neighborhoods around individual data points, not in a global geometric sense. The framework is metric-agnostic and works with any representation similarity measure, from centered kernel alignment (CKA) to linear probes. Code is available on GitHub at mlbio-epfl/aristotelian, and the paper was posted to arXiv on February 24, 2025.
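To make the local-versus-global distinction concrete, one common way to quantify local agreement is to check how much the nearest neighbors of each data point overlap between two embedding spaces. The sketch below assumes that measure; whether it matches the measure used in the paper is not stated in this article, and the function names and the default `k` are hypothetical.

```python
import numpy as np

def knn_indices(emb, k):
    """Indices of the k nearest neighbors (Euclidean) of every row, excluding itself."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def mutual_knn_overlap(x, y, k=10):
    """Average fraction of shared k-nearest neighbors per sample.

    x and y are (n_samples, dim) embeddings of the same samples; the
    dimensions may differ because only neighbor identities are compared.
    """
    nx, ny = knn_indices(x, k), knn_indices(y, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nx, ny)]
    return float(np.mean(overlaps))
```

Because the score depends only on which points are neighbors, it ignores global scale and rotation. Like any raw similarity score, it has a nonzero chance level (roughly `k / (n - 1)` for unrelated embeddings), so it would also need a permutation baseline before being interpreted.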



