Claude Sonnet-4.6 accuracy jumps to 56.9% with metacognitive test-time control
A new arXiv preprint shows that large language models can monitor their own reasoning before and after solving problems, and that turning those signals into explicit test-time control raises Claude Sonnet-4.6's pooled accuracy from 48.3% to 56.9% without parameter updates.
Large language models already know when they're likely to succeed or fail at a problem—they just don't act on that knowledge during inference. A preprint posted to arXiv on May 15 demonstrates that separating self-monitoring from reasoning can unlock substantial test-time gains. The approach, called a metacognitive harness, lifts Claude Sonnet-4.6's pooled accuracy from 48.3% to 56.9% across text, code, and multimodal benchmarks without touching the model's weights or fine-tuning on specific tasks.
The harness is built around two signals borrowed from cognitive psychology: a pre-solve feeling-of-knowing (FOK) score and a post-solve judgment-of-learning (JOL) score. Before attempting a problem, the model reports how confident it is that it can solve it. After generating an answer, it reports how confident it is that the answer is correct. Instead of treating those scores as passive confidence estimates, the harness uses them to decide whether to trust the current solution, retry with compact metacognitive feedback, or pass multiple attempts to a final aggregator.
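The control loop described above can be sketched in a few lines. This is a hypothetical illustration, not the preprint's implementation: the model interface (`feeling_of_knowing`, `solve`, `judgment_of_learning`, `aggregate`), the acceptance threshold, and the retry budget are all assumed names and values chosen for clarity.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    answer: str
    fok: float  # pre-solve feeling-of-knowing, in [0, 1]
    jol: float  # post-solve judgment-of-learning, in [0, 1]

def solve_with_harness(model, problem, accept_jol=0.8, max_retries=3):
    """Sketch of a FOK/JOL-gated solve loop: trust, retry with feedback,
    or fall back to aggregating all attempts. Interface is hypothetical."""
    attempts = []
    feedback = None
    for _ in range(max_retries):
        fok = model.feeling_of_knowing(problem)            # pre-solve signal
        answer = model.solve(problem, feedback=feedback)   # attempt the problem
        jol = model.judgment_of_learning(problem, answer)  # post-solve signal
        attempts.append(Attempt(answer, fok, jol))
        if jol >= accept_jol:
            return answer  # trust the current solution
        # Otherwise retry, passing compact metacognitive feedback forward.
        feedback = f"Previous attempt self-rated JOL={jol:.2f}; reconsider."
    # No attempt cleared the bar: hand all attempts to a final aggregator.
    return model.aggregate(problem, attempts)
```

The key design point is that the FOK/JOL scores are consumed by control logic rather than merely reported, so the base model's reasoning is unchanged while the harness decides when to stop, retry, or aggregate.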
Benchmark results
The preprint reports results on three primary evaluation settings: HLE-Verified, LiveCodeBench v6, and R-Bench-V. In each case, the metacognitive harness exceeded the strongest listed leaderboard entries at the time of evaluation. The gains come entirely from test-time control—no parameter updates, no benchmark-specific fine-tuning, no external retrieval. The model's base reasoning ability remains fixed; only the decision logic around when to retry and when to aggregate changes.
The work suggests that frontier models may already possess latent metacognitive ability that goes unused during standard inference. The harness architecture is model-agnostic, and the preprint notes that the approach could apply to any LLM that exposes useful self-monitoring signals.
