ZenCreator

Pro-grade AI content creation. Image, video, face-swap, lipsync, and upscaling behind one API.

14 tools

Up to 4K

4.4(288)

Visit

Loading…

Verifiable Transformers encode circuit explanations as formal solver-checkable claims | UncensoredHub

Research

Verifiable Transformers encode circuit explanations as formal solver-checkable claims

Researchers convert task-localized Transformer circuits into bounded SMT propositions, exhaustively verifying quote-closing and bracket-tracking circuits on small models and demonstrating stable GPT-2 scale training with SMT-encodable operators.

ByAlex Sokoloff·May 28, 2026

Verifiable Transformers encode circuit explanations as formal solver-checkable claims

A preprint posted to arXiv this week introduces Verifiable Transformers, a framework that converts task-localized Transformer circuits into formal propositions a satisfiability-modulo-theories solver can check. Mechanistic interpretability typically identifies circuits inside Transformers through examples, ablations, and manual reasoning—but stops short of proof. The new framework closes that gap by encoding extracted circuits directly into an SMT solver, then verifying properties like projected functional equivalence, edge necessity, and final-residual robustness over a finite task domain.

The authors instantiate direct verification with a GPT-style architecture built from SMT-encodable primitives: Signed L1 BandNorm, sparsemax attention, and LeakyReLU activations. On small symbolic sequence tasks—quote closing and bracket type tracking—they train a Transformer whose weights and forward pass fit entirely inside the solver's representation. They extract sparse circuits for each task and exhaustively verify that the circuit produces the correct output for every possible input in the bounded domain, that each edge is necessary, and that task-irrelevant content changes leave predictions unchanged. The solver either confirms the claim or returns a counterexample.

Scaling and hard-to-encode operators

The same operator stack trains stably on OpenWebText at GPT-2 parameter count, though naive direct SMT verification remains intractable at that scale. For circuits that include operators the solver cannot encode exactly—standard softmax attention, for instance—the framework offers surrogate-mediated verification: fit an SMT-encodable surrogate to the extracted circuit, validate the surrogate against the real circuit over the bounded task domain, then verify symbolic explanations against the surrogate. The paper demonstrates both verified symbolic explanations and solver-generated counterexamples on task-localized circuits with hard-to-encode attention.

The authors are not attempting full-model verification. The goal is a concrete path from "this circuit does X" to a formal claim a solver can prove or refute, turning mechanistic interpretability findings into propositions with the same rigor as a type-checked proof.

ZenCreator

Verifiable Transformers encode circuit explanations as formal solver-checkable claims

Scaling and hard-to-encode operators

More in Research

Apple accuses OpenAI of soliciting hardware prototypes in job interviews

Lightweight proxy models cut LLM post-training costs while enabling cross-model signal reuse

Colibri runs 744B GLM-5.2 on 25GB RAM by streaming experts from disk

Anthropic extends Fable 5 preview a second week, bumps rate limits 50%

Soofi S 30B activates 3B parameters per token, tops European AI baselines