Pinductor uses language models to learn world models from partial observations
New method from Atom Research matches state-privileged POMDP learning while using only observation-action trajectories, cutting sample requirements by leveraging language-model knowledge.

Pinductor, a world-model learning system from Atom Research, builds partially-observable Markov decision process (POMDP) models from observation-action trajectories without access to hidden state. The method, detailed in a preprint released this week, uses a large language model to propose candidate POMDP structures from a handful of trajectories, then iteratively refines them against a belief-based likelihood score. In tests across navigation, robotic manipulation, and game environments, Pinductor matched the sample efficiency of prior LLM-based POMDP methods that assume privileged access to hidden state—despite using strictly less information.
The core challenge is that agents acting in real environments must learn internal models of how those environments work, but gathering the data to build such models is expensive. POMDPs are a flexible class for representing world knowledge under partial observability, but traditional tabular methods require extensive interaction to converge. The authors hypothesized that language-model priors could encode enough structural and semantic knowledge to short-circuit the data bottleneck.
Performance and generalization
In the paper's benchmark suite, Pinductor significantly outperformed tabular POMDP baselines on sample efficiency while matching the performance of state-privileged LLM methods. Performance scaled with the capability of the underlying language model: stronger LLMs produced better POMDP proposals. When the authors withheld semantic information about the environment—stripping natural-language descriptions of observations and actions—performance degraded gracefully rather than collapsing, suggesting the method can generalize beyond richly annotated domains.
The system is open-sourced on GitHub. The preprint is available on arXiv. The authors position the work as a step toward generalist agents that can learn world models in real-world environments where hidden state is unavailable and interaction budgets are tight.