Thinking Machines builds AI that listens and responds simultaneously
The startup is developing a model that handles simultaneous input and output, mimicking phone-call dynamics instead of turn-based text exchanges.
Thinking Machines is building an AI model that breaks the turn-taking pattern every current system follows. Instead of waiting for a user to finish speaking before processing and responding, the model handles input and output simultaneously—closer to how a phone conversation works than a text thread.
Every major AI model today operates in strict sequence: the user speaks, the model listens, then the model generates a response while the user waits. Thinking Machines' approach aims to collapse that cycle, letting the system process new input even as it's mid-sentence in its own reply.
Rethinking transformer inference
The change requires rethinking how attention and token generation work under the hood. Standard transformer architectures batch input tokens, run inference, then stream output tokens. A simultaneous model would need to interleave those steps without losing coherence or introducing runaway latency.
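The interleaving described above can be illustrated with a toy event loop. This is a hypothetical sketch, not Thinking Machines' architecture: `ToyDuplexLoop`, `hear`, `step`, and the placeholder `generate_next_token` are invented names, and the placeholder stands in for a real transformer forward pass. The point is only the ordering: newly arrived user tokens are folded into the shared context before each output token is emitted, so the reply can adapt to an interruption without restarting the turn.

```python
from collections import deque

class ToyDuplexLoop:
    """Toy illustration of full-duplex (interleaved) decoding.

    Hypothetical sketch; `generate_next_token` is a stand-in for a
    real model's forward pass, not any actual system's API.
    """

    def __init__(self):
        self.context = []        # shared history of user + model tokens
        self.incoming = deque()  # user tokens arriving mid-generation

    def hear(self, token):
        # User input can arrive at any time, even mid-reply.
        self.incoming.append(("user", token))

    def generate_next_token(self):
        # Placeholder for a transformer forward pass over self.context;
        # here it just labels outputs by how many the model has emitted.
        n = sum(1 for role, _ in self.context if role == "model")
        return f"out{n}"

    def step(self):
        # Interleave: drain any newly heard input into the context
        # *before* emitting the next output token.
        while self.incoming:
            self.context.append(self.incoming.popleft())
        token = self.generate_next_token()
        self.context.append(("model", token))
        return token

loop = ToyDuplexLoop()
loop.hear("hello")
first = loop.step()    # emits "out0" after absorbing "hello"
loop.hear("wait")      # interruption while the model is mid-reply
second = loop.step()   # "wait" enters the context before "out1"
```

In a turn-based system, the `while self.incoming` drain would only run between complete turns; moving it inside every decode step is the structural change the article describes, and keeping it coherent under real latency constraints is the open problem.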
Thinking Machines announced the project this week but has not disclosed parameter counts, training data, a release timeline, benchmarks, or demo access. Whether the model can maintain context across overlapping streams—and whether users actually want to interrupt an AI mid-thought—remains to be seen.
