In May 2026, Thinking Machines Lab announced a new approach to AI conversation it calls “interaction models” — systems that process user input and generate a response simultaneously, rather than waiting for a speaker to finish before replying.
The company, founded by former OpenAI CTO Mira Murati, says its model, TML-Interaction-Small, responds in 0.40 seconds — roughly the speed of natural human conversation and, the company claims, significantly faster than comparable models from OpenAI and Google. The technical term for this simultaneous send-and-receive capability is “full duplex,” a design the company argues should be built natively into a model rather than added on afterward.
The announcement concerns a research preview, not a public product launch: Thinking Machines says limited access is expected within the next few months, with a broader release planned for later in 2026.
Current AI voice and chat systems operate in a back-and-forth pattern: the user speaks, the model listens, then the model responds while the user waits. Thinking Machines is positioning its approach as closer to a phone call than a text exchange — one where the model can, in effect, be interrupted mid-response.
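The contrast between the two designs can be sketched in a few lines. The following is a minimal illustration, not Thinking Machines' actual API or architecture: it simulates a full-duplex exchange in which "listening" and "speaking" run concurrently, so an incoming user utterance can interrupt (barge in on) a response that is still being generated. All names and timings here are hypothetical.

```python
# Illustrative sketch only (not TML's implementation): full-duplex turn-taking,
# where the model listens and speaks at the same time and can be cut off.
import asyncio

async def full_duplex_turn(user_words, reply_words, interrupt_after=None):
    """Speak reply_words while listening to user_words; stop speaking
    as soon as the listener detects a barge-in (hypothetical helper)."""
    heard, spoken = [], []
    interrupted = asyncio.Event()

    async def listen():
        for i, word in enumerate(user_words):
            await asyncio.sleep(0.01)        # simulated audio frames arriving
            heard.append(word)
            if interrupt_after is not None and i + 1 == interrupt_after:
                interrupted.set()            # user barged in mid-response

    async def speak():
        for word in reply_words:
            if interrupted.is_set():
                return                       # cut the response short
            spoken.append(word)
            await asyncio.sleep(0.01)        # simulated speech output

    # A half-duplex system would await listen() fully before calling speak();
    # full duplex runs both concurrently.
    await asyncio.gather(listen(), speak())
    return heard, spoken

heard, spoken = asyncio.run(
    full_duplex_turn(
        user_words=["wait", "actually", "stop"],
        reply_words=["the", "answer", "is", "forty", "two"],
        interrupt_after=2,
    )
)
print(heard, spoken)
```

In this toy version, the listener hears all three user words, while the speaker emits only the first few reply words before the barge-in cuts it off — the behavior a "phone call" model implies that a strict turn-taking model cannot provide.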
Whether the real-world experience matches the technical claims remains to be seen, since the model is not yet publicly available. The benchmarks the company has released are notable, and the underlying premise — that interactivity should be a core model capability rather than a layer bolted on top — may shape how conversational AI is built going forward. But that will depend on how the system performs once users can actually test it.
Source: TechCrunch