OpenAI announced in May 2026 the addition of three new voice intelligence features to its Realtime API, giving developers tools to build applications capable of conversing, transcribing, and translating speech in real time.
The centerpiece of the release is GPT-Realtime-2, an upgraded voice model that replaces GPT-Realtime-1.5. The new model is built on GPT-5-class reasoning, which OpenAI says enables it to handle more complex user requests than its predecessor. It is billed by token consumption.
Alongside it, OpenAI is launching GPT-Realtime-Translate, a real-time translation service designed to keep pace with conversational speech. The feature supports more than 70 input languages (those it can understand) and 13 output languages (those it can speak back to users). The company is also releasing GPT-Realtime-Whisper, a live speech-to-text tool that produces transcriptions as conversations unfold. Both Translate and Whisper are billed by the minute.
“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI said.
OpenAI identified customer service as an obvious use case for the new features, but also pointed to education, media, events, and creator platforms as areas that could benefit.
The company acknowledged the potential for misuse and said it has built guardrails into the system to prevent the tools from being used to generate spam, fraud, or other harmful content. The system also embeds triggers that can halt a conversation if it is detected as violating OpenAI's harmful-content guidelines.
All three voice models are available through OpenAI’s Realtime API.
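To make the integration point concrete, here is a minimal sketch of how a developer might select one of these models in a Realtime API session. This is illustrative only: the `session.update` event shape follows OpenAI's existing Realtime API pattern, but the model names come from this announcement, and the `output_language` parameter for the translation model is an assumption, not a documented field.

```python
import json

# Hypothetical sketch. The session.update event type mirrors the existing
# Realtime API convention; the model names and the output_language field
# are assumptions based on the announcement, not confirmed API parameters.
REALTIME_URL = "wss://api.openai.com/v1/realtime"


def build_session_update(model, voice="alloy", output_language=None):
    """Build a session.update event selecting a Realtime model and,
    for the hypothetical translation model, a target output language
    (one of the 13 languages it can speak back)."""
    session = {"model": model, "voice": voice}
    if output_language is not None:
        # Assumed parameter name for GPT-Realtime-Translate's output language.
        session["output_language"] = output_language
    return {"type": "session.update", "session": session}


# Example: configure a session for real-time translation into Spanish.
event = build_session_update("gpt-realtime-translate", output_language="es")
print(json.dumps(event, indent=2))
```

In a real application this JSON event would be sent over a WebSocket connection to the Realtime endpoint after authenticating with an API key; the sketch stops at constructing the payload.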
Source: TechCrunch