OpenAI gives its voice agent superpowers to developers – look for more apps soon

Elyse Betters Picaro / ZDNET


ZDNET’s key takeaways

  • OpenAI’s Realtime API is now optimized and generally available.
  • You can try its latest speech-to-speech model, gpt-realtime.
  • The upgrades improve OpenAI’s voice offerings for developers. 

This year, AI agents that can carry out tasks on behalf of users have been a major focus, with companies constantly developing offerings that reduce the user’s workload. To make these interactions as seamless as possible, many companies are leaning on multimodal AI agents, and OpenAI is making it even easier to develop these products.

According to the company, OpenAI updated its generally available Realtime API on Thursday to include more features that allow developers and enterprises to build more reliable voice agents. Additionally, the company released its most advanced speech-to-speech model yet: gpt-realtime. 

The releases: 

Realtime API updates

  • What: The upgrades to the Realtime API include support for remote MCP servers, image inputs, and phone calling through Session Initiation Protocol (SIP), according to the release.
  • Why it matters: Ultimately, these expanded capabilities should enable voice agents to access more tools and have more context to assist users. AI tools are only as helpful as the information they give, so streamlining the process of connecting AI models to data sources is a big win for developers and users alike. Most importantly, MCP is an open standard, so those connections are made in a consistent way that is meant to prioritize user data and privacy.
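
For developers wondering what connecting a voice agent to a remote MCP server might look like, here is a minimal sketch in Python. It only builds a JSON configuration message and does not contact OpenAI. The event name `session.update`, the `mcp` tool type, the `server_label` and `server_url` fields, and the example URL are assumptions made for illustration rather than details from the release, so check OpenAI's documentation before relying on them.

```python
import json


def build_session_update(server_label: str, server_url: str) -> str:
    """Build a hypothetical session-configuration message as JSON text."""
    event = {
        "type": "session.update",  # assumed event name, not from the article
        "session": {
            "model": "gpt-realtime",  # model name taken from the article
            "tools": [
                {
                    "type": "mcp",  # assumed tool type for a remote MCP server
                    "server_label": server_label,  # hypothetical field
                    "server_url": server_url,  # hypothetical field
                }
            ],
        },
    }
    return json.dumps(event)


# Placeholder label and URL; a real app would point at its own MCP server.
message = build_session_update("example-tools", "https://mcp.example.com")
print(message)
```

In a real app, a message like this would be sent over the connection the app holds open with the Realtime API, and the agent could then call the tools that the server exposes.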

A new speech-to-speech model

  • What: OpenAI touted its new gpt-realtime model as the company’s “most advanced, production-ready voice model.” Upgrades include improvements in intelligence, instruction following, and function calling, according to the release. 
  • Why it matters: A key tenet of helpful voice assistance and interactions is models that sound natural and have the ability to actually help with tasks. If the new model works as claimed, it will enable a better experience for users. 
