News Report Technology
September 01, 2025

OpenAI Unveils GPT-Realtime Speech-To-Speech Model With Multimodal Support And Advanced Conversational Capabilities

In Brief

OpenAI released the gpt-realtime speech-to-speech model with multimodal support, advanced conversational skills, and strong audio reasoning performance.

OpenAI Unveils GPT-Realtime Speech-To-Speech Model With Multimodal Support And Advanced Conversational Capabilities

Artificial intelligence research organisation OpenAI announced the general availability of its Realtime API, now enhanced with features that allow developers and enterprises to build robust, production-ready voice agents. The API supports remote MCP servers, image inputs, and phone calling via Session Initiation Protocol (SIP), enabling more capable and context-aware voice applications.

Alongside the API, OpenAI has released its most advanced speech-to-speech model, gpt-realtime, designed to improve instruction following, function calling, and natural-sounding speech. The model can interpret complex prompts, switch languages mid-sentence, reproduce alphanumeric sequences accurately, and capture non-verbal cues. Two new voices, Cedar and Marin, are also available, offering more expressive and human-like intonation. Existing voices have been updated to incorporate these enhancements.

The Realtime API processes audio directly through a single model, reducing latency and preserving nuance, unlike traditional pipelines that chain separate speech-to-text and text-to-speech models. gpt-realtime has been trained in collaboration with users to excel in real-world applications such as customer support, personal assistance, and education. Benchmark evaluations show substantial improvements in reasoning, instruction adherence, and function calling accuracy compared to previous models.

Additional updates include asynchronous function calling, allowing long-running operations without interrupting ongoing conversations, further supporting seamless, production-ready voice experiences.

OpenAI Expands Realtime API With MCP Support, Image Inputs, SIP Integration, And Cost-Saving Controls For Voice Agents

OpenAI’s Realtime API now includes new features designed to simplify integration and expand capabilities for production-ready voice agents. Developers can enable remote MCP support by linking a session to an MCP server URL, allowing the API to manage tool calls automatically and access additional functionalities without manual setup.

The gpt-realtime model now supports image inputs, enabling the system to incorporate photos, screenshots, and other visuals alongside audio or text. This allows users to ask context-specific questions about what they see, while developers retain control over which images are shared and when.

Additional improvements include Session Initiation Protocol (SIP) support for connecting apps to phone networks and PBX systems, as well as reusable prompts that let developers save and deploy pre-configured instructions, tools, and example messages across multiple sessions.

The generally available Realtime API and gpt-realtime model are now accessible to all developers, with pricing reduced by 20% compared to the previous gpt-4o-realtime-preview. New controls for conversation context allow for smarter token management, reducing costs for long-running sessions. Documentation, a Playground for testing, and a Realtime API prompting guide are available to support developers in adopting these features.

Disclaimer

In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.

About The Author

Alisa, a dedicated journalist at the MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.

More articles
Alisa Davidson
Alisa Davidson

Alisa, a dedicated journalist at the MPost, specializes in cryptocurrency, zero-knowledge proofs, investments, and the expansive realm of Web3. With a keen eye for emerging trends and technologies, she delivers comprehensive coverage to inform and engage readers in the ever-evolving landscape of digital finance.

The Calm Before The Solana Storm: What Charts, Whales, And On-Chain Signals Are Saying Now

Solana has demonstrated strong performance, driven by increasing adoption, institutional interest, and key partnerships, while facing potential ...

Know More

Crypto In April 2025: Key Trends, Shifts, And What Comes Next

In April 2025, the crypto space focused on strengthening core infrastructure, with Ethereum preparing for the Pectra ...

Know More
Read More
Read more
Microsoft Debuts First In-House AI Models: MAI-Voice-1 For Ultra-Fast Speech And MAI-1-Preview For Instruction-Following Tasks
News Report Technology
Microsoft Debuts First In-House AI Models: MAI-Voice-1 For Ultra-Fast Speech And MAI-1-Preview For Instruction-Following Tasks
September 1, 2025
1inch Partners With Barter To Enhance Resolver Network And Optimize Intent-Based DeFi Trading
Business News Report Technology
1inch Partners With Barter To Enhance Resolver Network And Optimize Intent-Based DeFi Trading
September 1, 2025
CryptoQuant: Despite Current Correction, Bitcoin Retains Growth Potential If Demand Persists
News Report Technology
CryptoQuant: Despite Current Correction, Bitcoin Retains Growth Potential If Demand Persists
September 1, 2025
Ethereum Foundation Updates Ecosystem Support Program To Focus On Strategic Initiatives And Proactive Growth
News Report Technology
Ethereum Foundation Updates Ecosystem Support Program To Focus On Strategic Initiatives And Proactive Growth
September 1, 2025