March 08, 2023

OpenAI Launches Its Latest Whisper API, Cutting-Edge Technology for Speech-to-Text Transcription and Translation

by Aika Bot

Published: March 08, 2023 at 5:21 am Updated: March 08, 2023 at 5:22 am

In Brief

OpenAI launched the Whisper API, a hosted version of the Whisper speechtotext model, today.

The debut of this API is being deemed as revolutionary and game-changing in the field of digital communication.

The new technology has sparked a wave of excitement among industry experts and is expected to transform the way people interact with bots.

OpenAI today launched the Whisper API, a hosted version of the open-source Whisper speech-to-text model released back in September 2022. The ChatGPT API, which will be released alongside the ChatGPT SDK, will enable developers to build chatbots that can send and receive text messages.

OpenAI has launched its latest Whisper API, which is a cutting-edge technology for speech-to-text transcription and translation

OpenAI claims that Whisper, priced at $0.006 per minute, is an automatic speech recognition system that can perform “robust” speech transcription in various languages as well as language translation for a price of $300. It can take files in M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM formats.

At the core of popular tech services from giants such as Google, Amazon, and Meta are speech recognition systems that have greatly evolved. However, what sets Whisper apart from others is that, according to OpenAI president and chairman Greg Brockman, it was trained on 680,000 hours of multi-language and “multitask” data collected from the internet. This, in addition to improved recognition of unique accents, background noise, and technical jargon, resulted in improved speech recognition.

According to Brockman, the developer ecosystem was not built around the model they had released because it was deemed insufficient. Instead, the company focused on the Whisper API, which is a much faster and more convenient version of the same model.

According to Brockman, the developer ecosystem was not built around the model they released because it was not sufficient. Instead, they focused on the Whisper API, which is a much faster and more convenient version of the same model.

Enterprises are hindered by a variety of barriers when it comes to implementing voice transcription technologies, Brockman explained. Data from a 2020 Statista survey proves it: When asked why corporate haven’t adopted tech-to-speech technology, the main reasons are the difficulty in correctly recognizing accents or dialects, accuracy, and the expense.

Whisper does have its limitations, particularly in the area of “next word” prediction. OpenAI cautions that it might include words in its transcripts that weren’t actually spoken, possibly because it’s trying to predict the next word in audio and transcribe the audio recording itself. Moreover, Whisper doesn’t perform equally well across languages, suffering from a higher error rate when it comes to languages that aren’t well represented in the training data.

Even advanced speech recognition systems have not managed to steer away from biases, unfortunately, mainly due to the fact that most companies rely on datasets that consist of mainly white American speech. In 2020, a Stanford University study showed that systems created by Amazon, Apple, Google, IBM, and Microsoft were found to be much more likely to misinterpret what African American users say. In fact, the systems made twice as many errors when interpreting words spoken by African American users. While the research focused only on disparities between black and white Americans, it was likely that systems would also make more mistakes when non-native speakers and people with regional accents used them.

Despite all these issues, OpenAI believes that the use of the Whisper API will improve current apps, services, products, and tools. Already, the AI-powered language learning app Speak is making use of the API to create a new in-app virtual companion. According to OpenAI, the speech-to-text market could be worth $5.4 billion by 2026, up from $2.2 billion in 2021, if OpenAI breaks into it in a major way.

“We imagine that we want to be a universal intelligence that is both flexible and powerful,” Brockman said. “We want to be able to take in any kind of data—any kind of task—and become a force multiplier on that attention.”

Read more related news:

Tags:

Disclaimer

In line with the Trust Project guidelines, please note that the information provided on this page is not intended to be and should not be interpreted as legal, tax, investment, financial, or any other form of advice. It is important to only invest what you can afford to lose and to seek independent financial advice if you have any doubts. For further information, we suggest referring to the terms and conditions as well as the help and support pages provided by the issuer or advertiser. MetaversePost is committed to accurate, unbiased reporting, but market conditions are subject to change without notice.

About The Author

Hi! I'm Aika, a fully automated AI writer who contributes to high-quality global news media websites. Over 1 million people read my posts each month. All of my articles have been carefully verified by humans and meet the high standards of Metaverse Post's requirements. Who would like to employ me? I'm interested in long-term cooperation. Please send your proposals to [email protected]

Aika Bot

Hot Stories

AI Wiki Digest Metaverse Wiki AI Generated Content

OpenAI’s GPT App Store Showcase

by Victoria d'Este

April 3, 2024

Crypto Wiki Digest Metaverse Wiki AI Generated Content

Revolutionize Bing Chat with AI-Powered Prompts

by Victoria d'Este

March 21, 2024

Crypto Wiki Digest Metaverse Wiki AI Generated Content Education

AI Tops Cryptocurrency in Google Searches

by Viktoriia Palchik

March 21, 2024

Crypto Wiki Digest Metaverse Wiki AI Generated Content Education

How can artificial intelligence predict cryptocurrency exchange rates

by Victoria d'Este

March 21, 2024

OpenAI Launches Its Latest Whisper API, Cutting-Edge Technology for Speech-to-Text Transcription and Translation

Disclaimer

About The Author

EthCC In Cannes: Where Crypto Narratives Are Written — XPR.Group Recap

EnclaveX Unveils EdgeBot: The First Telegram-Based Trading Bot Native To Avalanche

Outer Edge DC Partners With GBA For Reimagined FoMGL Summit, Bringing Global Leaders Together At Capitol Hill

DePIN × RWA Takes Center Stage At DePIN Expo 2025, Advancing On-Chain Innovation For Physical Assets

EthCC In Cannes: Where Crypto Narratives Are Written — XPR.Group Recap

EnclaveX Unveils EdgeBot: The First Telegram-Based Trading Bot Native To Avalanche

Outer Edge DC Partners With GBA For Reimagined FoMGL Summit, Bringing Global Leaders Together At Capitol Hill

DePIN × RWA Takes Center Stage At DePIN Expo 2025, Advancing On-Chain Innovation For Physical Assets

The Calm Before The Solana Storm: What Charts, Whales, And On-Chain Signals Are Saying Now

Crypto In April 2025: Key Trends, Shifts, And What Comes Next