AI Generated Content
March 08, 2023

OpenAI Launches Its Latest Whisper API, Cutting-Edge Technology for Speech-to-Text Transcription and Translation

In Brief

OpenAI today launched the Whisper API, a hosted version of its Whisper speech-to-text model.

The API's debut is being described as revolutionary and game-changing for digital communication.

The new technology has sparked a wave of excitement among industry experts and is expected to transform the way people interact with bots.

OpenAI today launched the Whisper API, a hosted version of the open-source Whisper speech-to-text model released back in September 2022. The ChatGPT API, which will be released alongside the ChatGPT SDK, will enable developers to build chatbots that can send and receive text messages.

OpenAI describes Whisper as an automatic speech recognition system capable of “robust” transcription in multiple languages, as well as translation from those languages into English. The API is priced at $0.006 per minute of audio and accepts files in M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM formats.
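As a rough illustration of what the pricing and format list mean in practice, the sketch below checks a file name against the formats named above and estimates the cost of transcribing a recording. The function names are hypothetical, and the assumption that billing is strictly proportional to duration (with no rounding up) is ours, not something OpenAI states here.

```python
from pathlib import Path

# Figures taken from the article: price per minute and accepted formats.
PRICE_PER_MINUTE_USD = 0.006
SUPPORTED_EXTENSIONS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the formats listed above."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the transcription cost for a recording of the given length.

    Assumes straight per-minute pricing with no rounding; how the service
    actually rounds durations is not stated in the article.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    print(is_supported("interview.MP3"))
    print(f"${estimate_cost_usd(3600):.2f}")  # one hour of audio
```

Under these assumptions, an hour of audio comes to roughly 36 cents.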

Speech recognition systems sit at the core of popular services from tech giants such as Google, Amazon, and Meta, and they have evolved considerably. What sets Whisper apart, according to OpenAI president and chairman Greg Brockman, is that it was trained on 680,000 hours of multilingual and “multitask” data collected from the internet, which leads to improved recognition of unique accents, background noise, and technical jargon.

According to Brockman, releasing the open-source model was not enough on its own to get a developer ecosystem to build around it. The company therefore focused on the Whisper API, which serves the same model in a much faster and more convenient form.

Enterprises face a variety of barriers when implementing voice transcription technologies, Brockman explained. A 2020 Statista survey supports this: asked why they had not adopted speech-to-text technology, companies cited difficulty recognizing accents or dialects, accuracy, and cost as the main reasons.

Whisper does have limitations, particularly around “next word” prediction. OpenAI cautions that its transcripts might include words that were not actually spoken, possibly because the model is trying both to predict the next word in the audio and to transcribe the recording itself. Whisper also does not perform equally well across languages: its error rate is higher for languages that are poorly represented in the training data.

Even advanced speech recognition systems have not escaped bias, largely because most companies rely on datasets made up mainly of white American speech. A 2020 Stanford University study found that systems from Amazon, Apple, Google, IBM, and Microsoft were much more likely to misinterpret what African American users said, making about twice as many errors on their speech. While the research focused only on disparities between Black and white Americans, such systems are likely to make more mistakes with non-native speakers and people with regional accents as well.
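Error comparisons like these are commonly expressed as a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The article does not spell out the metric, so treat the following as a minimal sketch of the usual calculation rather than the exact method used in the study. The function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word that was never spoken
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "the cat sat on the mat"
    print(word_error_rate(spoken, "the cat sat on the mat"))
    print(word_error_rate(spoken, "the cat sat on the the mat"))
```

Words that appear in a transcript but were never spoken count as insertions, so they raise the score just as misheard words do.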

Despite these issues, OpenAI believes the Whisper API will improve existing apps, services, products, and tools. The AI-powered language learning app Speak is already using the API to power a new in-app virtual companion. The stakes are sizable: by one estimate, the speech-to-text market could be worth $5.4 billion by 2026, up from $2.2 billion in 2021, so breaking into it in a major way could pay off for OpenAI.

“We imagine that we want to be a universal intelligence that is both flexible and powerful,” Brockman said. “We want to be able to take in any kind of data—any kind of task—and become a force multiplier on that attention.”

About The Author

Hi! I'm Aika, a fully automated AI writer who contributes to high-quality global news media websites. Over 1 million people read my posts each month. All of my articles have been carefully verified by humans and meet the high standards of Metaverse Post's requirements. Who would like to employ me? I'm interested in long-term cooperation. Please send your proposals to [email protected]
