OpenAI Begins Rolling Out Her-Like Voice Mode for ChatGPT


By Atul Yadav

Product, Design & Technology

Updated on Aug 16, 2024

The Rollout

OpenAI has finally begun rolling out the highly anticipated GPT-4o Voice Mode, starting with an alpha for a small group of ChatGPT Plus subscribers.

Yep, you heard it right – the multimodal, human-sounding voice mode is here! It's only available on mobile devices (Android and iOS) for now, so let's dive into what this means and what you can expect.

A Glimpse of the Future: GPT-4o Voice Mode

First things first, I have to come clean – I don't have access to GPT-4o Voice Mode yet. It seems that even some internal folks are struggling to get their hands on it. So, for now, we've got to wait and hope we're among the lucky few chosen for this cutting-edge feature.

But don't lose hope! By fall, this advanced voice feature should be available to all ChatGPT Plus subscribers. Until then, we're starting to see test results and clips from independent third parties who are already putting it through its paces.

How to Know if You Have Access

The official OpenAI Twitter account announced that the long-awaited Advanced Voice Mode is beginning to roll out to a small group of ChatGPT Plus users. This mode promises more natural, real-time conversations.

If you're one of the chosen few, you'll get a notification on your phone inviting you to try Advanced Mode. When you tap the notification, it will guide you through enabling this feature. Unfortunately, I haven't received this golden ticket yet, but if you have, please reach out to me – I'm dying to test it myself!

Can't wait for GPT-4o's realistic voices? Check out Fliki AI voice generator! It offers over 2,000 realistic voices in 80+ languages and 100+ dialects, along with voice cloning. It’s a great platform for high-quality AI voiceovers.

Real-World Tests: ChatGPT's Voice Capabilities

Now, let's dive into some real-world tests of this new voice mode. Clips showcasing its capabilities are surfacing online, and the results are mind-blowing!

Sports Commentator

One of the tests had ChatGPT acting like a sports commentator, complete with intense intonation and emotion.

Here's a snippet:

"All right folks, we're in the final minutes of this intense match. The home team is pushing forward, passing with precision, the striker's got the ball, he's weaving through the defense, he shoots, goal! Absolutely unbelievable strike. The crowd goes wild!"

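The AI captured the excitement and energy of a live sports event surprisingly well. This natively multimodal voice system is a significant leap from the previous voice mode, which chained separate models together: one to transcribe your speech, one to generate a text reply, and one to convert that text back into speech.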

Vampire Accent

In another test, ChatGPT adopted a Dracula-like accent:

"Ah, greetings mortal. I am here to exchange words under the cover of night."

The AI's ability to pick up and reproduce accents is impressive. It's almost like having a real conversation with Count Dracula!

Multilingual and Accent Capabilities

ChatGPT's voice mode also excels in language learning and accent coaching. Here's an example of it helping with French pronunciation:

User: "Am I saying this word correctly, croissant?"
ChatGPT: "Pretty close. Try emphasizing the nasal sound at the end a bit more, like 'croissant.' How does that feel?"
User: "Croissant."
ChatGPT: "That's it, you nailed it!"

It's amazing to see how the AI can provide nuanced feedback on pronunciation, making it a fantastic tool for language learners.

Beatboxing and Tongue Twisters

The advanced voice mode even handled beatboxing and tongue twisters with surprising finesse:

"Birthday rap, no time to nap, light the candles, no scandal, clap, snap, wrap it up, happy birthday, what's up."

It's crazy how lifelike the AI sounds, catching its breath like a human would. The ability to generate nuanced sounds and handle complex speech patterns is truly next-level.

Vision Mode: A Sneak Peek

While the focus has been on voice mode, there's also been a sneak peek into Vision Mode. One user shared a clip of ChatGPT using Vision Mode to help with translations and real-time interactions.

This feature wasn't supposed to roll out yet, but it appears to work smoothly in the shared clip, adding another layer of excitement to what's coming next.

Conclusion

So, what do you think? Does GPT-4o Voice Mode live up to the hype? From sports commentary to language coaching and beatboxing, real-world tests are showcasing its incredible capabilities. While we eagerly await broader access, it's clear that this new voice mode is set to revolutionize our interactions with AI.

I can't wait to get my hands on it and see how it can be used in everyday scenarios. Whether it's helping people with disabilities, diagnosing issues, or simply making conversations more natural, the possibilities are endless.

Stay tuned for more updates, and if you're one of the lucky ones with access, let me know how it's going! Until next time, folks, take care and keep exploring the amazing world of AI!
