The Rollout
OpenAI has finally begun rolling out the highly anticipated GPT-4o Voice Mode, starting with a small alpha group of ChatGPT Plus subscribers.
We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions. pic.twitter.com/64O94EhhXK
— OpenAI (@OpenAI) July 30, 2024
Yep, you heard it right – the multimodal, human-sounding voice mode is here! It's only available on mobile devices (Android and iOS) for now, so let's dive into what this means and what you can expect.
A Glimpse of the Future: GPT-4o Voice Mode
First things first, I have to come clean – I don't have access to GPT-4o Voice Mode yet. It seems that even some internal folks are struggling to get their hands on it. So, for now, we've got to wait and hope we're among the lucky few chosen for this cutting-edge feature.
But don't lose hope! By fall, this advanced voice feature should be available to all ChatGPT Plus subscribers. Until then, we're starting to see test results and clips from independent third parties who are already putting it through its paces.
How to Know if You Have Access
OpenAI announced on its official Twitter account that the long-awaited Advanced Voice Mode is beginning to roll out to a small group of ChatGPT Plus users. The mode promises more natural, real-time conversations.
If you're one of the chosen few, you'll get a notification on your phone inviting you to try Advanced Voice Mode. Tapping the notification walks you through enabling the feature. Unfortunately, I haven't received this golden ticket yet, but if you have, please reach out to me. I'm dying to test it myself!
Can't wait for GPT-4o's realistic voices? Check out Fliki AI voice generator! It offers over 2,000 realistic voices in 80+ languages and 100+ dialects, along with voice cloning. It’s a great platform for high-quality AI voiceovers.
Real-World Tests: ChatGPT's Voice Capabilities
Now, let's dive into some real-world tests of this new voice mode. Clips are surfacing online, showcasing its capabilities, and it's mind-blowing!
Sports Commentator
One of the tests had ChatGPT acting like a sports commentator, complete with intense intonation and emotion.
New GPT-4o Voice performs with intense intonation and emotion as it imitates a soccer commentator: (Advanced voice model was released to select users last night) pic.twitter.com/PTHH7CqtlW
— AI Breakfast (@AiBreakfast) July 31, 2024
Here's a snippet:
"All right folks, we're in the final minutes of this intense match. The home team is pushing forward, passing with precision, the striker's got the ball, he's weaving through the defense, he shoots, goal! Absolutely unbelievable strike. The crowd goes wild!"
The AI captured the excitement and energy of a live sports event surprisingly well. This natively multimodal voice system is a significant leap from the previous Voice Mode, which chained separate models for speech-to-text, text generation, and text-to-speech.
Vampire Accent
In another test, ChatGPT adopted a Dracula-like accent:
"Ah, greetings mortal. I am here to exchange words under the cover of night."
and to answer the burning question on everyone’s minds: yes, this puppy is fully capable of going dracula mode https://t.co/2BXnMEE27B pic.twitter.com/h5oTJMMFbH
— benjamin (@ikeadrift) July 30, 2024
The AI's ability to pick up and reproduce accents is impressive. It's almost like having a real conversation with Count Dracula!
Multilingual and Accent Capabilities
ChatGPT's voice mode also looks promising for language learning and accent coaching. Here's an example of it helping with French pronunciation:
OpenAI's new GPT-4o voice model is rolling out to selected users - here it is in the wild, being used as a language coach pic.twitter.com/h4Db75IigV
— Tsarathustra (@tsarnick) July 30, 2024
"Am I saying this word correctly, croissant?" "Pretty close, try emphasizing the nasal sound at the end a bit more like croissant. How does that feel?" "Croissant." "That's it, you nailed it!"
It's amazing to see how the AI can provide nuanced feedback on pronunciation, making it a fantastic tool for language learners.
Beatboxing and Tongue Twisters
The advanced voice mode even handled beatboxing and tongue twisters with surprising finesse:
Yo ChatGPT Advanced Voice beatboxes pic.twitter.com/yYgXzHRhkS
— Ethan Sutin (@EthanSutin) July 30, 2024
"Birthday rap, no time to nap, light the candles, no scandal, clap, snap, wrap it up, happy birthday, what's up."
It's crazy how lifelike the AI sounds, catching its breath like a human would. The ability to generate nuanced sounds and handle complex speech patterns is truly next-level.
Vision Mode: A Sneak Peek
While the focus has been on voice mode, there's also been a sneak peek into Vision Mode. One user shared a clip of ChatGPT using Vision Mode to help with translations and real-time interactions.
Trying #ChatGPT’s new Advanced Voice Mode that just got released in Alpha. It feels like face-timing a super knowledgeable friend, which in this case was super helpful — reassuring us with our new kitten. It can answer questions in real-time and use the camera as input too! pic.twitter.com/Xx0HCAc4To
— Manuel Sainsily (@ManuVision) July 30, 2024
This feature wasn't supposed to roll out yet, but in the clip it appears to work smoothly, adding another layer of excitement to what's coming next.
Conclusion
So, what do you think? Does GPT-4o Voice Mode live up to the hype? From sports commentary to language coaching and beatboxing, real-world tests are showcasing its incredible capabilities. While we eagerly await broader access, it's clear that this new voice mode is set to revolutionize our interactions with AI.
I can't wait to get my hands on it and see how it can be used in everyday scenarios. Whether it's helping people with disabilities, diagnosing issues, or simply making conversations more natural, the possibilities are endless.
Stay tuned for more updates, and if you're one of the lucky ones with access, let me know how it's going! Until next time, folks, take care and keep exploring the amazing world of AI!