OpenAI gives ChatGPT a voice for verbal conversations

The generative AI engine is also getting image search

OpenAI has announced a significant expansion of ChatGPT beyond its original role as a text-based chatbot. The generative AI assistant, which has grown rapidly in popularity over the past year, is set to become more interactive: OpenAI is adding voice and image capabilities.

Since its launch in November 2022, ChatGPT has let users generate essays, poems, and summaries from simple text prompts. The latest update goes further, allowing users to hold spoken conversations with the chatbot.

The announcement coincides with Amazon’s commitment to invest up to $4 billion in Anthropic, an OpenAI rival. Both moves are part of a broader generative AI race among tech giants: Google is trying to catch up with its Bard chatbot, Meta is embracing open source, and Microsoft has aligned itself closely with OpenAI.

With this update, OpenAI combines the voice-assistant model popularized by products like Siri and Alexa with its large language models (LLMs).

For instance, users can ask ChatGPT out loud to make up a bedtime story on the spot, steering the narrative with a few spoken prompts. They can also simply ask questions and hear ChatGPT answer aloud.

ChatGPT users will also be able to ask questions about images. They can upload a picture and ask ChatGPT to explain what it shows or to give instructions related to it.

The voice feature is powered by a new text-to-speech model that can generate human-like audio from text and a short sample of speech. OpenAI worked with professional voice actors to create five distinct voices, and it uses its open-source Whisper speech recognition system to transcribe users’ spoken words into text.

Spotify is a launch partner, using the technology for a new podcast feature that translates shows from English into Spanish, French, or German while preserving the host’s original voice. The rollout is deliberately cautious: the feature is initially limited to a handful of podcasters, including Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett.

OpenAI acknowledges that while the new voice technology enables many creative and accessibility-focused applications, it also introduces new risks, such as malicious actors impersonating public figures or committing fraud.

The new features will roll out to paying Plus and Enterprise subscribers over the next two weeks. To turn on voice, users go to the app’s “settings” menu, open “new features,” and opt in to voice conversations. They can then tap the headphone icon in the top-right corner of the home screen and choose their preferred voice. At first, voice will be available only as an opt-in beta in ChatGPT’s Android and iOS apps, while image input will be enabled by default on all platforms.