OpenAI announced: "ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms)." "They [voice and image capabilities] offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you're talking about," the Sam Altman-led company said in a subsequent blog post. ChatGPT will now be able to answer users' questions in five different voices, which users can choose from according to their preferences.
OpenAI says it worked with professional voice actors to create each voice, and it uses its own open-source Whisper speech recognition system to transcribe spoken words into text. ChatGPT's new voice capabilities are powered by a new text-to-speech model that, according to OpenAI, can generate human-like audio from just text and a few seconds of sample speech, opening the door to many "creative and accessibility-focused applications." OpenAI is also working with other companies to put this new technology to use.
Spotify, for example, has partnered with the AI startup to translate podcasts into additional languages in the podcaster's own voice. ChatGPT's image understanding is powered by the multimodal capabilities of GPT-3.5 and GPT-4. Users can now upload one or more images and ask ChatGPT to, for example, explore the contents of their fridge to plan a meal or analyze a complex graph of work-related data.
Read more on livemint.com