ChatGPT creator OpenAI announced the launch of GPT-4o, its newest flagship model, during a livestream on Monday. Chief technology officer Mira Murati described GPT-4o as "much faster" and said it "improves capabilities across text, vision, and audio". The model will also be made available to all users.
What does the ‘o’ signify in GPT-4o?
According to OpenAI, the 'o' stands for 'omni', signalling that GPT-4o marks a significant step towards more natural human-computer interaction. The model accepts any combination of text, audio, and images as input and can generate outputs in any combination of those same formats.
What features does OpenAI introduce with GPT-4o?
OpenAI CEO Sam Altman highlighted in a post on X that GPT-4o is "natively multimodal," meaning it can comprehend commands or produce content in voice, text, or image formats.
The updated model can produce human-like spoken responses, enabling real-time voice interaction, generation of diverse voices (which can even harmonise with one another), and instant translation. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, comparable to human response times in conversation.
Moreover, GPT-4o extends beyond text-based communication with its vision capabilities, enabling it to analyse and discuss visual content.
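For developers, these multimodal capabilities are exposed through OpenAI's existing chat completions API. Below is a minimal sketch of a text-plus-image request to gpt-4o using the official openai Python package; the image URL is a placeholder, and note that at launch audio input and output were not yet generally available through the API, so the example covers text and vision only.

```python
# Minimal sketch: a text-plus-image request to GPT-4o via the
# OpenAI chat completions API. Assumes the `openai` Python package
# is installed and OPENAI_API_KEY is set in the environment.
# The image URL below is a placeholder, not a real endpoint.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single user message may mix text and image parts.
            "content": [
                {"type": "text", "text": "What is happening in this picture?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

The same request shape works for purely textual prompts by passing a plain string as the message content; the list-of-parts form shown here is only needed when mixing text with images.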