xAI's official X account (formerly Twitter), where they shared details about the model's new features in a blog post. The core features of Grok 1.5 remain unchanged in this updated version, but the addition of vision is expected to broaden how the model can interact with the real world. xAI ran benchmark tests to evaluate Grok 1.5 Vision across several metrics, including its own RealWorldQA benchmark, which assesses a model's understanding of real-world spatial concepts.
The model was also evaluated on other benchmarks, such as MMMU and ChartQA. Notably, on RealWorldQA, Grok outperformed OpenAI's GPT-4 with Vision and Google's Gemini 1.5 Pro, although it scored lower on other evaluations.
Computer vision is an exciting area of computer science that focuses on enabling computers, including AI models, to identify and understand real-world objects through images and videos.
Its goal is to give machines vision capabilities similar to those of humans. Major tech companies are investing heavily in developing AI models with vision capabilities. Google's Gemini 1.5 Pro and OpenAI's GPT-4 with Vision are prominent competitors in this field.
The potential applications of computer vision are extensive and transformative. For example, Healthify, an Indian platform for calorie tracking and nutrition, recently introduced a feature called 'Snap'. Users can take photos of food items, and the AI suggests healthier recipe adjustments and exercise plans to balance calorie intake.