large language model, DeepSeek V3. The model, boasting 671 billion parameters, outperformed prominent AI models like Meta’s Llama 3.1 and OpenAI’s GPT-4o in benchmark tests evaluating text understanding, coding, and problem-solving. This achievement is a major step for China's AI industry.
The Hangzhou-based company revealed in a WeChat post that DeepSeek V3 was developed at a cost of just $5.58 million, using only 2.78 million GPU hours. By contrast, Meta's Llama 3.1 required 30.8 million GPU hours. DeepSeek relied on Nvidia's H800 GPUs, a version tailored for the Chinese market to comply with US export restrictions that block access to more advanced chips.
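The figures above imply both a compute ratio and a rough price per GPU hour. A minimal back-of-the-envelope calculation (using only the numbers reported in the article, not any official pricing from DeepSeek or Nvidia):

```python
# Figures reported in the article
deepseek_cost_usd = 5.58e6        # total stated training cost
deepseek_gpu_hours = 2.78e6       # GPU hours for DeepSeek V3
llama_gpu_hours = 30.8e6          # GPU hours for Meta's Llama 3.1

# Llama 3.1 used roughly 11x the GPU hours of DeepSeek V3
compute_ratio = llama_gpu_hours / deepseek_gpu_hours

# Implied cost per GPU hour, assuming the stated cost covers compute only
implied_rate = deepseek_cost_usd / deepseek_gpu_hours

print(f"compute ratio: {compute_ratio:.1f}x")   # ~11.1x
print(f"implied rate: ${implied_rate:.2f}/GPU-hour")  # ~$2.01
```

The implied rate of about $2 per H800 GPU hour is consistent with the article's framing of DeepSeek V3 as an unusually cheap frontier-scale training run.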
Computer scientist Andrej Karpathy praised the achievement on X (formerly Twitter), noting that DeepSeek managed to create a frontier-grade model with minimal resources. According to DeepSeek’s technical report, the V3 model not only surpassed Meta’s and Alibaba’s models but also delivered results comparable to OpenAI’s GPT-4o and Amazon-backed Anthropic’s Claude 3.5 Sonnet.
DeepSeek, spun off in 2022 from High-Flyer Quant, emphasizes cost-effective AI development. The company’s Fire Flyer GPU clusters have been