DeepSeek discloses AI model training costs, causing a stir in the tech industry.
In a peer-reviewed article published in Nature in September 2025, Chinese AI developer DeepSeek made public the training costs and the chips used in the creation of its reasoning-focused AI model, R1.
The article revealed that DeepSeek used Nvidia's A100 chips during the preparatory phase of developing R1. For the main 80-hour training run, however, the company switched to a cluster of 512 Nvidia H800 chips, a version designed specifically for the Chinese market after US export restrictions barred sales of the more powerful H100 and A100 AI chips.
DeepSeek has consistently defended its use of distillation, a technique in which one AI system learns from another, arguing that it yields strong model performance at far lower cost. The approach allowed the company to spend just $294,000 on training R1, a fraction of the reported costs of comparable models developed by US companies.
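The core idea of distillation can be sketched in a few lines: a student model is trained to match the teacher's full output distribution (softened by a temperature) rather than just the correct label. The sketch below is illustrative only; the toy logits and temperature are assumptions, not DeepSeek's actual training setup.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Minimizing this pushes the student to mimic the teacher's whole
    output distribution, which carries more signal than a hard label.
    """
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy example: a student that matches the teacher incurs zero loss;
# a mismatched student incurs a positive loss.
teacher = [4.0, 1.0, 0.5]
aligned = distillation_loss(teacher, [4.0, 1.0, 0.5])
mismatch = distillation_loss(teacher, [0.5, 1.0, 4.0])
```

In practice the distillation term is combined with an ordinary cross-entropy loss on ground-truth labels, and the soft targets come from a much larger teacher model, which is what makes the student far cheaper to train.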
The company's use of distillation has been a subject of debate. DeepSeek has not directly responded to assertions that it deliberately "distilled" OpenAI's models into its own, and OpenAI did not immediately comment on the matter. DeepSeek has, however, used Meta's open-source Llama AI model for some distilled versions of its own models.
Notably, the training data for DeepSeek's V3 model included crawled web pages containing a significant number of answers generated by OpenAI models, meaning the model may have acquired knowledge from OpenAI's systems indirectly. DeepSeek has characterized this as incidental rather than intentional.
The revelation of DeepSeek's lower-cost AI systems in January 2025 prompted global investors to sell tech stocks, sending ripples through the industry. US officials have also questioned some of DeepSeek's statements about its development costs and the technology it used.
Despite the controversy, DeepSeek has remained steadfast in its commitment to making AI-powered technologies more accessible and affordable. In a supplementary information document, the company acknowledged for the first time that it owns A100 chips, a further step toward transparency about its operations.
As the AI landscape continues to evolve, the revelations about DeepSeek's R1 model and its training methods are set to shape the ongoing debate about the role of distillation in AI development and the potential for lower-cost AI systems to disrupt the global tech industry.