A Chinese lab has developed what appears to be one of the most powerful “open source” AI models to date.
The model, DeepSeek V3, was developed by AI firm DeepSeek and released on Wednesday under a permissive license that allows developers to download and modify it for most applications, including commercial ones.
DeepSeek V3 can handle a range of text-based workloads and tasks, like coding, translating, and writing essays and emails from a descriptive prompt.
According to DeepSeek’s internal benchmark testing, DeepSeek V3 outperforms both downloadable, openly available models and “closed” AI models that can only be accessed through an API. In a subset of coding competitions hosted on Codeforces, a platform for programming contests, DeepSeek outperforms other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.
DeepSeek V3 also tops the competition on Aider Polyglot, a test designed to measure, among other things, whether a model can successfully write new code that integrates into existing code.
DeepSeek-V3!
60 tokens/second (3x faster than V2!)
API compatibility intact
Fully open-source models & papers
671B MoE parameters
37B activated parameters
Trained on 14.8T high-quality tokens. Beats Llama 3.1 405b in almost every benchmark https://t.co/OiHu17hBSI pic.twitter.com/jVwJU07dqf
– Chubby♨️ (@kimmonismus) December 26, 2024
DeepSeek states that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. In data science, tokens are used to represent bits of raw data – 1 million tokens is equivalent to about 750,000 words.
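As a rough illustration (this sketch is mine, not DeepSeek’s), the short Python snippet below counts words and tokens for the same sentence; the GPT-2 tokenizer from Hugging Face’s transformers library is used only as a widely available stand-in, and DeepSeek V3’s own tokenizer would produce somewhat different counts.

from transformers import AutoTokenizer

# Any Hugging Face tokenizer works here; GPT-2 is just a convenient stand-in.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "DeepSeek says DeepSeek V3 was trained on a dataset of 14.8 trillion tokens."
token_ids = tokenizer.encode(text)

print(len(text.split()), "words")   # word count of the sample sentence
print(len(token_ids), "tokens")     # token count of the same sentence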
It’s not just the training set that is plentiful. DeepSeek V3 is enormous in size: 671 billion parameters, or 685 billion on the AI dev platform Hugging Face. (Parameters are the internal variables models use to make predictions or decisions.) That’s around 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
Parameter count often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But bigger models also require beefier hardware to run. An unoptimized version of DeepSeek V3 would need a bank of high-end GPUs to answer queries at reasonable speeds.
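For a sense of scale, here is a back-of-envelope calculation using my own assumptions (16-bit weights, 80 GB cards), not figures from DeepSeek: the weights alone exceed a terabyte of GPU memory before accounting for activations or the KV cache.

# Rough memory estimate for serving an unoptimized 671B-parameter model.
total_params = 671e9        # DeepSeek V3's reported total parameter count
bytes_per_param = 2         # assumes 16-bit (FP16/BF16) weights
gpu_memory_gb = 80          # assumes one 80 GB H800/A100-class card

weights_gb = total_params * bytes_per_param / 1e9
print(f"~{weights_gb:,.0f} GB of weights")                    # ~1,342 GB
print(f"~{weights_gb / gpu_memory_gb:.0f} GPUs at minimum")   # ~17 cards, before activations and KV cache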
Though it’s hardly the most practical model to run, DeepSeek V3 is an achievement in some respects. DeepSeek was able to train the model using a data center of Nvidia H800 GPUs in just around two months, GPUs that Chinese companies were recently restricted by the U.S. Department of Commerce from procuring. The company also claims it spent only $5.5 million to train DeepSeek V3, a fraction of the development cost of models like OpenAI’s GPT-4.
The downside is that the model’s political views are a bit… static. Ask DeepSeek V3 about Tiananmen Square, for example, and it won’t respond.
DeepSeek, being a Chinese company, is subject to benchmarking by China’s internet regulator to ensure that its models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, like speculation about the Xi Jinping regime.
DeepSeek, which in late November unveiled DeepSeek-R1, an answer to OpenAI’s o1 “reasoning” model, is a curious organization. It’s backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions.
High-Flyer builds its own server clusters to train its models; one of the most recent reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “superintelligent” AI through its DeepSeek org.
In an interview earlier this year, Wenfeng characterized closed-source AI like OpenAI’s as a “temporary” moat. “[It] hasn’t stopped others from catching up,” he said.
Indeed.