China’s Breakthrough in AI: The Rise of DeepSeek
- February 2, 2025
- Posted by: OptimizeIAS Team
- Category: DPN Topics
Sub: Science
Sec: Awareness in IT
Why in News
- China has made significant advancements in Artificial Intelligence (AI) with the launch of DeepSeek, a series of Large Language Models (LLMs) developed by High-Flyer, a Chinese hedge fund.
About DeepSeek:
- In April 2023, High-Flyer established its AI lab, DeepSeek, to develop LLMs.
- In December 2024, DeepSeek-v3 was launched, surpassing its predecessors and outperforming existing Chinese AI models at a lower cost.
- Subsequently, in January 2025, DeepSeek-R1 was introduced, matching OpenAI’s o1 model on multiple benchmarks while being significantly more cost-efficient.
- DeepSeek’s main advance is the Mixture-of-Experts (MoE) architecture of DeepSeek-v3, which activates only a fraction of the model’s parameters for each token (a minimal routing sketch follows the feature list below).
- Key Features of DeepSeek-v3:
- Pre-trained on 14.8 trillion tokens
- 671 billion parameters, with 37 billion activated per token
- Trained using Nvidia’s H800 GPUs, requiring 2.78 million GPU hours
- Outperformed Meta’s Llama 3.1, whose training consumed about 30.8 million GPU hours (roughly 11 times more)
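The efficiency figures above follow from sparse activation: in an MoE layer, a router selects a few expert sub-networks per token, so only a small slice of the total parameters does any work for a given input. Below is a minimal, generic top-k MoE layer in PyTorch; the dimensions, expert count and routing scheme are toy values for illustration, not DeepSeek-v3’s actual configuration.

```python
# Minimal Mixture-of-Experts layer with top-k routing (toy sizes, illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # scores each token against each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (num_tokens, d_model)
        gate_logits = self.router(x)                   # (num_tokens, n_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # normalise over the chosen experts
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; all other parameters stay idle.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 512)            # 4 tokens, hidden width 512
print(MoELayer()(tokens).shape)         # torch.Size([4, 512])
```

Scaled up, this is how a 671-billion-parameter model can activate only about 37 billion parameters per token.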
- Key Advantages Over OpenAI’s o1:
- 93% cost reduction per API call
- Smaller distilled versions can run on high-end local computers, reducing dependence on cloud services (see the local-inference sketch at the end of this section)
- Memory-efficient design that lowers RAM requirements during inference
- Uses batch processing and cloud computing for cost efficiency
- These features make DeepSeek-R1 an attractive option for businesses and developers looking for affordable AI solutions.
- DeepSeek-R1 excels in data analysis but remains restricted by China’s censorship policies.
- Despite these restrictions, the DeepSeek app became the most downloaded free app in the U.S. and ranked third in India’s productivity category.
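On the local-deployment point above, here is a hedged sketch of running one of the smaller distilled R1 checkpoints with Hugging Face transformers. The checkpoint name is an assumption (one of the distilled variants published alongside R1); substitute whichever distilled model fits the machine, and note that half precision and automatic device placement are used to keep memory demands modest.

```python
# Hedged sketch: running a small distilled DeepSeek-R1 checkpoint locally with
# Hugging Face transformers. The model identifier below is assumed; swap in any
# distilled variant that fits the available memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"   # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision to keep RAM/VRAM use modest
    device_map="auto",           # place layers on GPU if present, else CPU
)

prompt = "Summarise knowledge distillation in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Without a suitable GPU the same code should fall back to the CPU, trading speed for the ability to run entirely offline.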
China’s Previous AI Efforts:
- In March 2023, Baidu launched Ernie Bot, marketed as China’s answer to ChatGPT, garnering 30 million user sign-ups in a single day. However, Ernie Bot faced criticism for dodging politically sensitive questions about:
- President Xi Jinping
- Tiananmen Square crackdown
- Human rights issues concerning Uyghur Muslims
- Ernie’s failure highlighted the challenges of AI censorship in China and led to scepticism about the country’s ability to develop competitive AI models.
About Distillation Process:
- Knowledge distillation is a machine learning technique that involves transferring knowledge from a large, complex model (referred to as the “teacher”) to a smaller, more efficient model (known as the “student”).
- Reduces the size of the model, making it more suitable for deployment on devices with limited computational resources.
- Enhances inference speed and reduces latency, which is crucial for real-time applications.
- Decreases the computational power and memory required for model training and deployment.
- Techniques in Knowledge Distillation:
- Logit Matching: Aligning the output logits (pre-softmax scores) of the student model with those of the teacher model (a minimal sketch follows this list).
- Feature Map Transfer: Transferring intermediate representations from the teacher to the student to guide learning.
- Hint Training: Using intermediate layers of the teacher model to provide additional guidance to the student model during training.
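As a concrete illustration of logit matching, the sketch below computes a standard distillation loss: teacher and student logits are softened with a temperature, matched with KL divergence, and blended with the usual cross-entropy against the ground-truth labels. The temperature, weighting and tensor shapes are illustrative choices, not taken from any particular DeepSeek recipe.

```python
# Minimal logit-matching distillation loss: soften teacher and student logits with a
# temperature, match them with KL divergence, and blend with the usual label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: teacher and student distributions at temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term is scaled by T^2 so its gradients stay comparable to the hard-label term.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)      # standard hard-label loss
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 4 examples with 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)                   # from the frozen teacher model
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(float(loss))
```

Minimising this loss trains the student to mimic the teacher’s full output distribution rather than just its top answer, which is why a much smaller student can retain a large share of the teacher’s behaviour.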
- While distillation enhances model efficiency, it has limitations:
- Dependent on the original teacher model
- May not fully replicate the capabilities of larger models
- The demand for GPUs is expected to rise as companies push for more intelligent AI.