China’s Breakthrough in AI: The Rise of DeepSeek

Sub: Sci

Sec: Awareness in IT

Why in News

China has made significant advances in Artificial Intelligence (AI) with the DeepSeek series of Large Language Models (LLMs), developed by an AI lab set up by High-Flyer, a Chinese hedge fund.

About DeepSeek:

In April 2023, High-Flyer established its AI lab, DeepSeek, to develop LLMs.
In December 2024, DeepSeek-V3 was launched, surpassing its predecessor and outperforming existing Chinese AI models at a lower cost.
Subsequently, DeepSeek-R1 was introduced, matching OpenAI’s o1 model on multiple benchmarks while being significantly more cost-efficient.
DeepSeek-V3 drew wide attention for its efficient Mixture-of-Experts (MoE) design, in which only a subset of the model’s parameters is activated for each token.
Key Features of DeepSeek-V3:
- Pre-trained on 14.8 trillion tokens
- 671 billion parameters, with 37 billion activated per token
- Trained using Nvidia’s H800 GPUs, requiring 2.78 million GPU hours
- Outperformed Meta’s Llama 3.1 (405B) on several benchmarks, although Llama 3.1 405B required about 30.8 million GPU hours to train
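To illustrate why only 37 billion of the 671 billion parameters do work for any given token, here is a generic top-k expert-routing sketch in plain Python. It is a simplified illustration under stated assumptions, not DeepSeek-V3’s actual router: the dot-product gate, the value of k and the helper names (`moe_forward`, `active_fraction`) are hypothetical. A gate scores every expert, only the k highest-scoring experts are evaluated, and their outputs are mixed using gate weights renormalised over the chosen experts.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, experts, gate_weights, k=2):
    """Route one token vector to its top-k experts (hypothetical, simplified gate).

    token: list of floats (the token's hidden vector)
    experts: list of callables, each mapping a vector to a vector of the same size
    gate_weights: one gate vector per expert, same length as token
    Returns the mixed output and the indices of the experts that were used.
    """
    # Score every expert with a dot product between its gate vector and the token
    scores = [sum(w * x for w, x in zip(g, token)) for g in gate_weights]
    # Keep only the k highest-scoring experts
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the gate probabilities over the chosen experts only
    probs = softmax([scores[i] for i in chosen])
    out = [0.0] * len(token)
    for p, i in zip(probs, chosen):
        y = experts[i](token)  # only the chosen experts are ever evaluated
        out = [o + p * v for o, v in zip(out, y)]
    return out, chosen


def active_fraction(active_params, total_params):
    """Share of parameters used per token in a sparse MoE model."""
    return active_params / total_params
```

With the figures above, `active_fraction(37e9, 671e9)` is roughly 0.055, so only about 5.5% of the parameters are active for each token; the rest of the experts sit idle for that token, which is where the compute saving comes from.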

Key Advantages of DeepSeek-R1 Over OpenAI’s o1:
- About 93% lower cost per API call
- Smaller distilled versions can run on high-end local computers, reducing dependence on cloud services
- Memory-efficient design that lowers RAM requirements
- Uses batch processing and cloud computing for cost efficiency
These features make DeepSeek-R1 an attractive option for businesses and developers looking for affordable AI solutions.
DeepSeek-R1 excels in data analysis but remains restricted by China’s censorship policies.
Despite this censorship, the DeepSeek app, powered by DeepSeek-R1, became the most downloaded free app on the U.S. App Store and ranked third in India’s productivity category.

China’s Previous AI Efforts:

In March 2023, Baidu launched Ernie Bot, marketed as China’s answer to ChatGPT, garnering 30 million user sign-ups in a single day. However, Ernie Bot faced criticism for dodging politically sensitive questions about:
- President Xi Jinping
- Tiananmen Square crackdown
- Human rights issues concerning Uyghur Muslims
Ernie’s failure highlighted the challenges of AI censorship in China and led to scepticism about the country’s ability to develop competitive AI models.

About the Distillation Process:

Knowledge distillation is a machine learning technique that involves transferring knowledge from a large, complex model (referred to as the “teacher”) to a smaller, more efficient model (known as the “student”).
Benefits of Knowledge Distillation:
- Reduces the size of the model, making it more suitable for deployment on devices with limited computational resources
- Improves inference speed and reduces latency, which is crucial for real-time applications
- Lowers the computational power and memory needed to run the model once deployed
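The teacher-to-student transfer described above is commonly implemented with a soft-target loss of the kind popularised by Hinton et al. (2015): the student is trained to match the teacher’s temperature-softened output probabilities as well as the true label. The plain-Python sketch below assumes a single classification example; the function names and the default `temperature` and `alpha` values are illustrative choices, not taken from DeepSeek or any specific library.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Weighted sum of a soft (teacher-matching) loss and a hard (true-label) loss."""
    # Soft part: how far the student's softened distribution is from the teacher's
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps soft-loss gradients on a comparable scale as T changes
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard part: ordinary cross-entropy against the true label at temperature 1
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

A student whose logits already equal the teacher’s gets zero soft loss, so with `alpha=1.0` the loss is zero; the more its softened predictions diverge from the teacher’s, the larger the soft term becomes.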
Techniques in Knowledge Distillation:
- Logit Matching: Aligning the output logits (the raw, pre-softmax scores) of the student model with those of the teacher model.
- Feature Map Transfer: Transferring intermediate representations from the teacher to the student to guide learning.
- Hint Training: Using intermediate layers of the teacher model to provide additional guidance to the student model during training.
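Feature map transfer and hint training both compare an intermediate layer of the student with one of the teacher. In a FitNets-style formulation, a small learned regressor maps the student’s features to the teacher’s feature size, and the mean squared error between the two is added to the training loss. The sketch below is a minimal, framework-free illustration; `hint_loss`, `regressor_step` and the toy vectors are hypothetical, and it updates only the regressor, whereas in real hint training the gradient would also flow back into the student’s lower layers.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def hint_loss(student_feat, teacher_feat, regressor):
    """Mean squared error between teacher features and regressor-projected student features."""
    projected = matvec(regressor, student_feat)
    if len(projected) != len(teacher_feat):
        raise ValueError("regressor must map the student dimension to the teacher dimension")
    return sum((p - t) ** 2 for p, t in zip(projected, teacher_feat)) / len(teacher_feat)


def regressor_step(student_feat, teacher_feat, regressor, lr=0.1):
    """One gradient-descent step on the regressor weights only.

    d(loss)/dW[i][j] = (2 / n) * (projected[i] - teacher[i]) * student[j]
    """
    projected = matvec(regressor, student_feat)
    n = len(teacher_feat)
    return [
        [w - lr * (2.0 / n) * (p - t) * s for w, s in zip(row, student_feat)]
        for row, p, t in zip(regressor, projected, teacher_feat)
    ]
```

For example, with a 2-dimensional student feature `[1.0, 2.0]`, a 3-dimensional teacher feature `[1.0, 0.0, 3.0]` and an all-zero 3x2 regressor, one call to `regressor_step` lowers `hint_loss`, showing the student-side representation being pulled toward the teacher’s.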
While distillation enhances model efficiency, it has limitations:
- Dependent on the original teacher model
- May not fully replicate the capabilities of larger models
The demand for GPUs is expected to rise as companies push for more intelligent AI.