[News] Decoding DeepSeek V4: How Huawei’s Ascend 950PR Is Powering China’s Push to Break CUDA Dependence
As export curbs continue to keep NVIDIA’s H200 from gaining traction in China, domestic chip leader Huawei is emerging as a key beneficiary. According to The Information, as cited by 163.com, DeepSeek plans to deploy its upcoming V4 model on Huawei’s latest AI processor, the Ascend 950PR. Here’s what we know so far about DeepSeek’s roadmap, and how Huawei is positioning its AI stack to capitalize.
Huawei AI Chip Orders Reportedly Surge
Reuters, citing The Information, notes that in anticipation of the V4 launch, major Chinese tech firms—including Alibaba, ByteDance, and Tencent—have reportedly placed large-scale orders for Huawei’s next-generation chips, totaling hundreds of thousands of units. EE Times China adds that the surge in demand has even driven Huawei’s chip prices up by around 20%.
Launch expectations have also shifted into April, with The Information pointing to a mid-April window, closely aligned with the rollout of the Ascend 950 PR.
Notably, The Information, as cited by EE Times China, reports that DeepSeek’s V4 model could have launched earlier, but its development was delayed over the past several months while the team worked closely with local AI chip champions Huawei and Cambricon on extensive adjustments and rewrites to the model’s underlying architecture.
DeepSeek V4 Specs Unveiled
EE Times China suggests that the V4 model adopts a Mixture-of-Experts (MoE) architecture, with a total parameter count of up to 1 trillion, of which only around 37 billion are activated per inference pass. In terms of scale, the full V4 version represents more than a twofold increase over V3, the report adds.
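The appeal of an MoE design is that total and active parameter counts decouple: a router selects only a few experts per token, so compute scales with the ~37B active parameters rather than the ~1T total. The following is a minimal, illustrative sketch of top-k expert routing in plain NumPy; it is not DeepSeek’s implementation, and all function and variable names here are assumptions for illustration.

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route one token through the top-k of n experts (illustrative sketch).

    x: (d,) token embedding; experts: list of (d, d) matrices standing in
    for expert FFNs; gate_w: (d, n) router weights. Only the k selected
    experts run, so active parameters stay a small fraction of the total,
    which is the property the article cites (~37B active of ~1T total).
    """
    logits = x @ gate_w                       # router scores, shape (n,)
    top = np.argsort(logits)[-k:]             # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over selected experts only
    # Weighted sum of the chosen experts' outputs; the unselected
    # experts contribute no compute at all for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n = 8, 16                                  # toy sizes: 16 experts, 2 active
token = rng.normal(size=d)
out = moe_forward(token,
                  [rng.normal(size=(d, d)) for _ in range(n)],
                  rng.normal(size=(d, n)))
```

In a real deployment the routing itself is cheap; the engineering challenge the article alludes to is dispatching tokens to experts spread across many accelerators, which is exactly the kind of communication-library and operator work reportedly done to port the model to Ascend hardware.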
On the hardware side, Huawei is reportedly providing multiple chip solutions. EE Times China reports that the core compute backbone is the Ascend 910C, manufactured using SMIC’s 7nm process and based on Huawei’s in-house Da Vinci architecture.
For higher-end deployment scenarios, the report adds, Huawei also introduced the Atlas 350 accelerator card on March 21. Powered by the Ascend 950PR processor, it delivers up to 1 PFLOPS of FP8 (2 PFLOPS of FP4) compute, with interconnect bandwidth reaching 2 TB/s.
As noted by EE Times China, the Ascend 950PR also integrates Huawei’s in-house HiBL memory (112GB, 1.4 TB/s), reducing reliance on external supply chains. Notably, according to eastmoney.com, the Ascend 950PR is manufactured by SMIC using its N+3 process, which delivers performance broadly comparable to a 5nm-class node.
Bigger Plans
Meanwhile, Huawei is ramping up production to meet surging domestic demand. EE Times China reports the company plans to produce ~600,000 Ascend 910C chips in 2026—doubling 2025 output—and lift total Ascend capacity to 1.6 million units. The Ascend 950PR has already launched in Q1 2026, with next-gen 960 and 970 chips in the pipeline, each targeting roughly 2x performance gains, the report notes.
163.com, citing Weijin Research, adds that currently, China’s AI training and inference chips—represented by Huawei’s Ascend 950PR—are broadly considered to sit between NVIDIA’s H100 and H200 in capability, with production capacity remaining the main bottleneck.
According to the report, the 950PR is still primarily targeted at inference workloads, while the upcoming 950DT, expected by the end of this year, is designed for training and deep learning scenarios.
If DeepSeek succeeds in running both inference and training on Ascend chips within the next one to two years—and stabilizes the full software stack including compilers, operators, communication libraries, distributed training, and inference frameworks—then its core model development pipeline could effectively become independent of CUDA, the report suggests.
(Photo credit: Huawei)