[News] Huawei Debuts Atlas 350 on Ascend 950PR with In-house HBM, Touting 2.8X H20 Performance
After outlining a three-year AI chip roadmap last September, Huawei has now rolled out its first product under the plan. According to South China Morning Post and IT Home, the company’s Ascend 950PR—designed for core AI inference workloads such as prefill and recommendation—made its debut alongside the Atlas 350 accelerator card at the Huawei China Partner Conference 2026 on March 20.
Notably, the chip touts stronger AI compute capabilities and performance surpassing that of U.S. chip giant NVIDIA’s H20. South China Morning Post adds that Atlas 350 delivers 1.56 petaflops of FP4 compute performance—roughly 2.8 times that of NVIDIA’s China-focused H20—according to Zhang Dixuan, head of Huawei’s Ascend computing business.
FP4, the report explains, is a low-precision format, enabling faster data throughput and more efficient processing on the Atlas 350.
For more detailed specs of the Ascend 950PR, Mydrivers reports that, compared with the previous generation of Ascend chips, the 950PR achieves major gains in low-precision data formats, vector compute, interconnect bandwidth, and its self-developed HBM. As highlighted by the report, the Ascend 950PR’s HBM capacity reaches 112GB, 1.16 times that of NVIDIA’s H20, while multimodal generation speed can improve by up to 60%.
For the Atlas 350 accelerator card built on the Ascend 950PR, Mydrivers further reports a memory bandwidth of 1.4 TB/s and a power consumption of 600W—roughly 1.5 times the H20’s power draw.
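The reported ratios can be sanity-checked with simple arithmetic. The sketch below assumes the H20’s commonly cited 96 GB of HBM3 and roughly 400W power draw; neither figure appears in the article, so both are assumptions:

```python
# Sanity-checking the reported Ascend 950PR / Atlas 350 ratios.
# ASSUMPTIONS (not from the article): H20 has 96 GB HBM3 and a ~400W TDP.
ascend_hbm_gb, h20_hbm_gb = 112, 96
atlas_power_w, h20_power_w = 600, 400

hbm_ratio = round(ascend_hbm_gb / h20_hbm_gb, 2)
power_ratio = atlas_power_w / h20_power_w

print(hbm_ratio)    # ~1.17, in line with the reported 1.16x capacity figure
print(power_ratio)  # 1.5, matching the reported 1.5x power figure
```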
Strategic Leap with Ascend 950PR
The debut of the Ascend 950PR marks the beginning of Huawei’s AI chip roadmap through 2028, announced at the Huawei Connect 2025 conference. As reported by Mydrivers, the rollout will start with the Ascend 950PR in Q1 2026 and the 950DT in Q4 2026, followed by the Ascend 960 in Q4 2027 and the Ascend 970 in Q4 2028.
Notably, an analysis from 163.com highlights that the Ascend 950PR launch marks a major milestone for China’s AI chip industry. While NVIDIA’s H20—a compliant version for the Chinese market—is widely used for training and inference, the Ascend 950PR reportedly delivers nearly three times its single-card compute power, signaling a key performance breakthrough for domestic chips.
163.com also notes that FP4 trades precision for efficiency, letting large AI models run on far less memory. For example, a 70-billion-parameter model that normally requires 140 GB of VRAM can run smoothly with just 35 GB using FP4. Under the same hardware conditions, this means larger models can be deployed, or more concurrent inference requests can be supported, the report adds.
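The 140 GB and 35 GB figures follow directly from bytes-per-parameter arithmetic. A minimal sketch, assuming 16 bits per parameter for the baseline (the article does not name the baseline precision) and 4 bits for FP4, counting weights only:

```python
# Rough weight-memory estimate at different precisions.
# ASSUMPTION (not from the article): the 140 GB baseline is a 16-bit format.
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Return decimal GB needed to store the model weights alone."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

print(weight_memory_gb(70, 16))  # 140.0 GB, the baseline figure cited by 163.com
print(weight_memory_gb(70, 4))   # 35.0 GB, the FP4 figure
```

Activations, KV cache, and runtime overhead add to these totals, so real deployments need headroom beyond the raw weight footprint.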
Meanwhile, the report points out the Ascend 950PR is the first to feature Huawei’s self-developed high-bandwidth memory (HBM), HiBL 1.0, boosting interconnect bandwidth 2.5 times over the previous generation. This gives Huawei full control over its most critical memory components—a strategic advantage in a market where global HBM capacity is dominated by South Korean and U.S. memory giants.
Read more
- [News] Huawei Reportedly Plans 2026 Ascend 950 Sales in South Korea, Targeting Cluster-Level Deployments
- [News] Huawei Unveils Ascend 950 with In-House HBM in 2026, Touts SuperPoD to Rival NVIDIA
(Photo credit: Huawei)