According to Economic Daily News, citing Chinese media outlet STAR Market Daily, Huawei is reportedly set to unveil a breakthrough in AI inference technology on the 12th at the “2025 Financial AI Inference Application Implementation and Development Forum.” The report says the achievement could reduce China’s reliance on HBM for AI inference and improve the inference performance of large AI models.
As noted by ChinaFund, citing sources, the AI industry has shifted from pushing the limits of model capabilities to maximizing application value, with inference emerging as the next focal point of development. While HBM plays a critical role in overcoming data transfer bottlenecks, the report adds that limited HBM capacity can significantly impair AI inference performance, leading to task stalls, slow responses, and other issues.
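To illustrate the capacity point with a rough, hypothetical sketch (the model shape below is illustrative and not drawn from the report): during inference, a large model’s key-value cache grows linearly with batch size and context length, so even a moderate serving workload can outgrow the HBM on a single accelerator.

```python
# Hypothetical KV-cache sizing for a 70B-class transformer, to show how
# quickly inference state outgrows per-chip HBM capacity. The model shape
# is an assumption for illustration, not a spec from the cited report.

layers, kv_heads, head_dim = 80, 8, 128   # 70B-class, grouped-query attention
bytes_per_value = 2                        # BF16/FP16

def kv_cache_gib(batch_size: int, seq_len: int) -> float:
    """GiB needed to hold keys + values across all layers."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return batch_size * seq_len * per_token / 2**30

# A modest serving workload already rivals a single accelerator's HBM:
print(f"{kv_cache_gib(32, 8192):.0f} GiB")   # ~80 GiB for 32 concurrent 8K contexts
```

When the cache no longer fits, requests must be queued, batched smaller, or spilled to slower memory, which matches the stalls and slow responses the report describes.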
In early December 2024, the U.S. imposed restrictions on exporting advanced HBM to China. As highlighted by Economic Daily News, SK hynix, Micron, and Samsung — the three main HBM suppliers — are barred from shipping HBM2 or more advanced HBM chips to China, creating a hurdle for Huawei’s development of advanced AI technologies.
In response to these restrictions, China is reportedly seeking U.S. concessions on export controls covering the HBM chips essential to AI development as part of a trade deal, ahead of a potential summit between the two nations, according to a Reuters report on the 10th, citing the Financial Times.
Even so, with access to HBM constrained, the Chinese tech giant has been making strides in AI inference. In March 2025, Peking University, in collaboration with Huawei, introduced the DeepSeek full-stack open-source inference solution. ITHome notes that it is built on Peking University’s self-developed SCOW (Super Computing On Web) platform and CraneSched, a task-scheduling system for supercomputing clusters, and incorporates various open-source technologies to enable efficient DeepSeek inference on Huawei’s Ascend platform.
Huawei’s CloudMatrix 384 Pushes AI Compute Limits
Huawei is also pushing ahead with its CloudMatrix 384 system. As highlighted by Tom’s Hardware, this rack-scale AI system integrates 384 Ascend 910C processors. While a single Ascend 910C delivers only about one-third the performance of NVIDIA’s Blackwell, Huawei compensates by packing far more chips into each system, enabling the CloudMatrix 384 to reach about 300 PFLOPs of dense BF16 compute, nearly double the 180 PFLOPs of NVIDIA’s GB200 NVL72. The report also notes that the system, which uses HBM2E memory, offers 2.1 times the total memory bandwidth and over 3.6 times the HBM capacity of NVIDIA’s rack.
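As a quick back-of-the-envelope check of those figures (our own arithmetic on the reported numbers, not a published spec sheet), dividing each system’s compute by its chip count recovers the roughly one-third per-chip ratio:

```python
# Sanity check of the reported system-level figures. The system totals
# come from the cited report; the per-chip division is illustrative.

cloudmatrix_pflops = 300   # dense BF16, 384 x Ascend 910C
nvl72_pflops = 180         # dense BF16, 72 x Blackwell GPUs (GB200 NVL72)

per_910c = cloudmatrix_pflops / 384      # ~0.78 PFLOPs per Ascend 910C
per_blackwell = nvl72_pflops / 72        # ~2.5 PFLOPs per Blackwell GPU

print(f"Per-chip ratio: {per_910c / per_blackwell:.2f}")   # ~0.31, about one-third
```

In other words, the system trades per-chip efficiency for scale: roughly 5.3 times as many processors deliver about 1.7 times the aggregate compute.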
(Photo credit: Huawei)